AWS EventBridge vs SNS vs SQS Explained
In the era of distributed systems and microservices, the "glue" that binds services together is often more critical than the services themselves. As a cloud architect, the most frequent question I encounter during design reviews is: "Should we use EventBridge, SNS, or SQS for this integration?" While all three services facilitate asynchronous communication, they serve distinct architectural purposes. Choosing the wrong one doesn't just lead to technical debt; it can result in significant cost overruns, performance bottlenecks, and a system that is difficult to observe and maintain.
Understanding these services requires moving beyond the basic definitions of "queues" and "topics." We must look at them through the lens of message delivery patterns: point-to-point, fan-out, and event-driven choreography. SQS provides the durability and flow control needed for heavy processing; SNS offers the high-throughput, low-latency broadcast capabilities required for immediate notifications; and EventBridge acts as a sophisticated central nervous system, capable of routing events based on complex patterns and integrating seamlessly with third-party SaaS providers.
In a production-grade environment, you rarely use just one. Modern architectures often chain these services—for example, using EventBridge to route a specific event to an SNS topic, which then fans out to multiple SQS queues. This "fan-out" pattern ensures that each consuming service has its own dedicated queue, providing isolation and allowing for independent scaling and failure handling.
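For the SNS-to-SQS leg of that chain, each consumer queue needs an access policy that allows the topic to deliver to it. The following is a minimal sketch of such a policy built as plain Python data; the ARNs and the helper name sns_to_sqs_policy are hypothetical and only illustrate the shape.

```python
import json


def sns_to_sqs_policy(queue_arn: str, topic_arn: str) -> dict:
    """Build a queue access policy that lets one SNS topic deliver to one SQS queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTopicToSend",
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict delivery to this specific topic.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


# Hypothetical ARNs, for illustration only.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:order-events"
QUEUE_ARNS = [
    "arn:aws:sqs:us-east-1:123456789012:warehouse-queue",
    "arn:aws:sqs:us-east-1:123456789012:billing-queue",
]

# One policy per consumer queue; each would be set as that queue's "Policy" attribute.
policies = {arn: json.dumps(sns_to_sqs_policy(arn, TOPIC_ARN)) for arn in QUEUE_ARNS}
```

Keeping one queue (and one policy) per consumer is what gives each service its own backlog and failure boundary.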
Architecture and Core Concepts
The fundamental difference lies in how messages are consumed and routed. SQS is a pull-based service where consumers poll the queue. SNS and EventBridge are push-based, meaning they deliver messages to targets as they arrive.
SQS: The Buffer
SQS acts as a shock absorber. It is designed for point-to-point communication: a producer sends a message, and a single consumer processes and deletes it. Standard queues deliver at least once, so a message can occasionally arrive twice and consumers should be idempotent. SQS is essential when the consumer's processing rate fluctuates or when you need to guarantee that a message is eventually processed via retries and Dead Letter Queues (DLQs).
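The retry-then-DLQ behavior is configured through a queue's redrive policy. Here is a small sketch of building that attribute; the DLQ ARN, the default of 5 receives, and the helper name redrive_attributes are assumptions for illustration.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Return queue attributes that move a message to the DLQ after repeated failed receives."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    # SQS expects the redrive policy as a JSON string inside the attributes map.
    return {"RedrivePolicy": json.dumps(policy)}


# Hypothetical DLQ ARN.
attrs = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:InventoryQueue-dlq")

# With boto3, these attributes could be applied to the source queue with:
#   sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=attrs)
```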
SNS: The Broadcaster
SNS is built for high-throughput fan-out. When an event occurs that multiple systems need to know about simultaneously (e.g., a "UserSignedUp" event), SNS pushes that message to all subscribed endpoints. It supports protocols like HTTP/S, Email, SMS, and SQS.
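As a rough illustration, a "UserSignedUp" publish might be assembled like this. The topic ARN, the event_type attribute name, and the helper build_publish_request are hypothetical; the resulting dictionary matches the keyword arguments of the boto3 SNS publish call.

```python
import json


def build_publish_request(topic_arn: str, event_type: str, payload: dict) -> dict:
    """Build keyword arguments for an SNS publish call."""
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps(payload),
        # Message attributes travel alongside the body and can drive subscription filtering.
        "MessageAttributes": {
            "event_type": {"DataType": "String", "StringValue": event_type}
        },
    }


request = build_publish_request(
    "arn:aws:sns:us-east-1:123456789012:user-events",  # hypothetical topic
    "UserSignedUp",
    {"user_id": "U-1001"},
)

# With boto3: boto3.client("sns").publish(**request)
```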
EventBridge: The Intelligent Router
EventBridge is the evolution of CloudWatch Events. It excels at complex routing logic. Unlike SNS, which delivers the entire payload to all subscribers (unless basic filtering is used), EventBridge can inspect the message body, transform it, and route it to over 20 different AWS targets based on specific JSON patterns.
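To make the JSON-pattern idea concrete, here is a sketch of a rule that matches only high-value orders. The rule name, the amount threshold of 100, and the helper build_rule_request are assumptions; the source and detail-type values mirror the order event used in the implementation section.

```python
import json

# Hypothetical pattern: OrderCreated events from the orders service with amount > 100.
high_value_order_pattern = {
    "source": ["com.mycompany.orders"],
    "detail-type": ["OrderCreated"],
    "detail": {"amount": [{"numeric": [">", 100]}]},
}


def build_rule_request(name: str, pattern: dict, bus_name: str = "default") -> dict:
    """Build keyword arguments for an EventBridge put_rule call."""
    return {
        "Name": name,
        "EventBusName": bus_name,
        # EventBridge expects the pattern serialized as a JSON string.
        "EventPattern": json.dumps(pattern),
        "State": "ENABLED",
    }


rule = build_rule_request("high-value-orders", high_value_order_pattern)

# With boto3: boto3.client("events").put_rule(**rule)
```

Because the match happens on the event body itself, producers do not need to know which consumers care about which orders.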
Implementation: Cross-Service Integration
In a production scenario, you might use the AWS SDK for Python (boto3) to emit events to EventBridge, which then handles the downstream complexity. Below is an example of a service emitting a structured event.
```python
import json
import uuid

import boto3


def publish_order_event(order_id, status, total_amount):
    """Publish an OrderCreated event to the default event bus."""
    client = boto3.client('events')

    event_detail = {
        "order_id": order_id,
        "status": status,
        "amount": total_amount,
        "trace_id": str(uuid.uuid4())  # Correlates this event across services
    }

    response = client.put_events(
        Entries=[
            {
                'Source': 'com.mycompany.orders',
                'DetailType': 'OrderCreated',
                'Detail': json.dumps(event_detail),
                'EventBusName': 'default'
            }
        ]
    )

    # put_events reports per-entry failures in the response instead of raising
    if response.get('FailedEntryCount', 0) > 0:
        raise RuntimeError(f"Failed to publish event: {response['Entries']}")

    return response


# Example usage in an Order Service
# This event could be routed to an SQS queue for warehouse processing
# and an SNS topic for customer SMS notifications via EventBridge rules.
publish_order_event("ORD-12345", "CREATED", 99.99)
```

For SQS, the implementation focuses on safe consumption and visibility timeouts:
```python
import boto3

sqs = boto3.client('sqs')
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/InventoryQueue'


def consume_messages():
    # Long polling to reduce cost and latency
    messages = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20
    )

    for msg in messages.get('Messages', []):
        # Process logic here...
        print(f"Processing: {msg['Body']}")

        # Delete message after successful processing; if it is not deleted,
        # it becomes visible again when the visibility timeout expires
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=msg['ReceiptHandle']
        )


consume_messages()
```

Best Practices Table
| Feature | SQS | SNS | EventBridge |
|---|---|---|---|
| Primary Pattern | Queuing (Point-to-Point) | Pub/Sub (Fan-out) | Event Bus (Choreography) |
| Delivery Model | Pull (Polling) | Push | Push |
| Max Payload | 256 KB | 256 KB | 256 KB |
| Message Filtering | No (Consumer handles logic) | Yes (Attribute- or payload-based filter policies) | Yes (Content-based/JSON) |
| Schema Registry | No | No | Yes (Discovery/Versioning) |
| Ordering | FIFO Queues supported | FIFO Topics supported | No ordering guarantee |
| Persistence | Up to 14 days | No (Retry policy only) | No (Archive/Replay feature) |
Performance and Cost Optimization
Cost optimization in messaging requires understanding how you are billed. SQS and SNS are billed per request, so batching (up to 10 messages per SQS request) is a significant saver. EventBridge is billed per custom event published to the bus; rules carry no separate charge, though optional features such as archives, replays, and API destinations are billed separately.
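The batching saving comes from grouping messages into entries for a single batch send. A minimal sketch of the chunking step follows; the order payloads and the helper name batch_entries are hypothetical, and the sketch ignores the total payload size limit that also applies to each batch.

```python
import json
from typing import Iterator

SQS_MAX_BATCH = 10  # A batch send accepts at most 10 entries per request


def batch_entries(payloads: list[dict], size: int = SQS_MAX_BATCH) -> Iterator[list[dict]]:
    """Yield lists of batch entries, each list small enough for one request."""
    for start in range(0, len(payloads), size):
        chunk = payloads[start:start + size]
        yield [
            # Ids only need to be unique within a single batch request.
            {"Id": str(start + offset), "MessageBody": json.dumps(body)}
            for offset, body in enumerate(chunk)
        ]


orders = [{"order_id": f"ORD-{n}"} for n in range(25)]
batches = list(batch_entries(orders))
print(len(batches))  # 3 requests instead of 25

# With boto3, each batch could be sent with:
#   sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
```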
To optimize performance:
- SQS Long Polling: Always set WaitTimeSeconds to 20. This reduces the number of empty responses, lowering your bill and reducing CPU spikes on consumers.
- SNS Filtering: Use a FilterPolicy on each subscription to ensure consumers only receive relevant messages. This prevents unnecessary Lambda invocations or SQS storage costs.
- EventBridge Schema Discovery: Enable this in development to automatically generate code bindings, reducing serialization errors that cause costly retries.
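To illustrate the SNS filtering tip, the sketch below shows a filter policy and a deliberately simplified local matcher that mimics its basic semantics: every key must match, and any listed value is accepted. The attribute names and values are hypothetical, and the matcher handles exact string values only; it is not how SNS itself evaluates policies.

```python
import json

# Hypothetical policy: only order creation or cancellation events from the "web" channel.
filter_policy = {
    "event_type": ["OrderCreated", "OrderCancelled"],
    "channel": ["web"],
}


def matches(policy: dict, attributes: dict) -> bool:
    """Simplified matcher: each policy key must be present with one of the listed values."""
    return all(attributes.get(key) in allowed for key, allowed in policy.items())


print(matches(filter_policy, {"event_type": "OrderCreated", "channel": "web"}))  # True
print(matches(filter_policy, {"event_type": "OrderShipped", "channel": "web"}))  # False

# The policy is attached to a subscription as a JSON string, e.g. with boto3:
#   sns.set_subscription_attributes(
#       SubscriptionArn=subscription_arn,
#       AttributeName="FilterPolicy",
#       AttributeValue=json.dumps(filter_policy),
#   )
```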
Monitoring and Production Patterns
In production, visibility is everything. You must monitor the age of the oldest message (the ApproximateAgeOfOldestMessage metric) for SQS and NumberOfNotificationsFailed for SNS. EventBridge provides the TriggeredRules and FailedInvocations metrics.
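As one possible starting point, an alarm on the age of the oldest message could be described as follows. The 300-second threshold, the evaluation window, and the helper name oldest_message_alarm are assumptions to tune per workload; the keys match the keyword arguments of the boto3 CloudWatch put_metric_alarm call.

```python
def oldest_message_alarm(queue_name: str, max_age_seconds: int = 300) -> dict:
    """Build keyword arguments for a CloudWatch alarm on a queue's oldest message age."""
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,            # Evaluate one-minute data points
        "EvaluationPeriods": 5,  # Require five consecutive breaches before alarming
        "Threshold": float(max_age_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
    }


alarm = oldest_message_alarm("InventoryQueue")

# With boto3: boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

A rising oldest-message age usually means consumers are falling behind, which makes it a more direct backlog signal than queue depth alone.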
The DLQ Pattern
Every SQS queue and EventBridge target should have a Dead Letter Queue. For EventBridge, this is crucial because it is a push-based service; if the target (like a Lambda function) is throttled or down, the event could be lost after the standard 24-hour retry window. Attaching a DLQ to the rule target ensures you can replay failed events once the downstream system recovers.
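A rule target with a retry policy and a DLQ can be sketched as below. The ARNs, the target id, and the helper name target_with_dlq are hypothetical; the defaults reflect the 24-hour window described above, and the range checks are my understanding of the service limits rather than something stated in this article.

```python
def target_with_dlq(
    target_id: str,
    target_arn: str,
    dlq_arn: str,
    max_attempts: int = 185,
    max_age_seconds: int = 86400,
) -> dict:
    """Build one EventBridge target entry with a retry policy and a dead-letter queue."""
    if not 0 <= max_attempts <= 185:
        raise ValueError("max_attempts must be between 0 and 185")
    if not 60 <= max_age_seconds <= 86400:
        raise ValueError("max_age_seconds must be between 60 and 86400")
    return {
        "Id": target_id,
        "Arn": target_arn,
        "RetryPolicy": {
            "MaximumRetryAttempts": max_attempts,
            "MaximumEventAgeInSeconds": max_age_seconds,
        },
        # Events that exhaust the retry policy are sent here instead of being dropped.
        "DeadLetterConfig": {"Arn": dlq_arn},
    }


target = target_with_dlq(
    "warehouse-lambda",  # hypothetical target id
    "arn:aws:lambda:us-east-1:123456789012:function:warehouse-handler",
    "arn:aws:sqs:us-east-1:123456789012:order-events-dlq",
)

# With boto3:
#   events.put_targets(Rule="high-value-orders", EventBusName="default", Targets=[target])
```

The DLQ itself also needs a resource policy that allows EventBridge to send messages to it, otherwise failed events cannot be delivered there.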
Event Archival and Replay
One of EventBridge's most powerful production features is the ability to archive events. Unlike SNS or SQS, where messages are gone once consumed or expired, EventBridge can store events in an archive for a specified duration. This allows you to "replay" events from a specific time window—an invaluable tool for debugging or recovering from a production bug that caused data corruption.
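A replay request for a specific time window might be assembled as in this sketch. The archive and bus ARNs, the replay name, and the helper replay_request are hypothetical; the keys match the keyword arguments of the boto3 EventBridge start_replay call.

```python
from datetime import datetime, timedelta, timezone


def replay_request(
    name: str, archive_arn: str, bus_arn: str, end: datetime, window: timedelta
) -> dict:
    """Build keyword arguments to replay archived events from the window ending at `end`."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    return {
        "ReplayName": name,
        "EventSourceArn": archive_arn,  # The archive to read events from
        "EventStartTime": end - window,
        "EventEndTime": end,
        "Destination": {"Arn": bus_arn},  # The event bus that receives the replayed events
    }


# Hypothetical incident: replay the two hours before a fix was deployed.
fix_deployed_at = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
request = replay_request(
    "orders-incident-replay",
    "arn:aws:events:us-east-1:123456789012:archive/orders-archive",
    "arn:aws:events:us-east-1:123456789012:event-bus/default",
    end=fix_deployed_at,
    window=timedelta(hours=2),
)

# With boto3: boto3.client("events").start_replay(**request)
```

Since replayed events pass through the bus again, consumers should be idempotent or the replay should be scoped to rules that tolerate duplicates.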
Conclusion
The choice between EventBridge, SNS, and SQS is not about which service is "better," but which one fits the communication pattern of your microservices. SQS is your workhorse for reliable, asynchronous processing. SNS is your high-speed broadcaster for simple fan-out. EventBridge is your sophisticated router, suited to complex routing, SaaS integration, and building a truly event-driven enterprise. By combining these services (EventBridge for routing, SNS for wide distribution, and SQS for resilient consumption), you build a cloud-native architecture that is both scalable and maintainable.