Amazon SQS
A fully managed message queue service that enables asynchronous communication between distributed systems, decoupling senders and receivers to improve overall system resilience
Overview
Amazon SQS (Simple Queue Service) is one of the earliest services AWS offered, with more than 20 years of history since its 2004 beta release. It is a fully managed message queue that implements the asynchronous messaging pattern: producers send messages to a queue, and consumers retrieve and process them. Standard queues support a nearly unlimited number of messages per second, while FIFO queues support up to 3,000 messages per second with batching (300 API calls per second, each carrying up to 10 messages) and preserve message order. The maximum message size is 256 KB; for larger payloads, the Extended Client Library stores the payload in S3 and sends only a reference through the queue. Messages can be retained for up to 14 days.
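The batching mentioned above can be sketched as follows. This is a minimal illustration, not a definitive implementation: build_batches and send_all are hypothetical helper names, the 256 KB limit is the figure quoted in the text, and sqs_client is assumed to be a boto3 SQS client (or any object exposing a compatible send_message_batch method). The sketch checks each message's size individually and does not enforce the combined size limit that also applies to a whole batch.

```python
from typing import Dict, Iterable, Iterator, List

MAX_BATCH_ENTRIES = 10          # SendMessageBatch accepts at most 10 entries per call
MAX_MESSAGE_BYTES = 256 * 1024  # per-message limit quoted in the text (assumption)


def build_batches(bodies: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of SendMessageBatch entries, at most 10 entries per list."""
    batch: List[Dict[str, str]] = []
    for index, body in enumerate(bodies):
        if len(body.encode("utf-8")) > MAX_MESSAGE_BYTES:
            raise ValueError(f"message {index} exceeds {MAX_MESSAGE_BYTES} bytes")
        # "Id" only has to be unique within a single batch request.
        batch.append({"Id": str(index), "MessageBody": body})
        if len(batch) == MAX_BATCH_ENTRIES:
            yield batch
            batch = []
    if batch:
        yield batch


def send_all(sqs_client, queue_url: str, bodies: Iterable[str]) -> int:
    """Send every body in batches; return how many entries SQS reported as failed."""
    failed = 0
    for entries in build_batches(bodies):
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed += len(response.get("Failed", []))
    return failed
```

With boto3 installed and credentials configured, this could be called as send_all(boto3.client("sqs"), queue_url, bodies). Entries reported under "Failed" would still need to be retried by the caller.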
Design Decisions - Standard vs. FIFO Queues
SQS offers two queue types, and the choice depends on your system requirements. Standard queues guarantee at-least-once delivery but do not guarantee message ordering. Because messages are replicated internally across multiple servers, the same message may occasionally be delivered more than once, so consumers must be designed to be idempotent. FIFO queues provide exactly-once processing and first-in-first-out ordering. Using message group IDs, you can guarantee ordering within a group while processing messages from different groups in parallel. FIFO queue throughput is lower than that of standard queues, so choose standard queues for workloads that do not require ordering guarantees.
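The two ideas in this section, idempotent consumers for standard queues and message group IDs for FIFO queues, can be sketched as below. The names fifo_send_params and IdempotentConsumer are hypothetical, and the in-memory set stands in for whatever durable store (a database table, for example) a real system would use. Deriving the deduplication ID from a hash of the body is one possible choice, not the only one.

```python
import hashlib
from typing import Callable, Dict, Set


def fifo_send_params(queue_url: str, body: str, group_id: str) -> Dict[str, str]:
    """Build keyword arguments for a send_message call against a FIFO queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages sharing a group ID are delivered in order relative to each other.
        "MessageGroupId": group_id,
        # Assumption: identical bodies are treated as duplicates of one another.
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


class IdempotentConsumer:
    """Skip messages whose ID has already been processed successfully."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # in-memory only; not durable across restarts

    def process(self, message_id: str, body: str) -> bool:
        """Run the handler once per message ID; return True if it ran."""
        if message_id in self._seen:
            return False
        self._handler(body)
        # Record the ID only after the handler succeeds, so failures are retried.
        self._seen.add(message_id)
        return True
```

A consumer loop on a standard queue would call process(message["MessageId"], message["Body"]) for each received message and delete the message afterwards, whether or not the handler actually ran.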
Visibility Timeout and Dead-Letter Queues
The SQS visibility timeout is the period during which a message is hidden from other consumers after one consumer receives it. The default is 30 seconds, and it can be set as high as 12 hours. If the consumer does not finish processing and delete the message before the visibility timeout expires, the message becomes visible again and another consumer can pick it up. Messages that repeatedly fail processing (poison messages) can be moved automatically to a dead-letter queue (DLQ). You set maxReceiveCount in the source queue's redrive policy, and a message that has been received more than that number of times is moved to the DLQ. Analyzing DLQ messages helps identify the root cause of processing failures, and you can redrive messages back to the original queue after fixing the issue. Azure Service Bus provides similar queue and topic-based messaging, though SQS is notable for its simpler API design and gentler learning curve.
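As a rough sketch of the configuration described above, the function below builds the queue attributes that carry the visibility timeout and the redrive policy. The function name queue_attributes, its default values, and the example ARN are assumptions for illustration. The attribute names VisibilityTimeout and RedrivePolicy, and the deadLetterTargetArn and maxReceiveCount keys, are the ones SQS uses.

```python
import json
from typing import Dict

MAX_VISIBILITY_TIMEOUT_S = 12 * 60 * 60  # 12 hours, the upper bound given in the text


def queue_attributes(dlq_arn: str, max_receive_count: int = 5,
                     visibility_timeout_s: int = 30) -> Dict[str, str]:
    """Build an Attributes mapping for a source queue that uses a dead-letter queue."""
    if not 0 <= visibility_timeout_s <= MAX_VISIBILITY_TIMEOUT_S:
        raise ValueError("visibility timeout must be between 0 seconds and 12 hours")
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return {
        # SQS attribute values are passed as strings.
        "VisibilityTimeout": str(visibility_timeout_s),
        # The redrive policy is itself a JSON document encoded as a string.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
    }
```

With boto3, the result could be passed as the Attributes argument of set_queue_attributes or create_queue on the source queue. A visibility timeout somewhat longer than the expected processing time keeps slow but healthy consumers from having their messages redelivered.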
Practical Use Cases
The most common SQS use case is offloading backend processing in web applications. The web tier places each user request in a queue and returns a response immediately, while backend workers retrieve messages from the queue and perform time-consuming tasks (image conversion, email sending, report generation, etc.). With Lambda's event source mapping, Lambda polls the queue and invokes your function with batches of messages as they arrive, enabling you to build a serverless asynchronous processing pipeline without running your own workers.
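A Lambda function consuming from SQS might look like the sketch below. The process_task function and the task types it accepts are hypothetical placeholders for real work. Returning batchItemFailures only has an effect when the event source mapping is configured to report batch item failures; without that setting, an exception in the handler causes the whole batch to be retried.

```python
import json
from typing import Any, Dict, List


def process_task(task: Dict[str, Any]) -> None:
    """Hypothetical worker: accept a few task types and reject everything else."""
    if task.get("type") not in {"image", "email", "report"}:
        raise ValueError(f"unsupported task: {task!r}")


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    """Process each SQS record and report only the failed ones back to Lambda."""
    failures: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        try:
            process_task(json.loads(record["body"]))
        except Exception:
            # Failed messages stay in the queue and are retried; after enough
            # receives they can be moved to a dead-letter queue if one is configured.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Records that are not listed in batchItemFailures are treated as processed and deleted from the queue, so one bad message does not force the rest of the batch to be reprocessed.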