The Queue
Message queues decouple producers from consumers, turning synchronous calls into asynchronous flows. Point-to-point queues deliver each message to exactly one consumer, distributing work. Pub-sub topics broadcast each message to all subscribers, fanning out events. Consumer groups partition a topic so multiple consumers can process in parallel while each message is handled by exactly one group member. Queues absorb traffic spikes, enable retry logic, and let services scale independently.
the problem
Service A calls Service B synchronously. If B is slow, A is slow. If B is down, A fails. If A produces work faster than B can consume it, requests pile up and both services degrade.
Synchronous communication creates tight coupling. Every service in the chain must be available and fast for the whole system to work. One slow service slows everything downstream.
A message queue breaks this coupling. A produces messages into the queue and moves on. B consumes messages from the queue at its own pace. If B is slow, messages accumulate in the queue instead of backing up into A. If B is down, messages wait in the queue until B recovers. A and B never need to be available at the same time.
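A minimal in-process sketch of this decoupling, using Python's standard-library queue with a worker thread standing in for a real broker (service names and timings are illustrative):

import queue
import threading
import time

work = queue.Queue()  # the buffer between A and B

def service_a():
    # A enqueues work and moves on; it never waits for B.
    for i in range(5):
        work.put(f"job-{i}")
        print(f"A: enqueued job-{i}")

def service_b():
    # B drains the queue at its own pace; backlog waits here, not in A.
    while True:
        job = work.get()
        time.sleep(0.5)  # B is slow; jobs simply accumulate in the queue
        print(f"B: processed {job}")
        work.task_done()

threading.Thread(target=service_b, daemon=True).start()
service_a()   # returns immediately after enqueueing
work.join()   # block only here, until B has drained the queue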
point-to-point
The simplest pattern. One queue, one or more producers, one or more consumers. Each message is delivered to exactly one consumer. When multiple consumers are connected, messages are distributed among them (like a load balancer for work items).
Producer --> [Queue: M1, M2, M3] --> Consumer 1
                                 --> Consumer 2

M1 -> Consumer 1
M2 -> Consumer 2
M3 -> Consumer 1

Each message is processed once. If Consumer 1 fails to process M1, the message returns to the queue and is delivered to another consumer (or retried). After too many failures, the message goes to a dead-letter queue for investigation.
Point-to-point is ideal for work distribution: sending emails, processing payments, resizing images. The queue acts as a buffer between producers and workers. Add more workers to increase throughput.
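A sketch of this competing-consumers shape, again with the standard library; a broker provides the same semantics across processes and machines (worker names and job payloads are illustrative):

import queue
import threading

jobs = queue.Queue()

def worker(name: str):
    # Each get() removes the message, so exactly one worker handles it.
    while True:
        job = jobs.get()
        if job is None:      # sentinel: shut this worker down
            jobs.task_done()
            return
        print(f"{name} processing {job}")
        jobs.task_done()

workers = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)]
for w in workers:
    w.start()

for i in range(9):
    jobs.put(f"image-{i}.jpg")   # work items, e.g. images to resize
for _ in workers:
    jobs.put(None)               # one shutdown sentinel per worker
for w in workers:
    w.join()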
pub-sub
Publish-subscribe inverts the model. Instead of one consumer getting each message, every subscriber gets every message. A producer publishes to a topic, and all subscribers receive a copy.
Producer --> [Topic: M1] --> Subscriber A (gets M1)
                         --> Subscriber B (gets M1)
                         --> Subscriber C (gets M1)

Pub-sub is ideal for event broadcasting: a user signs up, and the welcome email service, the analytics service, and the notification service all need to know. The producer publishes one event; each subscriber processes it independently.
The producer does not know (or care) how many subscribers exist. New subscribers can be added without modifying the producer. This is the decoupling that makes event-driven architectures work.
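A sketch of the fan-out mechanics: the topic keeps one queue per subscriber, and publishing copies the message into each (class and subscriber names are illustrative):

from queue import Queue

class Topic:
    def __init__(self):
        self.subscribers: dict[str, Queue] = {}

    def subscribe(self, name: str) -> Queue:
        # Subscribers can be added at any time; the producer never knows.
        q = Queue()
        self.subscribers[name] = q
        return q

    def publish(self, message: str):
        for q in self.subscribers.values():
            q.put(message)  # every subscriber gets its own copy

signups = Topic()
email = signups.subscribe("welcome-email")
analytics = signups.subscribe("analytics")

signups.publish("user-123 signed up")
print(email.get())      # user-123 signed up
print(analytics.get())  # user-123 signed up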
consumer groups
Consumer groups combine the best of point-to-point and pub-sub. A topic is divided into partitions. Each partition is consumed by exactly one member of the consumer group. Different consumer groups independently consume the same topic.
Topic with 3 partitions:

P0 --> Consumer Group A: Consumer 1
P1 --> Consumer Group A: Consumer 2
P2 --> Consumer Group A: Consumer 3

P0 --> Consumer Group B: Consumer 4
P1 --> Consumer Group B: Consumer 4
P2 --> Consumer Group B: Consumer 4

Within Group A, each partition goes to one consumer (parallel processing). Group B independently consumes all partitions (fan-out). This is the Kafka model: consumer groups enable both parallel consumption and independent subscriptions.
Partitions are the unit of parallelism. More partitions allow more consumers in a group. If you have 3 partitions, at most 3 consumers in a group can be active (a 4th consumer would be idle). Plan partition count based on expected peak parallelism.
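A sketch of the two mechanics at work here: hashing a key to a partition, and round-robin partition assignment within a group (Kafka's actual partitioner and assignors differ in detail, e.g. murmur2 hashing):

import zlib

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Same key -> same partition, which preserves per-key ordering.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    # Round-robin assignment within one consumer group.
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

print(partition_for("user-42"))                     # stable partition for this key
print(assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))  # c4 gets nothing: it idles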
delivery guarantees
Three levels of delivery guarantee, each with different tradeoffs:
At-most-once: Fire and forget. The producer sends the message and does not wait for acknowledgment. Fast, but messages can be lost if the broker crashes before persisting.
At-least-once: The producer retries until the broker acknowledges. Messages are never lost, but duplicates are possible (the broker received the message but the ack was lost, so the producer retries). This is the most common guarantee.
Exactly-once: Each message is processed exactly once. The holy grail, but extremely expensive to implement. Requires idempotent producers, transactional consumers, and coordination between the queue and the processing logic. Kafka supports it within a single Kafka cluster. Across systems, true exactly-once is effectively impossible without idempotency on the consumer side.
In practice, design for at-least-once delivery and make consumers idempotent. An idempotent consumer produces the same result whether it processes a message once or twice. This is simpler and more robust than trying to guarantee exactly-once delivery.
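A sketch of an idempotent consumer: track a deduplication key per message and skip repeats (in production the seen-set lives in a database or cache, ideally in the same transaction as the side effect):

processed: set[str] = set()

def handle(message_id: str, payload: str):
    if message_id in processed:
        return  # duplicate delivery: already handled, safe to skip
    # ... perform the actual work with payload ...
    processed.add(message_id)

handle("evt-1", "send welcome email to user-123")
handle("evt-1", "send welcome email to user-123")  # redelivery is a no-op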
where it shows up
- Apache Kafka: Distributed log with partitioned topics, consumer groups, and strong ordering within partitions. The standard for high-throughput event streaming.
- RabbitMQ: Traditional message broker with exchanges, queues, and routing. Supports AMQP protocol. Good for complex routing patterns and task queues.
- Amazon SQS: Fully managed queue service. Standard queues (at-least-once, best-effort ordering) and FIFO queues (exactly-once, strict ordering). Scales automatically.
- Apache Pulsar: Similar to Kafka but with built-in multi-tenancy, geo-replication, and tiered storage. Separates compute (brokers) from storage (BookKeeper).
- Redis Streams: Lightweight, fast message stream built into Redis. Good for small-to-medium throughput where you already have Redis deployed.
+ exactly-once delivery: the impossible guarantee
Exactly-once delivery across distributed systems is essentially impossible without application-level help. The fundamental problem: the Two Generals’ Problem. The producer sends a message. The broker receives and persists it. The broker sends an ack. If the ack is lost, the producer does not know if the message was persisted. It must retry, creating a duplicate.
Kafka’s “exactly-once semantics” (EOS) works within a single Kafka cluster by combining idempotent producers (deduplication using sequence numbers) with transactional writes (atomic writes across multiple partitions). This eliminates duplicates within Kafka but does not extend to external systems.
The practical solution: design consumers to be idempotent. Use a deduplication key (message ID, event ID, or a natural business key). Before processing, check if the key has been seen. If yes, skip. If no, process and record the key.
For database operations, use upserts (INSERT … ON CONFLICT DO UPDATE) or conditional writes (UPDATE … WHERE version = expected_version). These are naturally idempotent. For external side effects (sending an email, charging a credit card), use an outbox pattern: write the intent to a database table in the same transaction as your state change, and have a separate process read each row, execute it, and mark it done so the effect is not repeated.
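A runnable sketch of the upsert half, using sqlite3 from the standard library (SQLite has supported ON CONFLICT DO UPDATE since 3.24); replaying the same event leaves the same single row:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")

def apply_event(account: str, amount: int):
    # Sets an absolute value rather than incrementing, so it is idempotent.
    db.execute(
        """INSERT INTO balances (account, amount) VALUES (?, ?)
           ON CONFLICT(account) DO UPDATE SET amount = excluded.amount""",
        (account, amount),
    )

apply_event("acct-1", 100)
apply_event("acct-1", 100)  # duplicate delivery: state is unchanged
print(db.execute("SELECT * FROM balances").fetchall())  # [('acct-1', 100)]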
+ ordering guarantees and partitioning
Global ordering (every consumer sees every message in the exact order it was produced) is expensive. It requires a single partition, which caps throughput at what a single consumer can handle.
Kafka’s compromise: ordering within a partition, no ordering across partitions. Messages with the same partition key always go to the same partition and are processed in order. Messages with different keys may be processed out of order.
This is sufficient for most use cases. Events for the same entity (same user, same order, same account) need ordering. Events for different entities do not. Use the entity ID as the partition key, and you get per-entity ordering with full parallelism across entities.
The pitfall: hot partitions. If one entity produces far more events than others (a celebrity’s social media account, a high-volume merchant), its partition becomes a bottleneck. The consumer assigned to that partition falls behind while other consumers are idle. Solutions include sub-partitioning hot keys or using consistent hashing with virtual nodes.
+ dead-letter queues and poison pills
Some messages cannot be processed. The format is wrong, the referenced entity does not exist, or the processing logic has a bug. Without special handling, these “poison pill” messages block the queue: the consumer tries, fails, retries, fails again, and the message sits at the head of the queue forever.
A dead-letter queue (DLQ) is a separate queue where failed messages are sent after a configurable number of retries. The main queue keeps moving, and an operator or automated process investigates the DLQ.
Best practices for DLQs:
- Set a reasonable retry count (3-5 attempts) before dead-lettering.
- Include the original message, the error, and the retry history in the DLQ entry.
- Monitor DLQ depth. A growing DLQ indicates a systemic problem, not just bad messages.
- Build tooling to replay DLQ messages back into the main queue after fixing the issue.
SQS, RabbitMQ, and Kafka (via error topics) all support dead-letter queues. Configure them from day one. You will need them.
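A sketch of the retry-then-dead-letter flow, with in-memory queues standing in for the broker (the envelope fields mirror the best practices above):

import json
from queue import Queue

MAX_RETRIES = 3
main_queue: Queue = Queue()
dead_letter_queue: Queue = Queue()

def consume(process):
    while not main_queue.empty():
        msg = main_queue.get()
        try:
            process(msg["body"])
        except Exception as exc:
            msg["retries"] += 1
            msg["errors"].append(str(exc))   # keep the retry history
            if msg["retries"] >= MAX_RETRIES:
                dead_letter_queue.put(msg)   # park it; the queue keeps moving
            else:
                main_queue.put(msg)          # requeue for another attempt

main_queue.put({"body": "not-json", "retries": 0, "errors": []})
consume(lambda body: json.loads(body))       # a poison pill: always fails
print(dead_letter_queue.get()["errors"])     # three recorded parse errors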
+ backpressure in message systems
When producers outpace consumers, the queue grows. If the queue is unbounded, it eventually exhausts memory or disk. If it is bounded, what happens when the queue is full?
Drop newest: Reject new messages when the queue is full. The producer gets an error and can retry later. Simple, but the producer must handle rejections.
Drop oldest: Discard the oldest messages to make room for new ones. Useful when freshness matters more than completeness (sensor data, stock prices).
Block producer: The producer blocks until space is available. This propagates backpressure upstream, slowing down the entire pipeline. TCP does this naturally with its send buffer.
Spill to disk: In-memory queues spill to disk when full. Slower, but preserves all messages. Kafka does this by design (all messages are on disk, with the OS page cache providing the speed of in-memory access).
The right strategy depends on the use case. For critical financial transactions, block the producer (never lose a message). For real-time analytics, drop the oldest (freshness over completeness). For general-purpose queues, bounded queues with producer back-off are the safe default.
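The first three strategies, sketched on a bounded standard-library queue (spill-to-disk needs real storage, so it stays a comment):

import queue

q: queue.Queue = queue.Queue(maxsize=2)
q.put("m1")
q.put("m2")  # the queue is now full

# Drop newest: reject the incoming message; the producer must handle it.
try:
    q.put_nowait("m3")
except queue.Full:
    print("m3 rejected; producer backs off and retries")

# Drop oldest: evict the head so the freshest data fits.
q.get_nowait()      # discard "m1", the oldest message
q.put_nowait("m3")  # now there is room

# Block producer: a plain q.put("m4") would wait here until a consumer
# frees a slot, propagating backpressure upstream (as TCP's send buffer does).

# Spill to disk: on overflow, append to local storage instead -- or use a
# log-structured broker like Kafka, where messages live on disk anyway.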
production stories
Kafka at scale
Kafka was built at LinkedIn to handle the firehose of user activity events: page views, clicks, searches, and ad impressions. The volume was too high for traditional message brokers, and the company needed durable, replayable event streams.
Kafka’s key insight is treating the message log as the primary data structure. Messages are appended to an immutable, ordered log. Consumers track their position (offset) in the log. They can rewind to reprocess historical data or fast-forward to skip ahead. The log is the source of truth, not a temporary buffer.
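A sketch of replay with the kafka-python client, assuming a broker at localhost:9092 and a topic named events (both illustrative); rewinding is just a seek to an older offset:

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    group_id="reprocessor",
    enable_auto_commit=False,  # we manage our own position
)
tp = TopicPartition("events", 0)
consumer.assign([tp])

consumer.seek(tp, 0)  # rewind to the start of the partition
for record in consumer:
    print(record.offset, record.value)  # reprocess history in order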
At LinkedIn’s scale, Kafka processes trillions of messages per day across thousands of brokers. The operational challenges are partition management (rebalancing when brokers join or leave), consumer lag monitoring (detecting when consumers fall behind), and disk management (retention policies that balance storage cost with replayability).
SQS at Amazon
Amazon SQS is the simplest managed queue. No brokers to provision, no partitions to configure, no rebalancing to worry about. You create a queue, send messages, and receive them. SQS scales automatically from zero to millions of messages per second.
SQS Standard queues provide at-least-once delivery with best-effort ordering. FIFO queues provide exactly-once processing with strict ordering, but at lower throughput (300 messages per second per queue without batching, or 3,000 with batching).
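A sketch of FIFO-queue usage with boto3 (the queue URL is a placeholder): MessageGroupId scopes ordering, and MessageDeduplicationId drops duplicates within the five-minute deduplication window:

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"  # placeholder

sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"order_id": "o-1", "status": "paid"}',
    MessageGroupId="customer-42",       # ordering is strict within this group
    MessageDeduplicationId="o-1-paid",  # repeats within 5 minutes are dropped
)

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    # ... process msg["Body"] ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])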
The operational simplicity of SQS is its main advantage. For many workloads (task queues, event processing, decoupling microservices), SQS is the right choice because it eliminates all operational overhead. You only need Kafka-level complexity when you need Kafka-level features (consumer groups, stream processing, log compaction).
RabbitMQ vs Kafka
RabbitMQ and Kafka solve different problems.
RabbitMQ is a message broker. It routes messages from producers to consumers through exchanges and queues. It supports complex routing patterns (fanout, topic, header-based). Messages are deleted after consumption. It excels at task queues and request-reply patterns.
Kafka is a distributed log. It appends messages to partitioned, durable logs. Consumers track their position. Messages persist for a configurable retention period (days, weeks, or forever with compaction). It excels at event streaming, data pipelines, and event sourcing.
Use RabbitMQ when: you need complex routing, message acknowledgment, and traditional queue semantics. Use Kafka when: you need high throughput, message replay, consumer groups, and stream processing.
when queues add unnecessary complexity
Not every service interaction needs a queue. Adding a queue adds latency (the message must be enqueued, then dequeued), operational overhead (monitoring queue depth, consumer lag, dead-letter queues), and debugging complexity (tracing a request through asynchronous hops is harder than tracing a synchronous call).
Use a queue when:
- The producer and consumer have different availability requirements
- Traffic is bursty and you need to smooth it
- Processing is slow and you need to decouple it from the request path
- Multiple consumers need the same event
Use synchronous calls when:
- The caller needs an immediate response
- The operation is fast (under 100ms)
- Debugging and tracing simplicity matter more than decoupling
- The system is small enough that tight coupling is not a problem
The simplest architecture that solves the problem is the best one. Add queues when synchronous calls become a bottleneck, not before.