Outbox Pattern Guide — How to Keep Data Consistency in Event-Driven Systems
When a service must both update its database and publish an event, the dual-write problem appears quickly. This post explains why the Outbox Pattern matters, how to design the outbox table, how publisher workers operate, and how to handle retries, duplicates, and production observability.
Why dual write is dangerous
Imagine an order service that must do two things:
- store the order in the database
- publish an OrderCreated event to Kafka
If only one of those succeeds, the system becomes inconsistent.
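- database commit succeeds but event publish fails: the order exists, but downstream services never hear about it
- event publish succeeds but database commit fails: consumers react to an order that does not exist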
That is the classic dual-write problem.
Why not use distributed transactions
Distributed transactions are theoretically appealing, but in practice they often add too much coupling and complexity.
Common issues:
- tighter dependency between systems
- harder operations and recovery
- poor fit across heterogeneous databases and brokers
That is why Outbox has become a practical standard in event-driven microservices.
The core idea of the Outbox Pattern
Instead of trying to cover the business write and the broker publish with one distributed transaction, do both of the following inside the same local database transaction:
- write business data
- write an outbox event row
Either both rows commit or neither does.
Then a separate publisher process reads the outbox table and publishes the event asynchronously.
Basic flow
Application
  -> DB Transaction
    -> orders insert
    -> outbox insert
  -> Commit

Publisher Worker
  -> read unpublished outbox rows
  -> publish to broker
  -> mark as published
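The key benefit is that the event row commits atomically with the business data, so a committed order always has an event waiting to be published. Events are not silently lost between the business write and the publish.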
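The application side of this flow can be sketched as follows. This is a minimal illustration, not a reference implementation: SQLite stands in for the service database so the example runs on its own, and the `orders` columns, the `create_order` function, and the payload shape are assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical schema for illustration; SQLite stands in for the service database.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    total_cents INTEGER NOT NULL
);
CREATE TABLE outbox_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregate_type TEXT NOT NULL,
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'PENDING'
);
"""


def create_order(conn, customer, total_cents):
    """Insert the order and its outbox row in one local transaction."""
    # `with conn` commits on success and rolls back if any statement raises,
    # so the order and its event are stored together or not at all.
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
            (customer, total_cents),
        )
        order_id = cur.lastrowid
        payload = json.dumps({"order_id": order_id, "total_cents": total_cents})
        conn.execute(
            "INSERT INTO outbox_events "
            "(aggregate_type, aggregate_id, event_type, payload) "
            "VALUES (?, ?, ?, ?)",
            ("Order", str(order_id), "OrderCreated", payload),
        )
    return order_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
order_id = create_order(conn, "alice", 4200)
```

Note that nothing here talks to the broker. The request path only touches the local database, which is what removes the dual write.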
A practical outbox table
Typical columns:
- id
- aggregate_type
- aggregate_id
- event_type
- payload
- status
- created_at
- published_at
- retry_count
For example:
CREATE TABLE outbox_events (
    id             BIGSERIAL PRIMARY KEY,
    aggregate_type VARCHAR(100) NOT NULL,
    aggregate_id   VARCHAR(100) NOT NULL,
    event_type     VARCHAR(100) NOT NULL,
    payload        JSONB NOT NULL,
    status         VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    retry_count    INT NOT NULL DEFAULT 0,
    created_at     TIMESTAMP NOT NULL DEFAULT NOW(),
    published_at   TIMESTAMP  -- NULL until the event is published
);
How should publishing work
Two broad approaches are common.
Polling publisher
- periodically fetch PENDING rows
- publish to the broker
- mark successful rows as PUBLISHED
Pros:
- simple and easy to build
Cons:
- introduces a small delay between commit and publish
- requires polling management
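One polling pass might look like the sketch below. It assumes a trimmed-down version of the outbox table in SQLite, and the `publish` callable is a hypothetical stand-in for a real broker client. A real worker would call `publish_pending` in a loop with a sleep between passes.

```python
import sqlite3


def publish_pending(conn, publish, batch_size=100):
    """Publish one batch of PENDING rows; return how many were published."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox_events "
        "WHERE status = 'PENDING' ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    published = 0
    for event_id, event_type, payload in rows:
        try:
            publish(event_type, payload)  # hypothetical broker call
        except Exception:
            # Leave the row PENDING so a later poll retries it.
            with conn:
                conn.execute(
                    "UPDATE outbox_events SET retry_count = retry_count + 1 "
                    "WHERE id = ?",
                    (event_id,),
                )
            continue
        # Mark the row only after the broker accepted the event.
        with conn:
            conn.execute(
                "UPDATE outbox_events SET status = 'PUBLISHED', "
                "published_at = CURRENT_TIMESTAMP WHERE id = ?",
                (event_id,),
            )
        published += 1
    return published


# Demo setup: an in-memory table and a fake broker that rejects one event type.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox_events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT NOT NULL, "
    "payload TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'PENDING', "
    "retry_count INTEGER NOT NULL DEFAULT 0, published_at TEXT)"
)
conn.executemany(
    "INSERT INTO outbox_events (event_type, payload) VALUES (?, ?)",
    [("OrderCreated", '{"order_id": 1}'), ("OrderCancelled", '{"order_id": 2}')],
)
conn.commit()

sent = []


def fake_publish(event_type, payload):
    if event_type == "OrderCancelled":
        raise RuntimeError("broker unavailable")
    sent.append((event_type, payload))


count = publish_pending(conn, fake_publish)
```

This sketch skips a failed row and moves on, so it does not preserve ordering across events. It also assumes a single worker. With several workers on PostgreSQL, row locking such as SELECT ... FOR UPDATE SKIP LOCKED is one common way to keep them from picking up the same rows.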
CDC-based publishing
- use tools like Debezium
- stream outbox changes from the database log
Pros:
- low latency
- highly automated
Cons:
- higher operational complexity
Many teams start with polling and consider CDC later at larger scale.
Duplicate delivery must be expected
Outbox helps prevent event loss, but it does not give you exactly-once semantics. Delivery is at-least-once.
A common scenario:
- publish succeeds
- worker crashes before updating outbox status
On retry, the event may be published again.
That means consumers must be designed for idempotency:
- use idempotency keys
- tolerate duplicate messages
- track already processed event ids when necessary
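Tracking processed event ids can be sketched like this. The `processed_events` and `shipments` tables, the `handle_once` helper, and the `evt-1` id are all hypothetical names for the example. The idea is that recording the event id and applying the side effect share one consumer-side transaction.

```python
import sqlite3


def handle_once(conn, event_id, apply_effect):
    """Apply the side effect only the first time an event id is seen.

    Returns True if the effect was applied, False for a duplicate.
    """
    with conn:
        # The primary key on event_id turns a repeated insert into a no-op.
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # already processed, skip the side effect
        # Runs in the same transaction as the insert above, so a failure
        # here also rolls back the "processed" marker.
        apply_effect(conn)
    return True


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
CREATE TABLE shipments (order_id INTEGER NOT NULL);
""")


def create_shipment(c):
    c.execute("INSERT INTO shipments (order_id) VALUES (?)", (1,))


first = handle_once(conn, "evt-1", create_shipment)
second = handle_once(conn, "evt-1", create_shipment)  # simulated redelivery
```

The second call returns False and leaves the shipments table untouched, which is the behavior a consumer needs when the publisher retries after a crash.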
Operational metrics matter
Outbox is not only an application pattern. It is also an operational pipeline.
Important signals:
- backlog of PENDING events
- retry growth
- per-event-type publish failures
- consumer-side delay
If you do not observe those, the outbox can fail silently even when the pattern is “implemented.”
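Some of these signals can be read straight from the outbox table. The sketch below is one possible shape, assuming timestamps stored as ISO strings in SQLite. The `outbox_metrics` function and the age of the oldest pending row are additions made for the example, and consumer-side delay is not covered because it has to be measured on the consumer.

```python
import sqlite3
from datetime import datetime


def outbox_metrics(conn, now):
    """Return backlog size, age of the oldest PENDING row, and retries per type."""
    pending, oldest = conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM outbox_events "
        "WHERE status = 'PENDING'"
    ).fetchone()
    retries_by_type = dict(
        conn.execute(
            "SELECT event_type, SUM(retry_count) FROM outbox_events "
            "WHERE status = 'PENDING' GROUP BY event_type"
        ).fetchall()
    )
    # MIN(created_at) is NULL (None) when there is no backlog.
    oldest_age = (
        (now - datetime.fromisoformat(oldest)).total_seconds() if oldest else 0.0
    )
    return {
        "pending": pending,
        "oldest_pending_age_s": oldest_age,
        "retries_by_type": retries_by_type,
    }


# Demo data with fixed timestamps so the result is reproducible.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox_events (id INTEGER PRIMARY KEY, event_type TEXT, "
    "status TEXT, retry_count INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO outbox_events (event_type, status, retry_count, created_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("OrderCreated", "PENDING", 0, "2024-01-01 10:00:00"),
        ("OrderCreated", "PENDING", 3, "2024-01-01 10:04:00"),
        ("PaymentFailed", "PUBLISHED", 1, "2024-01-01 09:00:00"),
    ],
)
conn.commit()

metrics = outbox_metrics(conn, now=datetime(2024, 1, 1, 10, 5, 0))
```

Exporting these numbers on a schedule and alerting when the oldest pending age keeps growing is usually enough to notice a stuck publisher.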
Common mistakes
- creating the outbox but not monitoring it
- no retry policy
- no schema versioning in payload
- assuming consumers will never see duplicates
- no replay plan after publisher failure
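As an example of a retry policy, capped exponential backoff with a retry limit is a common starting point. The function names, the default values, and the FAILED status below are assumptions for illustration and are not part of the table defined earlier.

```python
def next_retry_delay(retry_count, base_seconds=1.0, max_seconds=300.0):
    """Capped exponential backoff: 1s, 2s, 4s, ... up to max_seconds."""
    return min(base_seconds * (2 ** retry_count), max_seconds)


def next_status(retry_count, max_retries=8):
    """Park an event as FAILED (hypothetical status) once retries run out."""
    return "FAILED" if retry_count >= max_retries else "PENDING"


delays = [next_retry_delay(n) for n in range(5)]
```

Rows parked as FAILED still need an owner: someone has to inspect them and decide whether to replay, which ties the retry policy back to the replay plan.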
When Outbox is especially effective
- order, payment, shipping, and lifecycle events
- systems where the database remains the source of truth
- domains where event loss is unacceptable
If ultra-low latency is critical and your team can handle the complexity, CDC-based designs may be worth the extra effort.
Closing thoughts
The Outbox Pattern is one of the most practical ways to manage data consistency in event-driven systems without relying on fragile distributed transactions.
Its real value comes not only from recording the event, but from building the full operating model around it:
- retries
- duplicate handling
- backlog monitoring
- replay strategy
That is what turns Outbox from a diagram pattern into a production-safe architecture.