# Outbox Pattern — Deep Dive

**Level:** Intermediate
**Pre-reading:** 04 · Event-Driven Architecture · 03.04 · Saga and Outbox Patterns
## The Dual-Write Problem
When a service needs to update its database and publish an event, two things can go wrong:
```mermaid
sequenceDiagram
    participant App
    participant DB
    participant Kafka
    App->>DB: INSERT order ✓
    App->>Kafka: Publish OrderCreated ✗
    Note over App,Kafka: Database updated but event lost!
```

```mermaid
sequenceDiagram
    participant App
    participant DB
    participant Kafka
    App->>Kafka: Publish OrderCreated ✓
    App->>DB: INSERT order ✗
    Note over App,Kafka: Event published but database failed!
```
Both scenarios leave the system in an inconsistent state.
## The Outbox Solution
Write both the business data and the event to the same database in a single transaction. A separate process reads the outbox and publishes events.
```mermaid
sequenceDiagram
    participant App
    participant DB
    participant Relay
    participant Kafka
    App->>DB: BEGIN TX
    App->>DB: INSERT INTO orders
    App->>DB: INSERT INTO outbox
    App->>DB: COMMIT
    Note over App,DB: Single atomic transaction
    Relay->>DB: SELECT FROM outbox WHERE published = false
    Relay->>Kafka: Publish event
    Relay->>DB: UPDATE outbox SET published = true
```
## Outbox Table Design
```sql
CREATE TABLE outbox (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(255) NOT NULL,  -- e.g. 'Order'
    aggregate_id   VARCHAR(255) NOT NULL,  -- Order ID
    event_type     VARCHAR(255) NOT NULL,  -- e.g. 'OrderCreated'
    payload        JSONB NOT NULL,         -- Serialized event
    created_at     TIMESTAMP NOT NULL DEFAULT NOW(),
    published_at   TIMESTAMP,
    published      BOOLEAN NOT NULL DEFAULT FALSE
);

-- Partial index keeps lookups for unpublished rows fast
CREATE INDEX idx_outbox_unpublished ON outbox (created_at) WHERE published = FALSE;
```
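If more than one relay instance polls this table, a plain `SELECT ... WHERE published = FALSE` hands the same rows to every instance. A common refinement (a sketch, assuming PostgreSQL) is to lock the claimed rows so concurrent relays skip them:

```sql
-- Claim a batch of unpublished rows; rows locked by another relay are skipped
SELECT id, aggregate_id, event_type, payload
FROM outbox
WHERE published = FALSE
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;
```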
### Outbox Record
```java
public record OutboxRecord(
    UUID id,
    String aggregateType,
    String aggregateId,
    String eventType,
    String payload,
    Instant createdAt,
    Instant publishedAt,
    boolean published
) {}
```
## Writing to the Outbox
Write business data and outbox record in the same transaction:
```java
@Service
public class OrderService {

    private final OrderRepository orderRepository;
    private final OutboxRepository outboxRepository;

    public OrderService(OrderRepository orderRepository,
                        OutboxRepository outboxRepository) {
        this.orderRepository = orderRepository;
        this.outboxRepository = outboxRepository;
    }

    @Transactional
    public Order placeOrder(PlaceOrderCommand command) {
        // Business logic
        Order order = Order.create(command);
        orderRepository.save(order);

        // Write to outbox (same transaction)
        OrderCreatedEvent event = new OrderCreatedEvent(
            order.getId(),
            order.getCustomerId(),
            order.getTotalAmount(),
            order.getLines()
        );
        outboxRepository.save(new OutboxRecord(
            UUID.randomUUID(),
            "Order",
            order.getId().toString(),
            "OrderCreated",
            serialize(event),   // serialize to JSON (helper not shown)
            Instant.now(),
            null,
            false
        ));
        return order;
    }
}
```
## Publishing from the Outbox

### Option 1: Polling

A scheduled job polls the outbox table and publishes events.
```java
@Component
public class OutboxRelay {

    private static final Logger log = LoggerFactory.getLogger(OutboxRelay.class);

    private final OutboxRepository outboxRepository;
    private final KafkaTemplate<String, String> kafkaTemplate;

    public OutboxRelay(OutboxRepository outboxRepository,
                       KafkaTemplate<String, String> kafkaTemplate) {
        this.outboxRepository = outboxRepository;
        this.kafkaTemplate = kafkaTemplate;
    }

    // No @Transactional here: the poll spans network I/O to Kafka, and each
    // record is marked published in its own small repository transaction.
    @Scheduled(fixedDelay = 100) // runs 100ms after the previous poll finishes
    public void processOutbox() {
        List<OutboxRecord> records = outboxRepository.findUnpublished(100);
        for (OutboxRecord record : records) {
            String topic = record.aggregateType().toLowerCase() + "-events";
            try {
                // Wait for the broker ack so records are marked in order
                kafkaTemplate.send(topic, record.aggregateId(), record.payload())
                        .get(10, TimeUnit.SECONDS);
                outboxRepository.markPublished(record.id(), Instant.now());
            } catch (Exception e) {
                log.error("Failed to publish outbox record {}", record.id(), e);
                break; // retry from this record on the next poll
            }
        }
    }
}
```
### Option 2: Change Data Capture (CDC)
Use Debezium to capture database changes and stream them to Kafka.
```mermaid
graph LR
    DB[(Database)] --> CDC[Debezium CDC]
    CDC --> K[Kafka]
    K --> Router[Event Router]
    Router --> T1[orders-events topic]
    Router --> T2[payments-events topic]
```
Debezium advantages:
- No polling; real-time
- Captures all changes (even from other apps)
- Scales with database replication
Debezium connector config:
```json
{
    "name": "outbox-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "orders",
        "table.include.list": "public.outbox",
        "transforms": "outbox",
        "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
        "transforms.outbox.table.fields.additional.placement": "type:header:eventType"
    }
}
```
## Outbox with Spring Modulith
Spring Modulith has built-in outbox support:
```java
@Configuration
public class ModulithConfig {

    @Bean
    public EventExternalizationConfiguration eventExternalization(
            KafkaOperations<String, Object> kafka) {
        return EventExternalizationConfiguration.externalizing()
            .select(EventExternalizationConfiguration.annotatedAsExternalized())
            .route(OrderCreated.class, e -> RoutingTarget.forTarget("orders-events"))
            .build();
    }
}

// Events marked for externalization
@Externalized
public record OrderCreated(OrderId orderId, CustomerId customerId) {}
```
## Delivery Guarantees
The outbox pattern provides at-least-once delivery:
| Guarantee | How |
|---|---|
| No loss | Event persisted in DB before publish |
| Eventual delivery | Relay retries until published |
| May duplicate | If publish succeeds but mark fails |
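The "may duplicate" row is worth internalizing: if the relay publishes a record but crashes before marking it published, the next poll re-sends it. A pure-Java sketch of that failure mode, with in-memory stand-ins for the outbox table and the broker (all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AtLeastOnceDemo {
    // Outbox: record id -> published flag; broker: list of delivered payloads
    static Map<String, Boolean> outbox = new LinkedHashMap<>();
    static List<String> broker = new ArrayList<>();

    // One relay pass; if crashBeforeMark is true, the publish succeeds but
    // the "mark published" step is lost (a crash between the two steps)
    static void relayPass(boolean crashBeforeMark) {
        for (Map.Entry<String, Boolean> e : outbox.entrySet()) {
            if (e.getValue()) continue;   // already published, skip
            broker.add(e.getKey());       // publish to the broker
            if (crashBeforeMark) return;  // crash: flag never updated
            e.setValue(true);             // mark published
        }
    }

    public static void main(String[] args) {
        outbox.put("evt-1", false);
        relayPass(true);   // publish succeeds, mark fails
        relayPass(false);  // retry: the same event is published again
        System.out.println(broker); // duplicated, but never lost
    }
}
```

The event is delivered twice but never lost, which is exactly the at-least-once contract the consumers below must be written for.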
## Handling Duplicates
Consumers must be idempotent:
```java
@Component
public class OrderEventConsumer {

    private final ProcessedEventRepository processedEvents;
    private final InventoryService inventoryService;

    public OrderEventConsumer(ProcessedEventRepository processedEvents,
                              InventoryService inventoryService) {
        this.processedEvents = processedEvents;
        this.inventoryService = inventoryService;
    }

    @KafkaListener(topics = "orders-events")
    @Transactional // check, process, and mark in one unit of work
    public void consume(OrderCreatedEvent event) {
        // Idempotency check
        if (processedEvents.exists(event.eventId())) {
            return; // duplicate event, already handled
        }

        // Process event
        inventoryService.reserve(event.orderId(), event.items());

        // Mark as processed
        processedEvents.save(event.eventId());
    }
}
```
## Outbox Cleanup
Old outbox records accumulate. Clean them up:
```java
@Scheduled(cron = "0 0 2 * * *") // Daily at 2 AM
@Transactional
public void cleanupOutbox() {
    Instant threshold = Instant.now().minus(7, ChronoUnit.DAYS);
    outboxRepository.deletePublishedBefore(threshold);
}
```
Or partition the table by date for efficient drops.
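Partitioning turns cleanup into a metadata operation: dropping a day's partition is effectively O(1), compared with a bulk `DELETE` that rewrites and re-indexes rows. A sketch, assuming PostgreSQL declarative partitioning (columns abbreviated; dates are placeholders):

```sql
-- Range-partition the outbox by creation date
CREATE TABLE outbox (
    id         UUID NOT NULL DEFAULT gen_random_uuid(),
    payload    JSONB NOT NULL,
    published  BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

CREATE TABLE outbox_2024_06_01 PARTITION OF outbox
    FOR VALUES FROM ('2024-06-01') TO ('2024-06-02');

-- Cleanup becomes a cheap drop instead of a bulk DELETE
DROP TABLE outbox_2024_06_01;
```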
## Ordering Guarantees

### Per-Aggregate Ordering
Use aggregate ID as Kafka partition key:
Events for the same aggregate go to the same partition → ordered.
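A sketch of why the key gives per-aggregate ordering: the producer maps each key deterministically to one partition, so all events for an aggregate land in a single ordered log. (Kafka's default partitioner actually uses murmur2 hashing; the `hashCode`-modulo below is a simplification for illustration.)

```java
public class PartitionDemo {
    // Simplified stand-in for Kafka's key -> partition mapping
    static int partitionFor(String aggregateId, int numPartitions) {
        return Math.abs(aggregateId.hashCode() % numPartitions);
    }

    public static void main(String[] args) {
        int p1 = partitionFor("order-42", 12);
        int p2 = partitionFor("order-42", 12);
        // The same aggregate id always maps to the same partition,
        // so its events are appended to one log in publish order
        System.out.println(p1 == p2); // true
    }
}
```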
### Global Ordering
Not guaranteed. If you need global order, use a single partition (but lose parallelism).
## Outbox Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| No cleanup | Table grows forever | Scheduled cleanup job |
| Synchronous publish | Defeats the purpose | Always async; relay or CDC |
| No idempotency | Duplicate processing | Idempotent consumers |
| Business logic in relay | Relay becomes bottleneck | Keep relay simple |
| No monitoring | Stuck events go unnoticed | Alert on unpublished age |
## Polling vs CDC
| Aspect | Polling | CDC (Debezium) |
|---|---|---|
| Latency | 100ms+ (poll interval) | Milliseconds |
| Complexity | Simple | More infrastructure |
| Database load | Polling queries | Replication slot |
| Scalability | Limited | High |
| Capture scope | Outbox table only | Any table |
## Monitoring
| Metric | Alert Condition |
|---|---|
| Unpublished count | > 1000 (backlog growing) |
| Oldest unpublished | > 5 minutes |
| Publish failures | > 0 sustained |
| Consumer lag | Growing |
```sql
-- Monitoring query
SELECT
    COUNT(*) AS unpublished_count,
    MIN(created_at) AS oldest_unpublished,
    EXTRACT(EPOCH FROM (NOW() - MIN(created_at))) AS oldest_age_seconds
FROM outbox
WHERE published = FALSE;
```
**What guarantees does the outbox pattern provide?**
At-least-once delivery: Events will be published (eventually). No loss: Events are persisted in the DB before publishing. Ordering per aggregate: If using aggregate ID as partition key. It does NOT guarantee exactly-once — consumers must be idempotent. The pattern trades some latency for reliability.
**Should I use polling or CDC for the outbox?**
Polling is simpler to set up and has no additional infrastructure. Good for moderate volumes and when 100ms+ latency is acceptable. CDC (Debezium) has lower latency (milliseconds), scales better, and captures changes reliably. Use CDC for high-volume systems or when you already have Kafka Connect infrastructure.
**How do you handle a backlog of unpublished events?**

1. Scale the relay: more instances, parallel processing.
2. Increase the batch size: process more records per poll.
3. Prioritize: critical events first.
4. Alert early: monitor unpublished age and count.
5. Investigate the root cause: is Kafka down? Network issues? Fix the underlying problem.