Saga and Outbox Patterns — Deep Dive

Level: Advanced
Pre-reading: 03 · Microservices Patterns, 04 · Event-Driven Architecture


The Distributed Transaction Problem

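In microservices, a single business operation often spans several services, each owning its own database. A local ACID transaction cannot span those databases. Two-Phase Commit (2PC) can, but every participant holds its locks until the coordinator decides, so a slow or failed coordinator blocks everyone else. It also adds latency and is unavailable in many message brokers and NoSQL stores, so it scales poorly.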

Solution: the Saga pattern, a sequence of local transactions coordinated by events or commands, in which failures are undone by compensating transactions instead of a global rollback.


Saga Pattern

A saga is a sequence of local transactions. Each step has a compensating transaction that undoes its effects if a later step fails.

graph LR
    T1[Local TX 1] --> T2[Local TX 2]
    T2 --> T3[Local TX 3]
    T3 -->|Success| S[Complete]
    T3 -->|Failure| C3[Compensate T2]
    C3 --> C2[Compensate T1]
    C2 --> F[Aborted]

Saga Example: Order Placement

| Step | Service | Forward Action | Compensating Action |
|---|---|---|---|
| 1 | Order Service | Create order (PENDING) | Cancel order |
| 2 | Inventory Service | Reserve items | Release reservation |
| 3 | Payment Service | Charge payment | Issue refund |
| 4 | Order Service | Confirm order (CONFIRMED) | None (final step) |
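The forward/compensate pairs in the table can be sketched as a small in-process saga runner. This is an illustration under stated assumptions, not a framework API: `SagaStep` and `SagaRunner` are hypothetical names, each action is a plain `Runnable`, a failed step is modelled as a thrown exception, and nothing is persisted (a real runner must record progress so compensation survives a crash). Only steps that already completed are compensated, in reverse order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical step type: a forward action plus the compensation that undoes it.
record SagaStep(String name, Runnable action, Runnable compensation) {}

// Minimal in-memory saga runner (illustrative only: no persistence, no retries).
class SagaRunner {
    private final List<String> log = new ArrayList<>();

    /** Runs steps in order; returns true if all succeed, false if the saga aborted. */
    boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step);            // remember for possible compensation
                log.add("done " + step.name());
            } catch (RuntimeException e) {
                log.add("failed " + step.name());
                // The failed step's own local TX is assumed rolled back;
                // undo the completed steps, most recent first (LIFO)
                while (!completed.isEmpty()) {
                    SagaStep done = completed.pop();
                    done.compensation().run();
                    log.add("compensated " + done.name());
                }
                return false;
            }
        }
        return true;
    }

    List<String> log() { return log; }
}
```

With the four order-placement steps and a payment action that throws, the log reads `done createOrder`, `done reserveInventory`, `failed chargePayment`, `compensated reserveInventory`, `compensated createOrder`. The confirm step never runs, which is why it needs no compensation.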

Saga Coordination: Choreography vs Orchestration


Choreography — Event-Driven

Each service publishes events; other services react. No central coordinator.

sequenceDiagram
    participant OS as Order Service
    participant K as Kafka
    participant IS as Inventory Service
    participant PS as Payment Service

    OS->>K: OrderCreated
    K->>IS: [OrderCreated]
    IS->>IS: Reserve items
    IS->>K: InventoryReserved
    K->>PS: [InventoryReserved]
    PS->>PS: Charge payment
    PS->>K: PaymentCharged
    K->>OS: [PaymentCharged]
    OS->>OS: Confirm order

Choreography: Failure Flow

sequenceDiagram
    participant OS as Order Service
    participant K as Kafka
    participant IS as Inventory Service
    participant PS as Payment Service

    OS->>K: OrderCreated
    K->>IS: [OrderCreated]
    IS->>IS: Reserve items
    IS->>K: InventoryReserved
    K->>PS: [InventoryReserved]
    PS->>PS: Payment fails
    PS->>K: PaymentFailed
    K->>IS: [PaymentFailed]
    IS->>IS: Release reservation
    K->>OS: [PaymentFailed]
    OS->>OS: Cancel order
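Both flows can be simulated with an in-memory publish/subscribe bus standing in for Kafka. The `EventBus` and `ChoreographyDemo` classes are hypothetical, and delivery here is synchronous and in-process, unlike a real broker. The point is structural: each service only subscribes to event types and publishes new ones, and no service calls another directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative event: a type name plus the order it concerns.
record Event(String type, String orderId) {}

// In-memory stand-in for Kafka: synchronous, single-threaded pub/sub.
class EventBus {
    private final Map<String, List<Consumer<Event>>> subscribers = new HashMap<>();
    final List<String> published = new ArrayList<>();   // every event type, in publish order

    void subscribe(String type, Consumer<Event> handler) {
        subscribers.computeIfAbsent(type, t -> new ArrayList<>()).add(handler);
    }

    void publish(Event event) {
        published.add(event.type());
        for (Consumer<Event> handler : subscribers.getOrDefault(event.type(), List.of())) {
            handler.accept(event);
        }
    }
}

// Wires up three services; paymentSucceeds simulates the payment outcome.
class ChoreographyDemo {
    final EventBus bus = new EventBus();
    final Map<String, String> orderStatus = new HashMap<>();   // Order Service state
    final Map<String, Boolean> reserved = new HashMap<>();     // Inventory Service state

    ChoreographyDemo(boolean paymentSucceeds) {
        // Inventory Service: reserve on OrderCreated, release on PaymentFailed (compensation)
        bus.subscribe("OrderCreated", e -> {
            reserved.put(e.orderId(), true);
            bus.publish(new Event("InventoryReserved", e.orderId()));
        });
        bus.subscribe("PaymentFailed", e -> reserved.put(e.orderId(), false));
        // Payment Service: charge once inventory is reserved
        bus.subscribe("InventoryReserved", e -> bus.publish(
            new Event(paymentSucceeds ? "PaymentCharged" : "PaymentFailed", e.orderId())));
        // Order Service: confirm on success, cancel on failure (compensation)
        bus.subscribe("PaymentCharged", e -> orderStatus.put(e.orderId(), "CONFIRMED"));
        bus.subscribe("PaymentFailed", e -> orderStatus.put(e.orderId(), "CANCELLED"));
    }

    void placeOrder(String orderId) {
        orderStatus.put(orderId, "PENDING");
        bus.publish(new Event("OrderCreated", orderId));
    }
}
```

For example, `new ChoreographyDemo(false).placeOrder("o-2")` publishes `OrderCreated`, `InventoryReserved`, `PaymentFailed`, after which the reservation is released and the order is `CANCELLED`.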

Choreography Characteristics

| Aspect | Detail |
|---|---|
| Coupling | Loose; services don't know about each other |
| Visibility | Hard to see the full saga flow |
| Complexity | Grows with the number of steps |
| Failure handling | Each service reacts to failure events with its own compensation |
| Best for | Simple sagas (2–4 steps) |

Orchestration — Central Coordinator

A saga orchestrator directs the flow. It tells each service what to do and handles responses.

sequenceDiagram
    participant O as Saga Orchestrator
    participant IS as Inventory Service
    participant PS as Payment Service
    participant SS as Shipping Service

    O->>IS: ReserveInventory
    IS->>O: InventoryReserved
    O->>PS: ChargePayment
    PS->>O: PaymentCharged
    O->>SS: CreateShipment
    SS->>O: ShipmentCreated
    O->>O: Saga complete

Orchestration: Failure Flow

sequenceDiagram
    participant O as Saga Orchestrator
    participant IS as Inventory Service
    participant PS as Payment Service

    O->>IS: ReserveInventory
    IS->>O: InventoryReserved
    O->>PS: ChargePayment
    PS->>O: PaymentFailed
    O->>IS: ReleaseReservation
    IS->>O: ReservationReleased
    O->>O: Saga aborted
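At its core the orchestrator is a loop that sends a command, checks the reply, and either moves to the next step or switches to compensation. In the sketch below, `Participant` is a hypothetical interface with a synchronous `handle` call returning a reply name, and `RefundPayment` and `CancelShipment` are assumed command names. A real orchestrator (for example one built on Temporal or AWS Step Functions) would send commands over a broker or RPC and persist its state between replies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical participant contract: receives a command name, returns a reply name.
interface Participant {
    String handle(String command);
}

// Illustrative orchestrator for the Inventory -> Payment -> Shipping saga.
class OrderSagaOrchestrator {
    // Each forward command, the success reply it expects, and its compensating command
    private record Step(String service, String command, String successReply, String compensation) {}

    private static final List<Step> STEPS = List.of(
        new Step("inventory", "ReserveInventory", "InventoryReserved", "ReleaseReservation"),
        new Step("payment", "ChargePayment", "PaymentCharged", "RefundPayment"),
        new Step("shipping", "CreateShipment", "ShipmentCreated", "CancelShipment"));

    private final Map<String, Participant> participants;
    final List<String> sent = new ArrayList<>();   // every command sent, in order

    OrderSagaOrchestrator(Map<String, Participant> participants) {
        this.participants = participants;
    }

    /** Returns "COMPLETED" or "ABORTED". */
    String run() {
        List<Step> done = new ArrayList<>();
        for (Step step : STEPS) {
            sent.add(step.command());
            String reply = participants.get(step.service()).handle(step.command());
            if (!reply.equals(step.successReply())) {
                // Compensate completed steps, most recent first
                for (int i = done.size() - 1; i >= 0; i--) {
                    Step prev = done.get(i);
                    sent.add(prev.compensation());
                    participants.get(prev.service()).handle(prev.compensation());
                }
                return "ABORTED";
            }
            done.add(step);
        }
        return "COMPLETED";
    }
}
```

With a payment participant that always replies `PaymentFailed`, the commands sent are `ReserveInventory`, `ChargePayment`, `ReleaseReservation`, matching the failure flow above.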

Orchestration Characteristics

| Aspect | Detail |
|---|---|
| Coupling | Orchestrator knows all participants |
| Visibility | Clear view of saga state |
| Complexity | Centralized; easier to manage |
| Failure handling | Orchestrator manages rollback |
| Best for | Complex sagas (5+ steps) |

Orchestration Tools

| Tool | Description |
|---|---|
| Temporal.io | Durable workflow engine; code-based |
| AWS Step Functions | Serverless state machine |
| Camunda | BPMN-based workflow |
| Axon Framework | Java-based; saga + event sourcing |
| Netflix Conductor | Microservices workflow orchestration |

Choreography vs Orchestration

| Aspect | Choreography | Orchestration |
|---|---|---|
| Coupling | Lower | Higher (to orchestrator) |
| Visibility | Harder to trace | Central dashboard |
| Single point of failure | None | Orchestrator |
| Complexity | Distributed | Centralized |
| Testing | Harder | Easier |
| Best for | Simple flows | Complex flows |

When to Use Which

| Scenario | Recommendation |
|---|---|
| 2–4 steps, simple flow | Choreography |
| 5+ steps | Orchestration |
| Need workflow visibility | Orchestration |
| Team familiar with events | Choreography |
| Complex branching logic | Orchestration |
| Minimal infrastructure | Choreography |

Compensating Transactions

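Compensations undo the effects of forward transactions. They must be idempotent, because the same failure event or command can be delivered more than once, and they must be retryable until they succeed: unlike a forward step, a failed compensation has nothing left to fall back on.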

Compensation Design

| Forward Action | Compensation | Notes |
|---|---|---|
| Create order | Mark order cancelled | Don't delete; keep audit trail |
| Reserve inventory | Release reservation | Restore available count |
| Charge payment | Issue refund | Stripe/PayPal refund API |
| Send email | Send cancellation email | Can't unsend; send follow-up |
| Create shipment | Cancel shipment | May not be possible if shipped |
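Idempotence is usually achieved by checking current state before acting, so a retried or redelivered compensation becomes a no-op. A minimal sketch for the "Release reservation" row, assuming a hypothetical in-memory `InventoryService` keyed by reservation ID; a real service would do the check and the update in one database transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical inventory service illustrating an idempotent compensation.
class InventoryService {
    private int available;
    // reservationId -> quantity held; absent means never reserved or already released
    private final Map<String, Integer> reservations = new HashMap<>();

    InventoryService(int available) { this.available = available; }

    void reserve(String reservationId, int qty) {
        if (reservations.containsKey(reservationId)) return;   // duplicate command: no-op
        if (qty > available) throw new IllegalStateException("insufficient stock");
        available -= qty;
        reservations.put(reservationId, qty);
    }

    /** Compensation: safe to call any number of times. */
    void release(String reservationId) {
        Integer qty = reservations.remove(reservationId);
        if (qty == null) return;   // nothing (left) to undo
        available += qty;          // restore the available count exactly once
    }

    int available() { return available; }
}
```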

Semantic Compensation

Some actions can't be truly undone. Use semantic compensation — a corrective action.

| Action | Semantic Compensation |
|---|---|
| Send email | Send "sorry, please disregard" email |
| Ship package | Send return label; arrange pickup |
| Post to timeline | Post correction or delete |

Saga State Management

Choreography State

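Each service keeps only its own local state; there is no central record of the saga. Events must therefore carry enough context (such as the order ID and the failure reason) for each service to act without querying other services.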

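// Order service: compensates its own local state when payment fails
@EventListener
@Transactional
public void on(PaymentFailed event) {
    // Assuming a Spring Data repository: findById returns Optional<Order>, not Order
    Order order = orderRepository.findById(event.orderId())
        .orElseThrow(() -> new IllegalStateException("Unknown order " + event.orderId()));
    order.cancel(event.reason());   // should be a no-op if already cancelled (redelivery)
    orderRepository.save(order);
}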

Orchestration State

The orchestrator tracks saga state explicitly and persists it, so that after a crash it can resume or compensate from the last recorded step.

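import java.util.ArrayList;
import java.util.List;

public class OrderSaga {
    public enum SagaStatus { STARTED, INVENTORY_RESERVED, PAYMENT_CHARGED, COMPLETED, COMPENSATING, ABORTED }

    private String sagaId;
    private OrderId orderId;                                   // domain ID type (not shown)
    private SagaStatus status = SagaStatus.STARTED;
    private List<String> completedSteps = new ArrayList<>();   // compensated in reverse on failure
}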
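Persisting `status` helps only if the orchestrator also rejects illegal moves, such as completing a saga that is already compensating. The sketch below repeats the statuses from `OrderSaga` as a standalone enum with an assumed transition map (forward progress, or a switch to `COMPENSATING` on failure); `SagaState` is a hypothetical holder, not part of any framework.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum SagaStatus {
    STARTED, INVENTORY_RESERVED, PAYMENT_CHARGED, COMPLETED, COMPENSATING, ABORTED;

    // Assumed legal transitions; COMPLETED and ABORTED are terminal.
    private static final Map<SagaStatus, Set<SagaStatus>> NEXT = Map.of(
        STARTED, EnumSet.of(INVENTORY_RESERVED, COMPENSATING),
        INVENTORY_RESERVED, EnumSet.of(PAYMENT_CHARGED, COMPENSATING),
        PAYMENT_CHARGED, EnumSet.of(COMPLETED, COMPENSATING),
        COMPENSATING, EnumSet.of(ABORTED),
        COMPLETED, EnumSet.noneOf(SagaStatus.class),
        ABORTED, EnumSet.noneOf(SagaStatus.class));

    boolean canMoveTo(SagaStatus target) {
        return NEXT.get(this).contains(target);
    }
}

// Hypothetical holder that enforces the transitions above.
class SagaState {
    private SagaStatus status = SagaStatus.STARTED;

    void moveTo(SagaStatus target) {
        if (!status.canMoveTo(target)) {
            throw new IllegalStateException(status + " -> " + target + " is not allowed");
        }
        status = target;
    }

    SagaStatus status() { return status; }
}
```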

The Outbox Pattern

The Dual-Write Problem

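Writing to the database and publishing an event to the broker are two separate operations with no shared transaction, so either can fail independently: the row is saved but the event is lost, or the event is published for a write that later rolls back.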

sequenceDiagram
    participant App
    participant DB
    participant Kafka
    App->>DB: Insert order ✓
    App->>Kafka: Publish OrderCreated ✗
    Note over App,Kafka: Order saved but event lost!

Solution: Outbox Table

Write both the business data and the event to the database in the same local transaction, so they commit or roll back together. A separate relay process then reads committed outbox rows and publishes them to the broker.

sequenceDiagram
    participant App
    participant DB
    participant Poller
    participant Kafka
    App->>DB: BEGIN TX
    App->>DB: INSERT INTO orders
    App->>DB: INSERT INTO outbox
    App->>DB: COMMIT
    Poller->>DB: SELECT * FROM outbox WHERE published = false
    Poller->>Kafka: Publish event
    Poller->>DB: UPDATE outbox SET published = true
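The essential property is that the order row and the outbox row commit or roll back together, while publishing happens later. The sketch below simulates this in memory under stated assumptions: `Database.transaction` is a hypothetical single-threaded stand-in that applies all staged writes or none, and `OutboxRelay` publishes unpublished rows and then marks them. Because a crash between "publish" and "mark published" is possible, a restarted relay publishes the same event again, which is exactly why the pattern is at-least-once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative outbox row (a subset of the outbox table columns).
class OutboxRow {
    final String eventType;
    final String aggregateId;
    boolean published = false;

    OutboxRow(String eventType, String aggregateId) {
        this.eventType = eventType;
        this.aggregateId = aggregateId;
    }
}

// Single-threaded stand-in for one database holding both tables.
class Database {
    final List<String> orders = new ArrayList<>();
    final List<OutboxRow> outbox = new ArrayList<>();

    /** All-or-nothing: writes are staged and applied only if the body completes. */
    void transaction(Consumer<Database> body) {
        Database staged = new Database();
        body.accept(staged);             // an exception here discards the staged writes
        orders.addAll(staged.orders);    // "COMMIT"
        outbox.addAll(staged.outbox);
    }
}

// Relay: publishes pending rows in insertion order, then marks them published.
class OutboxRelay {
    final List<String> broker = new ArrayList<>();   // stand-in for Kafka

    void pollOnce(Database db, boolean crashBeforeMarking) {
        for (OutboxRow row : db.outbox) {
            if (row.published) continue;
            broker.add(row.eventType + ":" + row.aggregateId);   // publish
            if (crashBeforeMarking) return;                      // simulated crash
            row.published = true;                                // mark published
        }
    }
}
```

Running `pollOnce(db, true)` and then `pollOnce(db, false)` delivers `OrderCreated:order-1` twice, while a transaction whose body throws leaves neither an order nor an outbox row behind.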

Outbox Table Schema

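CREATE TABLE outbox (
    id UUID PRIMARY KEY,                      -- unique event ID; consumers can dedupe on it
    aggregate_type VARCHAR(255) NOT NULL,     -- 'Order'
    aggregate_id VARCHAR(255) NOT NULL,       -- '12345'
    event_type VARCHAR(255) NOT NULL,         -- 'OrderCreated'
    payload JSONB NOT NULL,                   -- event data (JSONB is PostgreSQL-specific)
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    published BOOLEAN NOT NULL DEFAULT FALSE
);

-- PostgreSQL partial index: lets the poller find pending rows without a full scan
CREATE INDEX outbox_unpublished_idx ON outbox (created_at) WHERE published = FALSE;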

Outbox Implementation

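@Transactional
public void placeOrder(PlaceOrderCommand cmd) {
    // Business logic
    Order order = Order.create(cmd);
    orderRepository.save(order);

    // Write to outbox (same transaction): both rows commit or neither does
    OutboxEvent event = new OutboxEvent(
        "Order",
        order.getId().toString(),
        "OrderCreated",
        serialize(order.toOrderCreatedEvent())
    );
    outboxRepository.save(event);
}

// Separate poller process (relay); each run starts 100 ms after the previous one ends
@Scheduled(fixedDelay = 100)
public void publishOutboxEvents() throws Exception {
    // Assumed to return unpublished rows ordered by created_at
    List<OutboxEvent> pending = outboxRepository.findUnpublished();
    for (OutboxEvent event : pending) {
        // getTopic() is assumed to map the aggregate type to a topic name.
        // Keying by aggregate ID keeps one aggregate's events on one partition, in order.
        kafkaTemplate.send(event.getTopic(), event.getAggregateId(), event.getPayload())
            .get();  // send() is asynchronous: wait for the broker ack before marking
        // A crash at this point re-publishes the event on restart (at-least-once)
        event.markPublished();
        outboxRepository.save(event);
    }
}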

Change Data Capture (CDC) Alternative

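Instead of polling, a CDC tool can tail the database's transaction log (for example the PostgreSQL WAL or the MySQL binlog) and stream committed outbox inserts to Kafka. This removes polling queries and lowers latency; Debezium, for instance, ships an outbox event router for exactly this use.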

graph LR
    DB[(Database)] --> CDC[Debezium CDC]
    CDC --> K[Kafka]

| Tool | Description |
|---|---|
| Debezium | Open-source CDC; PostgreSQL, MySQL, MongoDB, and others |
| AWS DMS | Managed CDC; RDS to Kinesis |
| Spring Modulith | Not CDC: built-in outbox-style event publication registry |

Outbox Benefits

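| Benefit | Description |
|---|---|
| Atomicity | Business write and outbox row commit or roll back together |
| At-least-once | Every committed event is eventually published, possibly more than once |
| Order preserved | Events are published in the order written (per aggregate, with a single relay) |
| No distributed transactions | Only a single local DB transaction is needed |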

Outbox Guarantees

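| Guarantee | How |
|---|---|
| At-least-once delivery | Relay retries until the broker acknowledges, then marks the row published |
| Ordering | Relay reads the outbox in insertion order |
| No loss | Event is stored in the DB, so it survives crashes |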

Idempotent consumers required

Outbox guarantees at-least-once, not exactly-once. Consumers must handle duplicates.
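A common way to make a consumer idempotent is to record the IDs of processed events and skip repeats. The sketch assumes each event carries a unique ID (the outbox `id` column is a natural source); `IdempotentConsumer` is a hypothetical name. In production the processed-ID record and the business update should be written in the same local transaction, which this in-memory version does not model.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Illustrative event carrying the outbox row's unique id.
record OrderCreated(UUID eventId, String orderId) {}

// Hypothetical consumer that deduplicates on eventId.
class IdempotentConsumer {
    private final Set<UUID> processed = new HashSet<>();
    private int ordersSeen = 0;

    void onMessage(OrderCreated event) {
        // Set.add returns false if the ID is already present: a duplicate delivery
        if (!processed.add(event.eventId())) {
            return;
        }
        ordersSeen++;   // the side effect that must happen once per event
    }

    int ordersSeen() { return ordersSeen; }
}
```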


When should you use choreography vs orchestration for sagas?

Use choreography for simple sagas (2–4 steps) with clear event flows and a team comfortable with event-driven design. Use orchestration for complex sagas (5+ steps), when you need visibility into saga state, or when branching/conditional logic is involved. Orchestration is easier to test and debug.

What guarantees does the outbox pattern provide?

At-least-once delivery: events will be published (eventually). Ordering: events are published in the order written (per aggregate, assuming a single relay and partitioning by aggregate ID). No loss: events persist in the DB and survive crashes. It does NOT guarantee exactly-once, so consumers must be idempotent. The outbox pattern trades extra moving parts (an outbox table and a relay) for reliable publishing.

How do you handle a compensation that fails?

(1) Retry with exponential backoff. (2) Dead letter queue for events that exceed retries. (3) Manual intervention via ops dashboard. (4) Reconciliation job that detects inconsistencies. Compensations must be idempotent so retries are safe. Design compensations to eventually succeed.
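Steps (1) and (2) can be sketched as a retry loop with exponential backoff that hands the compensation to a dead-letter list once attempts run out. `CompensationRetrier` is a hypothetical name, the `maxAttempts` and `baseDelayMs` values are placeholders, and the sleep is injected so the example can be tested without waiting; production code usually adds jitter and a persistent dead letter queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Illustrative retry wrapper for compensations.
class CompensationRetrier {
    private final int maxAttempts;
    private final long baseDelayMs;
    private final LongConsumer sleeper;                  // injected so tests need not sleep
    final List<String> deadLetters = new ArrayList<>();  // stand-in for a DLQ
    final List<Long> delays = new ArrayList<>();         // backoff delays actually used

    CompensationRetrier(int maxAttempts, long baseDelayMs, LongConsumer sleeper) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMs = baseDelayMs;
        this.sleeper = sleeper;
    }

    /** Returns true if the compensation succeeded, false if it was dead-lettered. */
    boolean execute(String name, Runnable compensation) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                compensation.run();
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) break;
                long delay = baseDelayMs << (attempt - 1);   // base, 2x base, 4x base, ...
                delays.add(delay);
                sleeper.accept(delay);
            }
        }
        deadLetters.add(name);   // hand off for manual intervention or reconciliation
        return false;
    }
}
```

In a real service the sleeper could be `ms -> { try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } }`. With `maxAttempts = 4` and `baseDelayMs = 100`, a compensation that fails twice and then succeeds waits 100 ms and then 200 ms.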