
20 · Saga Pattern — Compensating Distributed Transactions

Distributed Transactions · Topic 20 of 20


The Problem with 2PC at Scale

Two-Phase Commit (2PC) holds locks on every participant until the coordinator decides, and participants block if the coordinator fails. That is unacceptable for long-running business workflows across microservices.

The Saga pattern breaks a distributed transaction into a sequence of local transactions, each with a corresponding compensating transaction that can undo its effect.


How Saga Works

Each step is a local DB transaction. If a step fails, compensating transactions run in reverse to undo the completed steps.

graph LR
    T1["T1: Place Order\n(Compensate: Cancel Order)"]
    T2["T2: Reserve Inventory\n(Compensate: Release Inventory)"]
    T3["T3: Charge Payment\n(Compensate: Refund)"]
    T4["T4: Ship Order\n(Compensate: Recall Shipment)"]

    T1 --> T2 --> T3 --> T4

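If T3 fails: run C2 (Release Inventory) → C1 (Cancel Order). T3 never committed, so there is no charge to refund; C3 (Refund) would run only if a later step such as T4 failed.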
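
The reverse-order rule can be sketched in a few lines of Python. This is a minimal in-process illustration, not a production executor: `run_saga`, the step names, and the `ok`/`fail` callables are all hypothetical, and a real saga would persist its progress so it can resume after a crash.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are illustrative placeholders
Step = Tuple[str, Callable[[], None], Callable[[], None]]

def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            # Only steps that finished are undone, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return log
        completed.append(step)
        log.append(f"{name}: done")
    return log

def ok() -> None:
    pass

def fail() -> None:
    raise RuntimeError("payment declined")

saga_log = run_saga([
    ("place_order", ok, ok),
    ("reserve_inventory", ok, ok),
    ("charge_payment", fail, ok),  # T3 fails
    ("ship_order", ok, ok),        # never reached
])
```

Here the failed step itself is not compensated, because its local transaction never committed.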


Saga Coordination Models

Choreography

Each service publishes events and reacts to events from other services. No central coordinator.

  • ✅ Decoupled, no single point of failure
  • ❌ Hard to trace end-to-end flow; risk of cyclic dependencies
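
A rough sketch of choreography, assuming a synchronous in-memory `EventBus` as a stand-in for a real broker. The event names, the `limit` field, and the handlers are hypothetical; the point is that no component knows the whole flow, and compensation is just another chain of event reactions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]

class EventBus:
    """Synchronous in-memory stand-in for a message broker (illustrative only)."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict) -> None:
        self.history.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()

# Inventory service: reacts to OrderPlaced and announces the reservation.
bus.subscribe("OrderPlaced", lambda e: bus.publish("InventoryReserved", e))

# Payment service: reacts to InventoryReserved and reports success or failure.
def charge(event: Dict) -> None:
    if event["amount"] > event["limit"]:
        bus.publish("PaymentFailed", event)
    else:
        bus.publish("PaymentCharged", event)

bus.subscribe("InventoryReserved", charge)

# Compensations are event reactions too, owned by the same services.
bus.subscribe("PaymentFailed", lambda e: bus.publish("InventoryReleased", e))
bus.subscribe("InventoryReleased", lambda e: bus.publish("OrderCancelled", e))

bus.publish("OrderPlaced", {"order_id": 1, "amount": 500, "limit": 100})
```

The end-to-end flow exists only in `bus.history`, which hints at why tracing a choreographed saga across real services is hard.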

Orchestration

A Saga orchestrator service explicitly tells each participant what to do.

  • ✅ Centralized visibility and control
  • ❌ The orchestrator is one more service to build, deploy, and keep available

sequenceDiagram
    Orchestrator->>OrderService: Place Order
    Orchestrator->>InventoryService: Reserve Inventory
    Orchestrator->>PaymentService: Charge Payment
    PaymentService-->>Orchestrator: FAILED
    Orchestrator->>InventoryService: Release Inventory (compensate)
    Orchestrator->>OrderService: Cancel Order (compensate)
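
The sequence above could be sketched as follows. `Participant` and `OrderSagaOrchestrator` are hypothetical in-process stand-ins for remote services; a real orchestrator would send commands over the network and persist its state between steps.

```python
from typing import List, Optional, Tuple

class Participant:
    """Hypothetical stand-in for a remote service; records each command it receives."""

    def __init__(self, name: str, trace: List[str], fails_on: Optional[str] = None) -> None:
        self.name = name
        self.trace = trace
        self.fails_on = fails_on

    def handle(self, command: str) -> bool:
        self.trace.append(f"{self.name}: {command}")
        return command != self.fails_on  # False models a FAILED reply

class OrderSagaOrchestrator:
    def __init__(self, orders: Participant, inventory: Participant, payments: Participant) -> None:
        # (participant, command, compensating command)
        self.plan = [
            (orders, "Place Order", "Cancel Order"),
            (inventory, "Reserve Inventory", "Release Inventory"),
            (payments, "Charge Payment", "Refund"),
        ]

    def run(self) -> str:
        done: List[Tuple[Participant, str]] = []
        for participant, command, compensation in self.plan:
            if not participant.handle(command):
                # Undo completed steps, newest first.
                for earlier, undo in reversed(done):
                    earlier.handle(undo)
                return "COMPENSATED"
            done.append((participant, compensation))
        return "COMPLETED"

trace: List[str] = []
orchestrator = OrderSagaOrchestrator(
    Participant("OrderService", trace),
    Participant("InventoryService", trace),
    Participant("PaymentService", trace, fails_on="Charge Payment"),
)
outcome = orchestrator.run()
```

Unlike choreography, the whole plan lives in one place (`self.plan`), which is where the centralized visibility comes from.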

Idempotency is Required

Since messages can be retried, every step and every compensation must be idempotent: processing the same message twice must leave the system in the same state as processing it once.

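-- Idempotent reservation: a redelivered message for the same saga inserts nothing.
-- Assumes a UNIQUE constraint on reservations.saga_id (the conflict target requires one).
INSERT INTO reservations (id, item_id, qty, saga_id)
VALUES (?, ?, ?, ?)
ON CONFLICT (saga_id) DO NOTHING;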
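
To see the effect of a redelivery, here is a small sketch using Python's built-in `sqlite3` module and an in-memory database. The schema and the `reserve` helper are assumptions made for illustration, and the `ON CONFLICT ... DO NOTHING` clause needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reservations (
        id      INTEGER PRIMARY KEY,
        item_id TEXT    NOT NULL,
        qty     INTEGER NOT NULL,
        saga_id TEXT    NOT NULL UNIQUE  -- the conflict target needs a unique constraint
    )
""")

def reserve(item_id: str, qty: int, saga_id: str) -> None:
    """Insert a reservation; a repeat call with the same saga_id is a no-op."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO reservations (item_id, qty, saga_id) VALUES (?, ?, ?) "
            "ON CONFLICT (saga_id) DO NOTHING",
            (item_id, qty, saga_id),
        )

reserve("sku-1", 2, "saga-42")
reserve("sku-1", 2, "saga-42")  # simulated redelivery of the same message

rows = conn.execute("SELECT item_id, qty, saga_id FROM reservations").fetchall()
```

Keying on the saga ID, rather than on the item, is what lets a retry be told apart from a genuinely new reservation.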

Cloud Implementations

AWS Step Functions

  • Native saga/orchestration engine
  • State machine DSL (Amazon States Language); retries and error handling built in
  • Integrates with DynamoDB, Lambda, SQS, SNS

Google Cloud Workflows

  • Orchestration service for multi-step GCP workflows
  • Compensation steps defined as on-error branches

Outbox pattern with Kafka

  • Each service writes events to an outbox table in its local DB, atomically with its business data
  • CDC (Debezium) reads the outbox and publishes to Kafka
  • The next service consumes the event and runs its own step

DynamoDB with Lambda and SQS

  • Write the business record and the event atomically with DynamoDB Transactions
  • Lambda processes the SQS events for each saga step
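
Outbox write in a relational database, as a single local transaction:

BEGIN;
  -- Business row and event row commit together, or not at all
  INSERT INTO orders (...) VALUES (...);
  INSERT INTO outbox (event_type, payload) VALUES ('OrderPlaced', '...');
COMMIT;
-- Debezium/CDC picks up the outbox row and publishes it to Kafka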
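
The outbox flow can be tried end to end with Python's built-in `sqlite3` module. This is a simplified sketch under stated assumptions: the schema, `place_order`, and `relay` are hypothetical, and the relay polls a `published` flag, whereas Debezium reads the database's change log instead of polling.

```python
import json
import sqlite3
from typing import Callable, List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
    CREATE TABLE outbox (
        id         INTEGER PRIMARY KEY,
        event_type TEXT    NOT NULL,
        payload    TEXT    NOT NULL,
        published  INTEGER NOT NULL DEFAULT 0
    );
""")

def place_order(item: str) -> int:
    # One local transaction: both rows commit, or neither does.
    with conn:
        order_id = conn.execute(
            "INSERT INTO orders (item) VALUES (?)", (item,)
        ).lastrowid
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"order_id": order_id, "item": item})),
        )
    return order_id

def relay(publish: Callable[[str, str], None]) -> int:
    """Polling stand-in for CDC: publish unpublished rows, then mark them."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, payload)  # a crash right after this would re-publish later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)

sent: List[str] = []
place_order("book")
first = relay(lambda event_type, payload: sent.append(event_type))
second = relay(lambda event_type, payload: sent.append(event_type))
```

Because the relay publishes before it marks a row, delivery is at least once, which is why the consuming step still needs to be idempotent.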

Saga vs 2PC

Aspect     | 2PC                          | Saga
Locks      | Held across all participants | None (local txns only)
Failure    | Blocking                     | Compensating transactions
Atomicity  | True atomic                  | Eventual (compensations)
Complexity | Protocol complexity          | Compensation logic complexity
Latency    | High                         | Lower per step
Best for   | Short OLTP transactions      | Long-running business workflows