20 · Saga Pattern — Compensating Distributed Transactions
The Problem with 2PC at Scale
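Two-Phase Commit (2PC) holds locks on every participant until the coordinator's decision arrives, and participants that have already voted to commit stay blocked if the coordinator fails. That is unacceptable for long-running business workflows that span microservices.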
The Saga pattern breaks a distributed transaction into a sequence of local transactions, each with a corresponding compensating transaction that can undo its effect.
How Saga Works
Each step is a local database transaction that commits on its own. If a step fails, the compensating transactions of the steps that already completed run in reverse order.
```mermaid
graph LR
    T1["T1: Place Order\n(Compensate: Cancel Order)"]
    T2["T2: Reserve Inventory\n(Compensate: Release Inventory)"]
    T3["T3: Charge Payment\n(Compensate: Refund)"]
    T4["T4: Ship Order\n(Compensate: Recall Shipment)"]
    T1 --> T2 --> T3 --> T4
```
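If T3 fails: run C2 (Release Inventory) → C1 (Cancel Order). T3 never committed, so there is no charge to refund; C3 (Refund) only runs if a later step such as T4 fails after the payment went through.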
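The forward-then-reverse flow can be sketched in a few lines of Python. This is a minimal in-process illustration, not a production saga engine: `Step`, `run_saga`, `record`, and `fail_payment` are hypothetical names, and each "transaction" just appends to a list. The sketch compensates only the steps that completed, so the failed step itself is not undone.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # the local transaction
    compensate: Callable[[], None]  # undoes a committed action


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step never committed, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def record(msg: str) -> Callable[[], None]:
    """Build a fake transaction that only records that it ran."""
    return lambda: log.append(msg)


def fail_payment() -> None:
    raise RuntimeError("card declined")


steps = [
    Step("order", record("place order"), record("cancel order")),
    Step("inventory", record("reserve inventory"), record("release inventory")),
    Step("payment", fail_payment, record("refund")),
    Step("shipping", record("ship order"), record("recall shipment")),
]

ok = run_saga(steps)
print(ok, log)
```

In a real system each action and compensation would be a call or message to another service, and the list of completed steps would have to be persisted so that compensation can continue after a crash.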
Saga Coordination Models
Choreography
Each service publishes events and reacts to events from other services. No central coordinator.
- ✅ Decoupled, no single point of failure
- ❌ Hard to trace end-to-end flow; risk of cyclic dependencies
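To make the event-driven flow concrete, here is a toy choreography in Python. Everything in it is an assumption for illustration: `EventBus` is a synchronous in-memory stand-in for a real broker, the event names are invented, and the rule that charges above 100 are declined exists only to trigger the failure path.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """Toy synchronous stand-in for a message broker."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict) -> None:
        self.history.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)


bus = EventBus()
orders: Dict[str, str] = {}    # order service state
reserved: Dict[str, int] = {}  # inventory service state


def inventory_on_order_placed(evt: Dict) -> None:
    reserved[evt["order_id"]] = evt["qty"]
    bus.publish("InventoryReserved", evt)


def payment_on_inventory_reserved(evt: Dict) -> None:
    # Hypothetical rule: charges above 100 are declined.
    if evt["amount"] > 100:
        bus.publish("PaymentFailed", evt)
    else:
        bus.publish("PaymentCharged", evt)


def inventory_on_payment_failed(evt: Dict) -> None:
    reserved.pop(evt["order_id"], None)  # compensate: release inventory
    bus.publish("InventoryReleased", evt)


def order_on_inventory_released(evt: Dict) -> None:
    orders[evt["order_id"]] = "CANCELLED"  # compensate: cancel order


bus.subscribe("OrderPlaced", inventory_on_order_placed)
bus.subscribe("InventoryReserved", payment_on_inventory_reserved)
bus.subscribe("PaymentFailed", inventory_on_payment_failed)
bus.subscribe("InventoryReleased", order_on_inventory_released)

orders["o-1"] = "PLACED"
bus.publish("OrderPlaced", {"order_id": "o-1", "qty": 1, "amount": 250})
print(bus.history, orders)
```

No function here knows the whole workflow; the saga exists only as the chain of subscriptions, which is why the end-to-end flow is hard to trace.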
Orchestration
A Saga orchestrator service explicitly tells each participant what to do.
- ✅ Centralized visibility and control
- ❌ Orchestrator is a new service to maintain
```mermaid
sequenceDiagram
    Orchestrator->>OrderService: Place Order
    Orchestrator->>InventoryService: Reserve Inventory
    Orchestrator->>PaymentService: Charge Payment
    PaymentService-->>Orchestrator: FAILED
    Orchestrator->>InventoryService: Release Inventory (compensate)
    Orchestrator->>OrderService: Cancel Order (compensate)
```
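The exchange above can be modelled as a small state machine that reacts to participant replies. The sketch below is one possible shape, not a reference design: the `Orchestrator` class, the command names in `PLAN`, and the state names are all hypothetical, and it assumes compensations eventually succeed (for example by being retried).

```python
from typing import Callable, List, Tuple

# (command, compensating command) in forward order; names are illustrative.
PLAN: List[Tuple[str, str]] = [
    ("PlaceOrder", "CancelOrder"),
    ("ReserveInventory", "ReleaseInventory"),
    ("ChargePayment", "RefundPayment"),
]


class Orchestrator:
    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send        # delivers a command to a participant
        self.index = 0          # position in PLAN of the in-flight step
        self.state = "RUNNING"  # RUNNING -> COMPLETED, or COMPENSATING -> ABORTED

    def start(self) -> None:
        self.send(PLAN[0][0])

    def on_reply(self, succeeded: bool) -> None:
        """Advance the saga when a participant replies to the last command."""
        if self.state == "RUNNING" and succeeded:
            self.index += 1
            if self.index == len(PLAN):
                self.state = "COMPLETED"
            else:
                self.send(PLAN[self.index][0])
        elif self.state == "RUNNING":
            # The step at self.index failed; undo the steps before it.
            self.state = "COMPENSATING"
            self._compensate_next()
        elif self.state == "COMPENSATING":
            # Reply to a compensation; assumed to have succeeded.
            self._compensate_next()

    def _compensate_next(self) -> None:
        self.index -= 1
        if self.index < 0:
            self.state = "ABORTED"
        else:
            self.send(PLAN[self.index][1])


sent: List[str] = []
saga = Orchestrator(sent.append)
saga.start()          # sends PlaceOrder
saga.on_reply(True)   # sends ReserveInventory
saga.on_reply(True)   # sends ChargePayment
saga.on_reply(False)  # payment failed: sends ReleaseInventory
saga.on_reply(True)   # sends CancelOrder
saga.on_reply(True)   # nothing left to undo
print(saga.state, sent)
```

Because the orchestrator's whole position is captured by `index` and `state`, persisting those two values after each reply is enough to resume the saga after a restart.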
Idempotency is Required
Because messages can be redelivered and steps retried, every step and every compensation must be idempotent: processing the same message twice must leave the system in the same state as processing it once.
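```sql
-- Idempotent reservation: a retried message with the same saga_id inserts nothing.
-- Requires a UNIQUE constraint (or unique index) on reservations.saga_id.
INSERT INTO reservations (id, item_id, qty, saga_id)
VALUES (?, ?, ?, ?)
ON CONFLICT (saga_id) DO NOTHING;
```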
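The same statement can be exercised end to end with Python's built-in `sqlite3` module, which accepts this `ON CONFLICT` syntax when the underlying SQLite is version 3.24 or newer. The table definition and the `reserve` helper are assumptions made for the demonstration; the original only shows the `INSERT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reservations (
        id      TEXT PRIMARY KEY,
        item_id TEXT NOT NULL,
        qty     INTEGER NOT NULL,
        saga_id TEXT NOT NULL UNIQUE  -- the conflict target needs a unique constraint
    )
    """
)


def reserve(res_id: str, item_id: str, qty: int, saga_id: str) -> bool:
    """Return True if this call created the reservation, False on a duplicate."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO reservations (id, item_id, qty, saga_id) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT (saga_id) DO NOTHING",
            (res_id, item_id, qty, saga_id),
        )
    return cur.rowcount == 1


first = reserve("r-1", "widget", 2, "saga-42")
# Redelivered message: the consumer generated a new row id, but saga_id repeats.
retry = reserve("r-2", "widget", 2, "saga-42")
count = conn.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
print(first, retry, count)
```

The retry is a silent no-op rather than an error, so the consumer can acknowledge the duplicate message and move on.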
Cloud Implementations
AWS Step Functions
- Native saga/orchestration engine
- State machine DSL; retries and error handling built in
- Integrates with DynamoDB, Lambda, SQS, SNS
Google Cloud Workflows
- Orchestration service for multi-step GCP workflows
- Compensation steps defined as on-error branches
Transactional Outbox with Kafka
- Each service writes events to its local DB (outbox table) atomically with business data
- CDC (Debezium) reads the outbox and publishes to Kafka
- The next service consumes and processes the event
DynamoDB with SQS and Lambda
- Write the business record and the event atomically with DynamoDB Transactions
- Lambda processes SQS events per saga step
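The outbox approach listed above (business data and event written in one local transaction, then relayed to the broker) can be sketched with `sqlite3`. This is a simplified illustration under stated assumptions: the table layout and the `place_order` and `relay_once` functions are hypothetical, and a polling relay stands in for CDC tooling such as Debezium, with a plain callback standing in for Kafka.

```python
import json
import sqlite3
from typing import Callable, List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        seq        INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload    TEXT NOT NULL,
        published  INTEGER NOT NULL DEFAULT 0
    );
    """
)


def place_order(order_id: str) -> None:
    """Write the business row and its event in one local transaction."""
    with conn:  # both inserts commit together or not at all
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"order_id": order_id})),
        )


def relay_once(publish: Callable[[str, dict], None]) -> int:
    """Polling stand-in for CDC: publish unpublished rows in insertion order."""
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        # A crash between publish and the UPDATE causes a re-publish later,
        # so delivery is at-least-once and consumers must be idempotent.
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


sent: List[Tuple[str, dict]] = []
place_order("o-1")
n_first = relay_once(lambda t, p: sent.append((t, p)))
n_second = relay_once(lambda t, p: sent.append((t, p)))
print(n_first, n_second, sent)
```

The key property is that the service never has to write to its database and the broker in the same step; the event cannot be lost once the business transaction commits.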
Saga vs 2PC
| | 2PC | Saga |
|---|---|---|
| Locks | Held across all participants | None (local txns only) |
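| Failure handling | Participants block until the coordinator recovers | Compensating transactions undo completed steps |
| Atomicity | Atomic commit (all-or-nothing) | Eventual, via compensations; intermediate states are visible |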
| Complexity | Protocol complexity | Compensation logic complexity |
| Latency | High | Lower per step |
| Best for | Short OLTP transactions | Long-running business workflows |