Saga Pattern — Deep Dive
Level: Advanced
Pre-reading: 04 · Event-Driven Architecture · 03.04 · Saga and Outbox Patterns
The Problem: Distributed Transactions
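Each microservice owns its own database, so a single ACID transaction cannot span service boundaries. Two-Phase Commit (2PC) can coordinate them, but participants hold locks while they wait for the coordinator, so a slow or failed coordinator blocks every participant and reduces availability.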
Solution: The Saga pattern — a sequence of local transactions coordinated by messages.
Saga Definition
A saga is a sequence of local transactions where:
- Each step is a local ACID transaction
- Each step publishes events/commands to trigger the next step
- Each step that changes state has a compensating transaction that reverses its business effect if a later step fails
graph LR
T1[Step 1: Reserve Inventory]
T2[Step 2: Charge Payment]
T3[Step 3: Create Shipment]
T1 --> T2 --> T3
T3 -->|Failure| C2[Compensate: Refund]
C2 --> C1[Compensate: Release Inventory]
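The definition above can be sketched as a small in-process runner. This is a minimal illustration, not a production design: `Step` and `SimpleSaga` are hypothetical names, the steps are plain `Runnable`s rather than remote calls, and nothing is persisted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** One saga step: a forward action plus the compensation that reverses its effect. */
record Step(String name, Runnable action, Runnable compensation) {}

/** Runs steps in order; on failure, compensates the completed steps in reverse order. */
final class SimpleSaga {
    private final List<Step> steps = new ArrayList<>();

    SimpleSaga step(String name, Runnable action, Runnable compensation) {
        steps.add(new Step(name, action, compensation));
        return this;
    }

    /** Returns true if every step succeeded, false if the saga was compensated. */
    boolean run() {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step); // the most recent step ends up on top
            } catch (RuntimeException e) {
                // Popping the stack undoes the latest completed step first.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}
```

If the third step throws, the runner calls the compensation for step 2 and then for step 1, matching the diagram. The failed step itself is not compensated here, which assumes a failed step left no effect behind.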
Choreography vs Orchestration
Choreography
Services communicate via events. Each service decides what to do next based on events it receives.
sequenceDiagram
participant OS as Order Service
participant IS as Inventory Service
participant PS as Payment Service
participant SS as Shipping Service
OS->>IS: OrderCreated
IS->>IS: Reserve inventory
IS->>PS: InventoryReserved
PS->>PS: Charge payment
PS->>SS: PaymentSucceeded
SS->>SS: Create shipment
SS->>OS: ShipmentCreated
OS->>OS: Confirm order
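To make the event chain concrete, here is a small sketch of choreography over a synchronous in-memory bus. `EventBus` and `ChoreographyDemo` are hypothetical stand-ins for a real broker and real services, and the records simply reuse the event names from the diagram; failure events are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Minimal synchronous in-memory event bus (a stand-in for a real message broker). */
final class EventBus {
    private final Map<Class<?>, List<Consumer<Object>>> handlers = new HashMap<>();
    final List<String> published = new ArrayList<>(); // names of published events, for inspection

    <E> void subscribe(Class<E> type, Consumer<E> handler) {
        handlers.computeIfAbsent(type, t -> new ArrayList<>())
                .add(event -> handler.accept(type.cast(event)));
    }

    void publish(Object event) {
        published.add(event.getClass().getSimpleName());
        for (Consumer<Object> handler : handlers.getOrDefault(event.getClass(), List.of())) {
            handler.accept(event);
        }
    }
}

record OrderCreated(String orderId) {}
record InventoryReserved(String orderId) {}
record PaymentSucceeded(String orderId) {}
record ShipmentCreated(String orderId) {}

final class ChoreographyDemo {
    /** Wires three "services"; each knows only the event it consumes and the one it emits. */
    static EventBus wire() {
        EventBus bus = new EventBus();
        bus.subscribe(OrderCreated.class, e -> bus.publish(new InventoryReserved(e.orderId())));
        bus.subscribe(InventoryReserved.class, e -> bus.publish(new PaymentSucceeded(e.orderId())));
        bus.subscribe(PaymentSucceeded.class, e -> bus.publish(new ShipmentCreated(e.orderId())));
        return bus;
    }
}
```

Publishing a single `OrderCreated` drives the whole chain, and no component holds the complete flow. That is the loose coupling, and also the reason the flow is hard to see in one place.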
Orchestration
A central saga orchestrator tells each service what to do and handles responses.
sequenceDiagram
participant O as Saga Orchestrator
participant IS as Inventory Service
participant PS as Payment Service
participant SS as Shipping Service
O->>IS: ReserveInventory
IS->>O: InventoryReserved
O->>PS: ChargePayment
PS->>O: PaymentSucceeded
O->>SS: CreateShipment
SS->>O: ShipmentCreated
O->>O: Saga complete
Comparison
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose; services independent | Tighter; orchestrator knows all |
| Visibility | Hard to see full flow | Clear in orchestrator |
| Single point of failure | None | Orchestrator (mitigated by persisting saga state and running replicas) |
| Complexity | Distributed; harder to trace | Centralized; easier to manage |
| Testing | Harder | Easier |
| Best for | Simple flows (2-4 steps) | Complex flows (5+ steps) |
Compensating Transactions
Compensations undo the effects of a step. They're not rollbacks — they're new transactions that reverse the business effect.
Compensation Design
| Forward Action | Compensation | Considerations |
|---|---|---|
| Create order | Cancel order | Don't delete; mark cancelled |
| Reserve inventory | Release inventory | Handle partial reservations |
| Charge payment | Issue refund | May take days to process |
| Send confirmation email | Send cancellation email | Can't unsend; send follow-up |
| Create shipment | Cancel shipment | May not be possible if shipped |
Compensation Rules
| Rule | Rationale |
|---|---|
| Idempotent | Safe to execute multiple times |
| Eventually succeed | Retry until done |
| Semantic reversal | May not be exact undo |
| Order matters | Compensate in reverse order |
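Two of these rules, idempotency and semantic reversal, can be illustrated with the "cancel order" compensation. This is a sketch under stated assumptions: `OrderStore` is a hypothetical in-memory stand-in for the order table, and the counter exists only to show that repeated calls have no extra effect.

```java
import java.util.HashMap;
import java.util.Map;

enum OrderStatus { CREATED, CANCELLED }

/** In-memory stand-in for an order table. */
final class OrderStore {
    private final Map<String, OrderStatus> orders = new HashMap<>();
    private int cancellationsApplied = 0;

    void create(String orderId) {
        orders.put(orderId, OrderStatus.CREATED);
    }

    /**
     * Compensation for "create order": marks the order cancelled instead of deleting it.
     * Calling it again for an already-cancelled or unknown order is a no-op,
     * so a retried or duplicated compensation does no harm.
     */
    void cancel(String orderId) {
        if (orders.get(orderId) != OrderStatus.CREATED) {
            return; // already cancelled, or never created: nothing to undo
        }
        orders.put(orderId, OrderStatus.CANCELLED);
        cancellationsApplied++;
    }

    OrderStatus status(String orderId) {
        return orders.get(orderId);
    }

    int cancellationsApplied() {
        return cancellationsApplied;
    }
}
```

Treating an unknown order as a no-op also covers the case where the compensation arrives before, or instead of, the forward action.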
Saga State Machine
A saga transitions through states:
stateDiagram-v2
[*] --> Started
Started --> InventoryReserving
InventoryReserving --> InventoryReserved: Success
InventoryReserving --> Compensating: Failure
InventoryReserved --> PaymentCharging
PaymentCharging --> PaymentCharged: Success
PaymentCharging --> Compensating: Failure
PaymentCharged --> ShipmentCreating
ShipmentCreating --> Completed: Success
ShipmentCreating --> Compensating: Failure
Compensating --> Aborted
Completed --> [*]
Aborted --> [*]
State Persistence
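import java.time.Instant;
import java.util.List;

// Persisted saga record. Save it after every transition so the saga can resume after a crash.
public class OrderSaga {
    private String sagaId;                  // unique ID used to trace the full flow
    private String orderId;
    private SagaState state;
    private List<SagaStep> completedSteps;  // drives compensation, in reverse order
    private String failureReason;           // set when compensation starts
    private Instant startedAt;
    private Instant completedAt;

    public enum SagaState {
        STARTED,
        INVENTORY_RESERVING,
        INVENTORY_RESERVED,
        PAYMENT_CHARGING,
        PAYMENT_CHARGED,
        SHIPMENT_CREATING,
        COMPLETED,
        COMPENSATING,
        ABORTED
    }
}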
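One way to keep a persisted saga consistent with the state diagram is to validate every transition against an explicit table. The sketch below is an assumption about how that could look: `SagaTransitions` is a hypothetical helper, and the top-level `SagaState` enum simply mirrors the states listed above.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum SagaState {
    STARTED, INVENTORY_RESERVING, INVENTORY_RESERVED, PAYMENT_CHARGING, PAYMENT_CHARGED,
    SHIPMENT_CREATING, COMPLETED, COMPENSATING, ABORTED
}

/** Transition table taken from the state diagram; rejects any move the diagram does not allow. */
final class SagaTransitions {
    private static final Map<SagaState, Set<SagaState>> ALLOWED = new EnumMap<>(SagaState.class);

    static {
        ALLOWED.put(SagaState.STARTED, EnumSet.of(SagaState.INVENTORY_RESERVING));
        ALLOWED.put(SagaState.INVENTORY_RESERVING,
                EnumSet.of(SagaState.INVENTORY_RESERVED, SagaState.COMPENSATING));
        ALLOWED.put(SagaState.INVENTORY_RESERVED, EnumSet.of(SagaState.PAYMENT_CHARGING));
        ALLOWED.put(SagaState.PAYMENT_CHARGING,
                EnumSet.of(SagaState.PAYMENT_CHARGED, SagaState.COMPENSATING));
        ALLOWED.put(SagaState.PAYMENT_CHARGED, EnumSet.of(SagaState.SHIPMENT_CREATING));
        ALLOWED.put(SagaState.SHIPMENT_CREATING,
                EnumSet.of(SagaState.COMPLETED, SagaState.COMPENSATING));
        ALLOWED.put(SagaState.COMPENSATING, EnumSet.of(SagaState.ABORTED));
        // COMPLETED and ABORTED are terminal: no outgoing transitions.
    }

    static boolean canTransition(SagaState from, SagaState to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(SagaState.class)).contains(to);
    }

    /** Returns the new state, or throws if the diagram has no such edge. */
    static SagaState transition(SagaState from, SagaState to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal saga transition: " + from + " -> " + to);
        }
        return to;
    }
}
```

Rejecting illegal transitions also gives a cheap guard against duplicate or out-of-order events: a late `PaymentSucceeded` for a saga that is already `ABORTED` fails the check instead of silently reviving it.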
Orchestration Implementation
Orchestrator Service
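import java.util.function.Consumer;
import org.springframework.stereotype.Service;

@Service
public class OrderSagaOrchestrator {
    private final SagaRepository sagaRepository;
    private final InventoryClient inventoryClient;
    private final PaymentClient paymentClient;
    private final ShippingClient shippingClient;

    public OrderSagaOrchestrator(SagaRepository sagaRepository, InventoryClient inventoryClient,
                                 PaymentClient paymentClient, ShippingClient shippingClient) {
        this.sagaRepository = sagaRepository;
        this.inventoryClient = inventoryClient;
        this.paymentClient = paymentClient;
        this.shippingClient = shippingClient;
    }

    public void start(PlaceOrderCommand command) {
        OrderSaga saga = OrderSaga.create(command);
        executeStep(saga, this::reserveInventory);
    }

    // Persists the current state before running the step, so a crash mid-step is recoverable.
    private void executeStep(OrderSaga saga, Consumer<OrderSaga> step) {
        sagaRepository.save(saga);
        step.accept(saga);
    }

    private void reserveInventory(OrderSaga saga) {
        try {
            inventoryClient.reserve(saga.getOrderId(), saga.getItems());
        } catch (Exception e) {
            fail(saga, e);
            return;
        }
        // The next step runs outside the try block so its failures are not handled twice.
        saga.inventoryReserved();
        executeStep(saga, this::chargePayment);
    }

    private void chargePayment(OrderSaga saga) {
        try {
            paymentClient.charge(saga.getOrderId(), saga.getAmount());
        } catch (Exception e) {
            fail(saga, e);
            return;
        }
        saga.paymentCharged();
        executeStep(saga, this::createShipment);
    }

    private void createShipment(OrderSaga saga) {
        try {
            shippingClient.create(saga.getOrderId()); // assumed client method
        } catch (Exception e) {
            fail(saga, e);
            return;
        }
        saga.complete(); // assumed transition method, by analogy with abort()
        sagaRepository.save(saga);
    }

    private void fail(OrderSaga saga, Exception e) {
        saga.startCompensation(e.getMessage());
        sagaRepository.save(saga);
        compensate(saga);
    }

    // executeCompensation (not shown) dispatches a step to its idempotent compensating call.
    private void compensate(OrderSaga saga) {
        for (SagaStep step : saga.getCompletedStepsReversed()) {
            executeCompensation(step);
        }
        saga.abort();
        sagaRepository.save(saga);
    }
}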
Async Orchestrator with Events
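// Imports assume Axon Framework, whose @EventHandler and CommandGateway match the names used here.
import org.axonframework.commandhandling.gateway.CommandGateway;
import org.axonframework.eventhandling.EventHandler;
import org.springframework.stereotype.Component;

@Component
public class OrderSagaEventHandler {
    private final SagaRepository sagaRepository;
    private final CommandGateway commandGateway;

    public OrderSagaEventHandler(SagaRepository sagaRepository, CommandGateway commandGateway) {
        this.sagaRepository = sagaRepository;
        this.commandGateway = commandGateway;
    }

    @EventHandler
    public void on(InventoryReserved event) {
        OrderSaga saga = sagaRepository.findByOrderId(event.orderId());
        saga.inventoryReserved();
        // Save before sending: if the send fails, a timeout or reconciliation job can still
        // find the saga and retry. An outbox would make the save and send atomic.
        sagaRepository.save(saga);
        commandGateway.send(new ChargePaymentCommand(
            saga.getOrderId(),
            saga.getAmount()
        ));
    }

    @EventHandler
    public void on(PaymentFailed event) {
        OrderSaga saga = sagaRepository.findByOrderId(event.orderId());
        saga.startCompensation(event.reason());
        sagaRepository.save(saga);
        commandGateway.send(new ReleaseInventoryCommand(saga.getOrderId()));
    }
}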
Choreography Implementation
Service Reacting to Events
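// Inventory Service
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class InventoryEventHandler {
    // InventoryService and EventPublisher are assumed application types, not Spring classes.
    private final InventoryService inventory;
    private final EventPublisher eventPublisher;

    public InventoryEventHandler(InventoryService inventory, EventPublisher eventPublisher) {
        this.inventory = inventory;
        this.eventPublisher = eventPublisher;
    }

    @EventListener
    public void on(OrderCreated event) {
        try {
            inventory.reserve(event.orderId(), event.items());
            eventPublisher.publish(new InventoryReserved(event.orderId()));
        } catch (InsufficientStockException e) {
            eventPublisher.publish(new InventoryReservationFailed(event.orderId(), e.getMessage()));
        }
    }

    // Compensation: release must be idempotent, because PaymentFailed may arrive more than once.
    @EventListener
    public void on(PaymentFailed event) {
        inventory.release(event.orderId());
        eventPublisher.publish(new InventoryReleased(event.orderId()));
    }
}

// Payment Service
@Component
public class PaymentEventHandler {
    // PaymentService is likewise an assumed application type.
    private final PaymentService payment;
    private final EventPublisher eventPublisher;

    public PaymentEventHandler(PaymentService payment, EventPublisher eventPublisher) {
        this.payment = payment;
        this.eventPublisher = eventPublisher;
    }

    @EventListener
    public void on(InventoryReserved event) {
        try {
            payment.charge(event.orderId());
            eventPublisher.publish(new PaymentSucceeded(event.orderId()));
        } catch (PaymentException e) {
            eventPublisher.publish(new PaymentFailed(event.orderId(), e.getMessage()));
        }
    }
}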
Saga Frameworks and Tools
| Tool | Type | Description |
|---|---|---|
| Temporal.io | Orchestration | Durable workflows; code-based; handles failures |
| AWS Step Functions | Orchestration | Serverless; state machine; AWS native |
| Camunda | Orchestration | BPMN-based; visual designer |
| Axon Framework | Both | Java; saga + event sourcing |
| Netflix Conductor | Orchestration | JSON-based workflow definition |
| Eventuate Tram | Choreography | Java; outbox-based messaging |
Temporal Example
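import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Saga;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

@WorkflowInterface
public interface OrderSagaWorkflow {
    @WorkflowMethod
    void execute(PlaceOrderCommand command);
}

// The core Temporal Java SDK needs no annotation on the implementation class; register it
// with worker.registerWorkflowImplementationTypes(OrderSagaWorkflowImpl.class).
public class OrderSagaWorkflowImpl implements OrderSagaWorkflow {
    // Activity stubs need a timeout; 30 seconds is an illustrative value.
    private final ActivityOptions options = ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofSeconds(30))
            .build();
    private final InventoryActivity inventory = Workflow.newActivityStub(InventoryActivity.class, options);
    private final PaymentActivity payment = Workflow.newActivityStub(PaymentActivity.class, options);
    private final ShippingActivity shipping = Workflow.newActivityStub(ShippingActivity.class, options);

    @Override
    public void execute(PlaceOrderCommand command) {
        Saga.Options sagaOptions = new Saga.Options.Builder().build();
        Saga saga = new Saga(sagaOptions);
        try {
            // Register each compensation before its activity, so a step that fails after
            // partially succeeding (for example, on a timeout) is still undone.
            saga.addCompensation(() -> inventory.release(command.orderId()));
            inventory.reserve(command.orderId(), command.items());
            saga.addCompensation(() -> payment.refund(command.orderId()));
            payment.charge(command.orderId(), command.amount());
            // Shipping is the last step, so it registers no compensation.
            shipping.create(command.orderId());
        } catch (Exception e) {
            saga.compensate(); // runs the registered compensations
            throw e;           // rethrow so the workflow is recorded as failed
        }
    }
}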
Saga Failure Scenarios
Happy Path
All steps succeed.
Step Failure
A step fails; compensate completed steps in reverse order.
Compensation Failure
graph TD
F[Step 3 fails]
F --> C2[Compensate Step 2]
C2 --> CF[Compensation fails!]
CF --> R[Retry with backoff]
R --> C2
R --> DLQ[After max retries: Dead Letter]
DLQ --> Alert[Alert ops]
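The retry loop in the diagram can be sketched as follows. `CompensationRetrier` is a hypothetical helper, the dead letter queue is a plain list, and alerting is left out; a real system would persist the retry count and schedule retries rather than sleep in the calling thread.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Retries a compensation with exponential backoff, then dead-letters it. */
final class CompensationRetrier {
    private final int maxAttempts;
    private final Duration baseDelay;
    final List<String> deadLetters = new ArrayList<>(); // stand-in for a real dead letter queue

    CompensationRetrier(int maxAttempts, Duration baseDelay) {
        this.maxAttempts = maxAttempts;
        this.baseDelay = baseDelay;
    }

    /** Delay after failed attempt number `attempt` (1-based): base, 2x base, 4x base, ... */
    Duration backoff(int attempt) {
        return baseDelay.multipliedBy(1L << (attempt - 1));
    }

    /** Returns true if the compensation succeeded, false if it was dead-lettered. */
    boolean run(String compensationId, Runnable compensation) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                compensation.run();
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // retries exhausted
                }
                try {
                    Thread.sleep(backoff(attempt).toMillis());
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if the worker is shutting down
                }
            }
        }
        deadLetters.add(compensationId); // ops should be alerted from here
        return false;
    }
}
```

Because the same compensation may run several times, this only works if the compensation itself is idempotent.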
Network Partition
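A message may be lost, delayed, or delivered more than once. Make every handler idempotent so redelivery is safe, and put a timeout on each step so the saga does not wait indefinitely for a reply that never arrives.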
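A common way to get idempotent handling is to record the IDs of messages already processed. The sketch below assumes every message carries a unique `messageId`; `Message` and `IdempotentConsumer` are hypothetical names, and the in-memory set stands in for a table that a real service would update in the same transaction as the handler's work.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

record Message(String messageId, String payload) {}

/** Wraps a handler so that a redelivered message is processed at most once. */
final class IdempotentConsumer {
    private final Set<String> processedIds = new HashSet<>(); // stand-in for a durable table
    private final Consumer<Message> handler;

    IdempotentConsumer(Consumer<Message> handler) {
        this.handler = handler;
    }

    /** Returns true if the message was handled, false if it was a duplicate. */
    boolean onMessage(Message message) {
        if (processedIds.contains(message.messageId())) {
            return false; // duplicate delivery: skip
        }
        handler.accept(message); // if this throws, the ID is not recorded and redelivery retries
        processedIds.add(message.messageId());
        return true;
    }
}
```

With this in place the sender can retry freely after a timeout, because a second copy of the same message has no further effect.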
Best Practices
| Practice | Rationale |
|---|---|
| Idempotent steps | Safe to retry on failure |
| Timeout each step | Don't wait forever |
| Store saga state | Recover after crash |
| Unique saga ID | Trace full flow |
| Dead letter handling | Catch unprocessable events |
| Monitoring | Dashboard for saga states |
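As an illustration of the "timeout each step" practice, the sketch below bounds a step with `CompletableFuture`. `StepTimeout` and its `Outcome` enum are hypothetical, and the step is a local `Runnable` standing in for a remote call.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StepTimeout {
    enum Outcome { SUCCEEDED, FAILED, TIMED_OUT }

    /** Runs the step on another thread and waits at most `timeout` for it to finish. */
    static Outcome runWithTimeout(Runnable step, Duration timeout) {
        CompletableFuture<Void> future = CompletableFuture.runAsync(step);
        try {
            future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return Outcome.SUCCEEDED;
        } catch (TimeoutException e) {
            // cancel() marks the future as cancelled but does not stop the underlying work,
            // so the step may still complete later.
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the step itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        }
    }
}
```

A timeout is ambiguous: the remote service may have applied the step even though no reply arrived. Treating `TIMED_OUT` as a failure is therefore only safe when the compensation for that step is idempotent and tolerates a step that never happened.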
When should you use choreography vs orchestration?
Use choreography for simple flows (2-4 steps) with clear event chains and teams comfortable with event-driven design. Use orchestration for complex flows (5+ steps), when you need visibility into saga state, or when business logic has branching/conditional paths. Orchestration is easier to test and debug.
How do you handle a saga that gets stuck?
(1) Timeouts: Each step has a timeout; trigger compensation if exceeded. (2) Monitoring: Dashboard shows stuck sagas. (3) Manual intervention: Ops UI to force complete or abort. (4) Reconciliation: Periodic job checks for stuck sagas. (5) Dead letter queue: Capture and alert on stuck compensations.
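The reconciliation job in item (4) can be as simple as a periodic scan for non-terminal sagas that have not changed state recently. The sketch below makes several assumptions: `SagaRecord` and `StuckSagaDetector` are hypothetical, the persisted saga exposes a last-updated timestamp, and the list stands in for a repository query.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Set;

/** Minimal view of a persisted saga; lastUpdatedAt is an assumed column. */
record SagaRecord(String sagaId, String state, Instant lastUpdatedAt) {}

final class StuckSagaDetector {
    private static final Set<String> TERMINAL_STATES = Set.of("COMPLETED", "ABORTED");

    /** Returns the IDs of sagas that are not finished and have been idle longer than maxIdle. */
    static List<String> findStuck(List<SagaRecord> sagas, Instant now, Duration maxIdle) {
        Instant cutoff = now.minus(maxIdle);
        return sagas.stream()
                .filter(saga -> !TERMINAL_STATES.contains(saga.state()))
                .filter(saga -> saga.lastUpdatedAt().isBefore(cutoff))
                .map(SagaRecord::sagaId)
                .toList();
    }
}
```

What happens to the returned IDs is a policy decision: retry the current step, start compensation, or raise an alert for manual intervention.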
What if a compensating transaction fails?
(1) Retry with exponential backoff. (2) Dead letter queue after max retries. (3) Alert ops for manual intervention. (4) Reconciliation job to detect and fix inconsistencies. Compensations must be designed to eventually succeed, which in practice means making them idempotent and retryable.