Data Management Patterns — Deep Dive

Level: Intermediate
Pre-reading: 03 · Microservices Patterns · 02.04 · Aggregate Design

The Data Challenge in Microservices

Each service should own its data. But how do you query across services? How do you maintain consistency? These patterns address data management in distributed systems.

Database Per Service

Each service has its own database. No shared schemas, no direct database access across service boundaries.

graph TD
    subgraph Order Service
        OS[Order API]
        ODB[(Order DB)]
    end
    subgraph Inventory Service
        IS[Inventory API]
        IDB[(Inventory DB)]
    end
    subgraph Customer Service
        CS[Customer API]
        CDB[(Customer DB)]
    end
    OS --> ODB
    IS --> IDB
    CS --> CDB
    OS -->|API| IS
    OS -->|API| CS
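Here is a minimal in-process sketch of the boundary. `InventoryService` and `OrderService` are hypothetical classes. Each keeps its data in a private dict that stands in for its own database. The order side changes stock only by calling the inventory service's public methods, which stand in for the API arrows in the diagram.

```python
class InventoryService:
    """Owns inventory data; other services reach it only through these methods."""

    def __init__(self, initial_stock: dict[str, int]):
        # Private store standing in for the Inventory DB.
        self._stock = dict(initial_stock)

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)

    def reserve(self, sku: str, qty: int) -> bool:
        if self._stock.get(sku, 0) < qty:
            return False
        self._stock[sku] -= qty
        return True


class OrderService:
    """Owns order data; talks to inventory only via its public API."""

    def __init__(self, inventory: InventoryService):
        self._inventory = inventory  # stands in for an HTTP/gRPC client
        self._orders: dict[str, dict] = {}  # stands in for the Order DB

    def place_order(self, order_id: str, sku: str, qty: int) -> str:
        reserved = self._inventory.reserve(sku, qty)  # API call, not a table query
        status = "CONFIRMED" if reserved else "REJECTED"
        self._orders[order_id] = {"sku": sku, "qty": qty, "status": status}
        return status
```

Suppose 5 units of `sku-1` are in stock. A first order for 3 is confirmed and a second order for 3 is rejected, because only the inventory service decides what is available.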

Why Database Per Service?

Benefit	Description
Loose coupling	Schema changes don't affect other services
Technology freedom	Each service picks its optimal database
Independent scaling	Scale each service's storage to its own load
Fault isolation	A database failure affects only the service that owns it
Team autonomy	Teams control their own data model

Database Technology Selection

Service Need	Database Choice
Transactions, relations	PostgreSQL, MySQL
Document flexibility	MongoDB, DocumentDB
High write throughput	Cassandra, DynamoDB
Caching, sessions	Redis
Full-text search	Elasticsearch
Time series	TimescaleDB, InfluxDB
Graph relationships	Neo4j

Challenges

Challenge	Mitigation
Cross-service queries	API Composition, CQRS
Data consistency	Saga, eventual consistency
Data duplication	Accept it; sync via events
Operational overhead	Managed databases; automation

Shared Database (Anti-Pattern)

Multiple services share a single database. Avoid this.

graph TD
    subgraph Shared DB Anti-Pattern
        OS[Order Service]
        IS[Inventory Service]
        CS[Customer Service]
    end
    DB[(Shared Database)]
    OS --> DB
    IS --> DB
    CS --> DB

Why It's Problematic

Problem	Impact
Schema coupling	Any change requires coordinating all services
Performance coupling	One service's queries affect others
Deployment coupling	Can't deploy services independently
No technology freedom	All services must use the same database
Hidden dependencies	Services read/write each other's tables

Transition Away

Step	Action
1	Identify table ownership per service
2	Add APIs for cross-service data access
3	Replace direct queries with API calls
4	Separate into distinct schemas/databases
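Steps 2 and 3 go more smoothly when the calling code depends on an interface rather than on SQL against another service's tables. The sketch below works under that assumption. `StockReader`, `SharedTableReader`, `InventoryApiReader`, and `can_fulfil` are hypothetical names. An in-memory SQLite table stands in for the shared database, and `fetch_stock` stands in for a client call to the inventory service's API.

```python
import sqlite3
from typing import Callable, Protocol


class StockReader(Protocol):
    def stock_for(self, sku: str) -> int: ...


class SharedTableReader:
    """Before: order code queries the inventory table in the shared database."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def stock_for(self, sku: str) -> int:
        row = self._conn.execute(
            "SELECT quantity FROM inventory WHERE sku = ?", (sku,)
        ).fetchone()
        return row[0] if row else 0


class InventoryApiReader:
    """After: the owning service answers the same question through its API."""

    def __init__(self, fetch_stock: Callable[[str], int]):
        # fetch_stock is a hypothetical client call, e.g. GET /inventory/{sku}
        self._fetch_stock = fetch_stock

    def stock_for(self, sku: str) -> int:
        return self._fetch_stock(sku)


def can_fulfil(reader: StockReader, sku: str, qty: int) -> bool:
    # Order logic sees only the interface, so step 3 is a wiring change here.
    return reader.stock_for(sku) >= qty
```

`can_fulfil` only sees the interface, so switching from the shared-table reader to the API reader is a wiring change. After that switch, the inventory table can move to its own schema (step 4).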

API Composition

Retrieve data from multiple services at query time. The composer (gateway, BFF, or service) calls each owning service and aggregates the responses into one result.

sequenceDiagram
    participant C as Client
    participant A as API Composer
    participant OS as Order Service
    participant CS as Customer Service
    participant PS as Product Service
    C->>A: GET /order-details/123
    A->>OS: GET /orders/123
    OS->>A: Order data (customer 456, product 789)
    par Fetch customer
        A->>CS: GET /customers/456
        CS->>A: Customer data
    and Fetch product
        A->>PS: GET /products/789
        PS->>A: Product data
    end
    A->>A: Compose response
    A->>C: Combined order details
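Below is a sketch of a composer built on Python's `asyncio`. It assumes hypothetical async clients `get_order`, `get_customer`, and `get_product` that simulate the three services. The order lookup must finish first because it supplies the customer and product IDs. Only the two independent lookups run in parallel.

```python
import asyncio


# Hypothetical service clients; real ones would make HTTP calls.
async def get_order(order_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulated network latency
    return {"id": order_id, "customer_id": "456", "product_id": "789"}


async def get_customer(customer_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"id": customer_id, "name": "Ada"}


async def get_product(product_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"id": product_id, "name": "Widget"}


async def get_order_details(order_id: str) -> dict:
    # The order comes first: it carries the IDs needed for the other calls.
    order = await get_order(order_id)
    # The two independent lookups then run concurrently.
    customer, product = await asyncio.gather(
        get_customer(order["customer_id"]),
        get_product(order["product_id"]),
    )
    return {"order": order, "customer": customer, "product": product}
```

Calling `asyncio.run(get_order_details("123"))` returns the combined document. Its latency is roughly one order lookup plus the slower of the two parallel calls.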

When to Use

Scenario	Why API Composition Fits
Simple joins	Combining 2–3 services
Read-heavy	Queries more common than writes
Consistency needed	Data must be up-to-date
Low-latency services	Backend calls are fast

Challenges

Challenge	Mitigation
Latency	Parallel calls; caching
Partial failures	Circuit breakers; fallbacks
Complexity	Keep compositions simple
Data volume	Pagination; filtering at source
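For partial failures, one common approach is to bound each backend call with a timeout, substitute a fallback value, and tell the client what is missing. The minimal sketch below uses `asyncio.wait_for` and assumes hypothetical `fetch_*` calls that succeed, hang, and fail, respectively. It covers only the fallback half of the mitigation. A circuit breaker would additionally stop calling a service that keeps failing.

```python
import asyncio


async def with_fallback(coro, fallback, timeout_s: float):
    """Await coro; on error or timeout return (fallback, False) instead of raising."""
    try:
        result = await asyncio.wait_for(coro, timeout=timeout_s)
    except Exception:  # asyncio.TimeoutError is an Exception subclass
        return fallback, False
    return result, True


# Hypothetical downstream calls with different failure modes.
async def fetch_price(product_id: str) -> float:
    return 9.99


async def fetch_reviews(product_id: str) -> list:
    await asyncio.sleep(1.0)  # simulates a hanging service
    return ["great"]


async def fetch_stock(product_id: str) -> int:
    raise ConnectionError("inventory unavailable")


async def product_page(product_id: str) -> dict:
    parts = {
        "price": with_fallback(fetch_price(product_id), None, 0.1),
        "reviews": with_fallback(fetch_reviews(product_id), [], 0.1),
        "stock": with_fallback(fetch_stock(product_id), None, 0.1),
    }
    results = await asyncio.gather(*parts.values())
    page = {"id": product_id, "missing": []}
    for name, (value, ok) in zip(parts, results):
        page[name] = value
        if not ok:
            page["missing"].append(name)  # lets the client render a degraded view
    return page
```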

CQRS — Command Query Responsibility Segregation

Separate the write model (commands) from the read model (queries). Each is optimized for its purpose.

graph LR
    C[Client] -->|Commands| WS[Write Service]
    WS --> WDB[(Write DB - Normalized)]
    WS -->|Events| EB[Event Bus]
    EB --> RS[Read Projector]
    RS --> RDB[(Read DB - Denormalized)]
    C -->|Queries| QS[Query Service]
    QS --> RDB

CQRS Components

Component	Purpose
Command Model	Handles creates, updates, deletes
Query Model	Handles reads; optimized projections
Event Publisher	Emits events when write model changes
Projector	Consumes events; updates read model
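Here is a minimal in-memory sketch of the four components. It assumes a hypothetical `OrderPlaced` event and a synchronous `EventBus`. A real bus delivers asynchronously, and that delay is where the eventual consistency between the two models comes from. The projector runs immediately here only to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total: float


class EventBus:
    """In-memory stand-in for the event bus; delivers synchronously for brevity."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, event):
        for handler in self._handlers:
            handler(event)


class OrderCommandService:
    """Command model: validates writes, stores normalized rows, publishes events."""

    def __init__(self, bus: EventBus):
        self._orders = {}  # write model: order_id -> row
        self._bus = bus

    def place_order(self, order_id: str, customer_id: str, total: float) -> None:
        if order_id in self._orders:
            raise ValueError(f"order {order_id} already exists")
        self._orders[order_id] = {"customer_id": customer_id, "total": total}
        self._bus.publish(OrderPlaced(order_id, customer_id, total))


class CustomerSummaryProjector:
    """Projector plus query model: a denormalized per-customer summary."""

    def __init__(self):
        self._summary = {}  # read model: customer_id -> aggregates

    def handle(self, event) -> None:
        if isinstance(event, OrderPlaced):
            row = self._summary.setdefault(
                event.customer_id, {"order_count": 0, "total_spent": 0.0}
            )
            row["order_count"] += 1
            row["total_spent"] += event.total

    def customer_summary(self, customer_id: str) -> dict:
        # Query side: one key lookup, no joins over the write model.
        return dict(self._summary.get(customer_id, {"order_count": 0, "total_spent": 0.0}))
```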

Read Model Optimization

Query Need	Read Model Design
Dashboard	Pre-aggregated summaries
Search	Elasticsearch index
Reporting	Materialized views
Mobile list	Denormalized, paginated
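For the dashboard row, a projector can keep pre-aggregated counts so that the query is a single small read. The sketch below assumes a hypothetical `OrderStatusChanged` event that carries a unique `event_id`. Message buses commonly deliver at least once, so the projector skips events it has already applied.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderStatusChanged:
    event_id: str
    order_id: str
    old_status: Optional[str]  # None when the order is first created
    new_status: str


class StatusDashboardProjection:
    """Read model holding pre-aggregated order counts per status."""

    def __init__(self):
        self._counts = Counter()
        self._applied = set()  # event IDs already applied

    def handle(self, event: OrderStatusChanged) -> None:
        if event.event_id in self._applied:
            return  # duplicate delivery: applying it again would double-count
        self._applied.add(event.event_id)
        if event.old_status is not None:
            self._counts[event.old_status] -= 1
        self._counts[event.new_status] += 1

    def summary(self) -> dict:
        # What the dashboard reads: no scan over individual orders.
        return {status: n for status, n in self._counts.items() if n > 0}
```

Keeping every processed ID in memory does not scale. A production projector would more likely persist processed IDs or a stream offset alongside the read model.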

When to Use CQRS

Good Fit	Bad Fit
High read/write ratio	Simple CRUD
Complex queries across aggregates	Single-aggregate queries
Different scaling needs	Uniform load
Team experienced with eventual consistency	Team new to distributed systems

CQRS Trade-offs

Benefit	Cost
Independent read/write scaling	Two models to maintain
Optimized read performance	Eventual consistency
Flexible query patterns	Complexity
Different storage technologies	More infrastructure

→ Deep Dive: CQRS for implementation details

Event Sourcing

Store events as the source of truth. Current state is derived by replaying events.

graph LR
    subgraph Event Store
        E1[OrderCreated]
        E2[ItemAdded]
        E3[ItemAdded]
        E4[OrderShipped]
    end
    E1 --> E2 --> E3 --> E4
    E4 -->|Event stream| R[Replay]
    R --> S[Current State]

Event Sourcing Concepts

Concept	Description
Event Store	Append-only log of events
Current State	Derived by replaying events
Snapshot	Periodic state capture to speed replay
Projection	Read model built from events
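Here is a minimal sketch of these concepts, using the event types from the diagram. `OrderState`, `apply`, `EventStore`, and `load` are hypothetical names. The expected-version check on append is an optimistic-concurrency safeguard, added here as an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OrderState:
    status: str = "NEW"
    items: List[str] = field(default_factory=list)


def apply(state: OrderState, event: dict) -> OrderState:
    """Pure transition function: current state + one event -> next state."""
    kind = event["type"]
    if kind == "OrderCreated":
        return OrderState(status="CREATED")
    if kind == "ItemAdded":
        return OrderState(status=state.status, items=state.items + [event["sku"]])
    if kind == "OrderShipped":
        return OrderState(status="SHIPPED", items=list(state.items))
    raise ValueError(f"unknown event type: {kind}")


class EventStore:
    """Append-only, in-memory log for a single order stream."""

    def __init__(self):
        self._events: List[dict] = []

    def append(self, event: dict, expected_version: int) -> None:
        # Reject the write if another writer appended since we last read.
        if expected_version != len(self._events):
            raise RuntimeError("concurrent modification")
        self._events.append(event)

    def read(self, from_version: int = 0) -> List[dict]:
        return list(self._events[from_version:])


def load(store: EventStore, snapshot: Optional[Tuple[int, OrderState]] = None) -> OrderState:
    """Derive current state by replay; a snapshot (version, state) skips the prefix."""
    version, state = snapshot if snapshot is not None else (0, OrderState())
    for event in store.read(from_version=version):
        state = apply(state, event)
    return state
```

A snapshot taken after two events lets `load` replay only the remaining two, and the result matches a full replay. Snapshots depend on exactly that property.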

When to Use Event Sourcing

Good Fit	Bad Fit
Audit trail required	Simple CRUD
Time travel debugging	No replay needs
Complex state machines	Straightforward state
Regulatory compliance	Team unfamiliar with pattern

→ Deep Dive: Event Sourcing for implementation details

Data Replication via Events

Services maintain local copies of data they need, synced via events.

sequenceDiagram
    participant CS as Customer Service
    participant K as Kafka
    participant OS as Order Service
    participant ODB as Order DB
    CS->>K: CustomerUpdated event
    K->>OS: Consume event
    OS->>ODB: Update local customer cache
    Note over OS,ODB: Order Service has local copy of customer data
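Here is a sketch of the consumer side. It assumes a hypothetical `CustomerUpdated` event carrying a per-customer `version` number that the producer increments on every change. Comparing versions lets the Order Service ignore duplicate and out-of-order deliveries. The consumer copies only the field it needs (the name), not the email.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomerUpdated:
    customer_id: str
    version: int  # assumed: incremented by Customer Service on every change
    name: str
    email: str


class LocalCustomerCache:
    """Order Service's local copy of the customer data it queries."""

    def __init__(self):
        self._rows = {}  # customer_id -> {"version": int, "name": str}

    def on_customer_updated(self, event: CustomerUpdated) -> bool:
        current = self._rows.get(event.customer_id)
        # Skip duplicates and stale events that arrive out of order.
        if current is not None and current["version"] >= event.version:
            return False
        # Keep only the field Order Service needs; email is not replicated.
        self._rows[event.customer_id] = {"version": event.version, "name": event.name}
        return True

    def name_of(self, customer_id: str) -> Optional[str]:
        row = self._rows.get(customer_id)
        return row["name"] if row else None
```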

Why Replicate Data?

Reason	Benefit
Performance	Avoid cross-service calls
Availability	Function when other service is down
Query optimization	Join local data

Replication Trade-offs

Benefit	Cost
Fast local queries	Data may be stale
Decoupled availability	Storage duplication
Simpler queries	Event handling complexity

What to Replicate

Replicate	Don't Replicate
Reference data (product name, customer name)	Transactional data
Data needed for local queries	Frequently changing data
Immutable or slowly changing data	Sensitive data (minimize copies)

Saga Pattern for Distributed Transactions

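Coordinate a cross-service operation as a sequence of local transactions, one per service. If a step fails, run the compensating actions of the already-completed steps in reverse order to undo their effects.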

Step	Forward Action	Compensating Action
1	Reserve inventory	Release reservation
2	Charge payment	Issue refund
3	Create shipment	Cancel shipment
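Here is a minimal orchestrated sketch of the table above; the choreography variant is not shown. `run_saga` and the step functions are hypothetical. Each action stands in for one service's local transaction. The shipment step is made to fail so that the compensations run in reverse order.

```python
from typing import Callable, List, Tuple

Action = Callable[[dict], None]


def run_saga(steps: List[Tuple[str, Action, Action]], ctx: dict) -> bool:
    """Run forward actions in order; on failure, compensate completed steps in reverse."""
    log = ctx.setdefault("log", [])
    completed: List[Tuple[str, Action]] = []
    for name, action, compensate in steps:
        try:
            action(ctx)  # one service's local transaction
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            for done_name, undo in reversed(completed):
                undo(ctx)
                log.append(f"compensated {done_name}")
            return False
        completed.append((name, compensate))
    return True


# Hypothetical steps mirroring the table; the shipment step fails on purpose.
def reserve_inventory(ctx):
    ctx["reserved"] = True


def release_reservation(ctx):
    ctx["reserved"] = False


def charge_payment(ctx):
    ctx["charged"] = True


def issue_refund(ctx):
    ctx["charged"] = False


def create_shipment(ctx):
    raise RuntimeError("no carrier available")


def cancel_shipment(ctx):
    ctx["shipped"] = False


ORDER_SAGA = [
    ("reserve inventory", reserve_inventory, release_reservation),
    ("charge payment", charge_payment, issue_refund),
    ("create shipment", create_shipment, cancel_shipment),
]
```

A real orchestrator would also persist saga progress so that it can resume after a crash. It would retry compensations as well, so compensations need to be idempotent.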

→ Deep Dive: Saga Pattern for choreography vs orchestration

Outbox Pattern

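Make a database write and its event publication atomic. The service writes the event to an outbox table in the same local transaction as the business data, and a separate poller publishes it afterwards. Delivery is at-least-once, so consumers should be idempotent.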

sequenceDiagram
    participant App
    participant DB
    participant Poller
    participant Kafka
    App->>DB: BEGIN TX
    App->>DB: Insert order
    App->>DB: Insert outbox record
    App->>DB: COMMIT
    Poller->>DB: Poll outbox table
    Poller->>Kafka: Publish event
    Poller->>DB: Mark as published

→ Deep Dive: Outbox Pattern for implementation

Pattern Selection Guide

Need	Pattern
Service owns its data	Database per Service
Query across services	API Composition
High read/write ratio	CQRS
Audit trail, time travel	Event Sourcing
Avoid cross-service calls	Data Replication
Cross-service transactions	Saga
Reliable event publishing	Outbox

How do you handle joins across microservices?

Three options: (1) API Composition — call multiple services and join in application. (2) CQRS — build denormalized read models from events. (3) Data Replication — store copies of needed data locally. API Composition is simplest; CQRS is most flexible; Replication is fastest but requires event handling.

When is data duplication across services acceptable?

When the duplicated data is (1) reference data (product names, customer names), (2) slowly changing, (3) needed for query performance, (4) synced via events. Avoid duplicating frequently changing transactional data. Accept eventual consistency. The source of truth publishes events; consumers maintain local copies.

What's the difference between CQRS and Event Sourcing?

CQRS separates read and write models — they can use different databases and schemas. Event Sourcing stores events as the source of truth — state is derived by replay. They're often used together (events feed read model projections) but are independent patterns. You can use CQRS without Event Sourcing (events just sync the read model).
