Data Management Patterns — Deep Dive

Level: Intermediate
Pre-reading: 03 · Microservices Patterns · 02.04 · Aggregate Design


The Data Challenge in Microservices

Each service should own its data. But how do you query across services? How do you maintain consistency? These patterns address data management in distributed systems.


Database Per Service

Each service has its own database. No shared schemas, no direct database access across service boundaries.

graph TD
    subgraph Order Service
        OS[Order API]
        ODB[(Order DB)]
    end
    subgraph Inventory Service
        IS[Inventory API]
        IDB[(Inventory DB)]
    end
    subgraph Customer Service
        CS[Customer API]
        CDB[(Customer DB)]
    end
    OS --> ODB
    IS --> IDB
    CS --> CDB
    OS -->|API| IS
    OS -->|API| CS

Why Database Per Service?

| Benefit | Description |
| --- | --- |
| Loose coupling | Schema changes don't affect other services |
| Technology freedom | Each service picks its optimal database |
| Independent scaling | Scale storage per service needs |
| Fault isolation | Database failure affects one service |
| Team autonomy | Teams control their own data model |

Database Technology Selection

| Service Need | Database Choice |
| --- | --- |
| Transactions, relations | PostgreSQL, MySQL |
| Document flexibility | MongoDB, DocumentDB |
| High write throughput | Cassandra, DynamoDB |
| Caching, sessions | Redis |
| Full-text search | Elasticsearch |
| Time series | TimescaleDB, InfluxDB |
| Graph relationships | Neo4j |

Challenges

| Challenge | Mitigation |
| --- | --- |
| Cross-service queries | API Composition, CQRS |
| Data consistency | Saga, eventual consistency |
| Data duplication | Accept it; sync via events |
| Operational overhead | Managed databases; automation |

Shared Database (Anti-Pattern)

Multiple services share a single database. Avoid this.

graph TD
    subgraph Shared DB Anti-Pattern
        OS[Order Service]
        IS[Inventory Service]
        CS[Customer Service]
    end
    DB[(Shared Database)]
    OS --> DB
    IS --> DB
    CS --> DB

Why It's Problematic

| Problem | Impact |
| --- | --- |
| Schema coupling | Any change requires coordinating all services |
| Performance coupling | One service's queries affect others |
| Deployment coupling | Can't deploy services independently |
| No technology freedom | All services use same database |
| Hidden dependencies | Services read/write each other's tables |

Transition Away

| Step | Action |
| --- | --- |
| 1 | Identify table ownership per service |
| 2 | Add APIs for cross-service data access |
| 3 | Replace direct queries with API calls |
| 4 | Separate into distinct schemas/databases |

API Composition

Retrieve data from multiple services at query time. The composer (a gateway, a BFF, or a dedicated service) calls each service and aggregates the results into one response.

sequenceDiagram
    participant C as Client
    participant A as API Composer
    participant OS as Order Service
    participant CS as Customer Service
    participant PS as Product Service
    C->>A: GET /order-details/123
    par
        A->>OS: GET /orders/123
        A->>CS: GET /customers/456
        A->>PS: GET /products/789
    end
    OS->>A: Order data
    CS->>A: Customer data
    PS->>A: Product data
    A->>A: Compose response
    A->>C: Combined order details

When to Use

| Scenario | API Composition Fits |
| --- | --- |
| Simple joins | Combining 2–3 services |
| Read-heavy | Queries more common than writes |
| Consistency needed | Data must be up-to-date |
| Low latency services | Backend calls are fast |

Challenges

| Challenge | Mitigation |
| --- | --- |
| Latency | Parallel calls; caching |
| Partial failures | Circuit breakers; fallbacks |
| Complexity | Keep compositions simple |
| Data volume | Pagination; filtering at source |
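
A minimal sketch of a composer, assuming three hypothetical async client functions (`get_order`, `get_customer`, `get_product`) that stand in for real HTTP calls. Unlike the diagram, it fetches the order first because the customer and product IDs are assumed to come from the order; the two remaining lookups then run in parallel, and a failed lookup degrades to a missing section rather than failing the whole response.

```python
import asyncio

# Hypothetical stand-ins for the three services. Real code would issue HTTP
# requests here; names listed in UNAVAILABLE simulate an outage.
UNAVAILABLE = set()


async def get_order(order_id):
    await asyncio.sleep(0.01)
    return {"id": order_id, "customer_id": "456", "product_id": "789"}


async def get_customer(customer_id):
    await asyncio.sleep(0.01)
    if "customer" in UNAVAILABLE:
        raise ConnectionError("customer service unavailable")
    return {"id": customer_id, "name": "Ada"}


async def get_product(product_id):
    await asyncio.sleep(0.01)
    if "product" in UNAVAILABLE:
        raise ConnectionError("product service unavailable")
    return {"id": product_id, "title": "Keyboard"}


def _or_none(result):
    # gather(return_exceptions=True) hands back the exception object instead
    # of raising it, so a failed call becomes a missing section (fallback).
    return None if isinstance(result, BaseException) else result


async def compose_order_details(order_id, timeout=1.0):
    # The order is required: if this call fails, the whole request fails.
    order = await asyncio.wait_for(get_order(order_id), timeout)
    # Customer and product lookups are independent, so run them in parallel.
    customer, product = await asyncio.gather(
        asyncio.wait_for(get_customer(order["customer_id"]), timeout),
        asyncio.wait_for(get_product(order["product_id"]), timeout),
        return_exceptions=True,
    )
    return {
        "order": order,
        "customer": _or_none(customer),
        "product": _or_none(product),
    }
```

Calling `asyncio.run(compose_order_details("123"))` returns a dict with `order`, `customer`, and `product` keys. Returning `None` for a failed section is only one possible fallback; a cached value or a circuit breaker in front of each client would be reasonable alternatives.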

CQRS — Command Query Responsibility Segregation

Separate the write model (commands) from the read model (queries). Each is optimized for its purpose.

graph LR
    C[Client] -->|Commands| WS[Write Service]
    WS --> WDB[(Write DB - Normalized)]
    WS -->|Events| EB[Event Bus]
    EB --> RS[Read Projector]
    RS --> RDB[(Read DB - Denormalized)]
    C -->|Queries| QS[Query Service]
    QS --> RDB

CQRS Components

| Component | Purpose |
| --- | --- |
| Command Model | Handles creates, updates, deletes |
| Query Model | Handles reads; optimized projections |
| Event Publisher | Emits events when write model changes |
| Projector | Consumes events; updates read model |
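
The four components can be sketched in a few lines. This is an illustration only: the class names, the `OrderPlaced` event shape, and the in-memory list standing in for the event bus are all assumptions, not part of any specific framework.

```python
class WriteService:
    """Command model: validates and stores orders, then emits an event."""

    def __init__(self, bus):
        self.orders = {}  # normalized write store: order_id -> row
        self.bus = bus    # hypothetical event bus, here just a list

    def place_order(self, order_id, customer_id, total):
        self.orders[order_id] = {"customer_id": customer_id, "total": total}
        self.bus.append({
            "type": "OrderPlaced",
            "order_id": order_id,
            "customer_id": customer_id,
            "total": total,
        })


class Projector:
    """Consumes events and maintains a denormalized per-customer summary."""

    def __init__(self):
        self.read_db = {}  # customer_id -> {"order_count", "total_spent"}
        self.seen = set()  # processed order IDs, so redelivery is harmless

    def handle(self, event):
        if event["type"] != "OrderPlaced" or event["order_id"] in self.seen:
            return
        self.seen.add(event["order_id"])
        summary = self.read_db.setdefault(
            event["customer_id"], {"order_count": 0, "total_spent": 0}
        )
        summary["order_count"] += 1
        summary["total_spent"] += event["total"]


class QueryService:
    """Query model: reads only from the projection, never the write store."""

    def __init__(self, read_db):
        self.read_db = read_db

    def customer_summary(self, customer_id):
        return self.read_db.get(customer_id)
```

Until the projector has handled an event, `QueryService` does not see the corresponding write. That window is the eventual consistency listed among the trade-offs of this pattern.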

Read Model Optimization

| Query Need | Read Model Design |
| --- | --- |
| Dashboard | Pre-aggregated summaries |
| Search | Elasticsearch index |
| Reporting | Materialized views |
| Mobile list | Denormalized, paginated |

When to Use CQRS

| Good Fit | Bad Fit |
| --- | --- |
| High read/write ratio | Simple CRUD |
| Complex queries across aggregates | Single aggregate queries |
| Different scaling needs | Uniform load |
| Team experienced with eventual consistency | Team new to distributed systems |

CQRS Trade-offs

| Benefit | Cost |
| --- | --- |
| Independent read/write scaling | Two models to maintain |
| Optimized read performance | Eventual consistency |
| Flexible query patterns | Complexity |
| Different storage technologies | More infrastructure |

Deep Dive: CQRS for implementation details


Event Sourcing

Store events as the source of truth. Current state is derived by replaying events.

graph LR
    subgraph Event Store
        E1[OrderCreated]
        E2[ItemAdded]
        E3[ItemAdded]
        E4[OrderShipped]
    end
    ES[Event Stream] --> R[Replay]
    R --> S[Current State]

Event Sourcing Concepts

| Concept | Description |
| --- | --- |
| Event Store | Append-only log of events |
| Current State | Derived by replaying events |
| Snapshot | Periodic state capture to speed replay |
| Projection | Read model built from events |
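
Replay and snapshots can be sketched as a fold over the event log. The event names match the diagram; the `sku` payload field, the `OrderState` shape, and the use of `version` as a log offset are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OrderState:
    status: str = "none"
    items: tuple = ()
    version: int = 0  # number of events applied so far


def apply(state, event):
    """Pure transition: return the next state for one event."""
    kind = event["type"]
    if kind == "OrderCreated":
        next_state = replace(state, status="created")
    elif kind == "ItemAdded":
        next_state = replace(state, items=state.items + (event["sku"],))
    elif kind == "OrderShipped":
        next_state = replace(state, status="shipped")
    else:
        raise ValueError(f"unknown event type: {kind}")
    return replace(next_state, version=state.version + 1)


def replay(events, snapshot=None):
    """Rebuild state from the log, skipping events a snapshot already covers."""
    state = OrderState() if snapshot is None else snapshot
    for event in events[state.version:]:
        state = apply(state, event)
    return state


event_log = [
    {"type": "OrderCreated"},
    {"type": "ItemAdded", "sku": "A"},
    {"type": "ItemAdded", "sku": "B"},
    {"type": "OrderShipped"},
]
```

`replay(event_log)` yields the current state, and `replay(event_log[:3])` yields the state as it was before shipping, which is the "time travel" the pattern enables. Passing a snapshot taken at version 2 replays only the last two events.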

When to Use Event Sourcing

| Good Fit | Bad Fit |
| --- | --- |
| Audit trail required | Simple CRUD |
| Time travel debugging | No replay needs |
| Complex state machines | Straightforward state |
| Regulatory compliance | Team unfamiliar with pattern |

Deep Dive: Event Sourcing for implementation details


Data Replication via Events

Services maintain local copies of data they need, synced via events.

sequenceDiagram
    participant CS as Customer Service
    participant K as Kafka
    participant OS as Order Service
    participant ODB as Order DB
    CS->>K: CustomerUpdated event
    K->>OS: Consume event
    OS->>ODB: Update local customer cache
    Note over OS,ODB: Order Service has local copy of customer data
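
A minimal sketch of the consuming side, assuming each `CustomerUpdated` event carries a monotonically increasing `version` per customer (an assumption, not something the diagram specifies). The version check makes the handler safe against duplicate and out-of-order delivery, which most brokers can produce.

```python
class LocalCustomerCopy:
    """Order Service's local, read-only copy of customer reference data."""

    def __init__(self):
        self.rows = {}  # customer_id -> {"name", "version"}

    def on_customer_updated(self, event):
        """Apply the event unless a newer or equal version is already stored.

        Returns True if the local copy changed, False if the event was stale.
        """
        current = self.rows.get(event["customer_id"])
        if current is not None and current["version"] >= event["version"]:
            return False  # duplicate or out-of-order event: ignore it
        self.rows[event["customer_id"]] = {
            "name": event["name"],
            "version": event["version"],
        }
        return True

    def customer_name(self, customer_id):
        row = self.rows.get(customer_id)
        return None if row is None else row["name"]
```

The local copy is only as fresh as the last event consumed, so reads from it should be treated as possibly stale.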

Why Replicate Data?

| Reason | Benefit |
| --- | --- |
| Performance | Avoid cross-service calls |
| Availability | Function when other service is down |
| Query optimization | Join local data |

Replication Trade-offs

| Benefit | Cost |
| --- | --- |
| Fast local queries | Data may be stale |
| Decoupled availability | Storage duplication |
| Simpler queries | Event handling complexity |

What to Replicate

| Replicate | Don't Replicate |
| --- | --- |
| Reference data (product name, customer name) | Transactional data |
| Data needed for local queries | Frequently changing data |
| Immutable or slowly changing data | Sensitive data (minimize copies) |

Saga Pattern for Distributed Transactions

Coordinate cross-service operations with local transactions and compensating actions.

| Step | Forward Action | Compensating Action |
| --- | --- | --- |
| 1 | Reserve inventory | Release reservation |
| 2 | Charge payment | Issue refund |
| 3 | Create shipment | Cancel shipment |
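
The forward and compensating actions in the table can be sketched as an orchestrated saga. This is a simplified in-process illustration: `run_saga` and the `step` helper are hypothetical names, and a real orchestrator would also persist its progress and retry compensations that fail.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure.

    Returns True if every action succeeded, False if the saga was rolled back.
    """
    compensations = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo in reverse order; the failed step itself never completed,
            # so its own compensation is not run.
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensate)
    return True


log = []  # records what actually ran, for demonstration


def step(name, fail=False):
    """Build a callable that logs its name, or raises if fail is set."""
    def run():
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(name)
    return run


saga_ok = run_saga([
    (step("reserve inventory"), step("release reservation")),
    (step("charge payment"), step("issue refund")),
    (step("create shipment", fail=True), step("cancel shipment")),
])
```

Here the shipment step fails, so the payment is refunded and then the reservation is released, in that order. Each step stands in for a local transaction in a different service.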

Deep Dive: Saga Pattern for choreography vs orchestration


Outbox Pattern

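Make the database write and the event publish effectively atomic: write the business data and an outbox record in a single local transaction, then let a separate poller publish outbox records to the broker. Delivery is at-least-once, so consumers should tolerate duplicate events.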

sequenceDiagram
    participant App
    participant DB
    participant Poller
    participant Kafka
    App->>DB: BEGIN TX
    App->>DB: Insert order
    App->>DB: Insert outbox record
    App->>DB: COMMIT
    Poller->>DB: Poll outbox table
    Poller->>Kafka: Publish event
    Poller->>DB: Mark as published
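
The sequence above can be sketched with SQLite standing in for the service database. The table layout, the `place_order` and `relay_once` names, and the `publish` callback (a placeholder for a real broker client) are assumptions for illustration.

```python
import json
import sqlite3


def init_db(conn):
    with conn:
        conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total INTEGER)")
        conn.execute(
            "CREATE TABLE outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "topic TEXT NOT NULL, payload TEXT NOT NULL, "
            "published INTEGER NOT NULL DEFAULT 0)"
        )


def place_order(conn, order_id, total):
    event = {"type": "OrderCreated", "order_id": order_id, "total": total}
    # One transaction: either both rows are committed or neither is.
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps(event)),
        )


def relay_once(conn, publish):
    """Publish pending outbox rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        # If the process dies before this commit, the row is published again
        # on the next poll: at-least-once delivery.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

A poller would call `relay_once` on a timer. Because an event can be published more than once, consumers are expected to deduplicate, for example by event ID.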

Deep Dive: Outbox Pattern for implementation


Pattern Selection Guide

| Need | Pattern |
| --- | --- |
| Service owns its data | Database per Service |
| Query across services | API Composition |
| High read/write ratio | CQRS |
| Audit trail, time travel | Event Sourcing |
| Avoid cross-service calls | Data Replication |
| Cross-service transactions | Saga |
| Reliable event publish | Outbox |

How do you handle joins across microservices?

Three options: (1) API Composition: call multiple services and join the results in the application. (2) CQRS: build denormalized read models from events. (3) Data Replication: store copies of the needed data locally. API Composition is the simplest; CQRS is the most flexible; Replication gives the fastest reads but requires event handling and tolerating stale data.

When is data duplication across services acceptable?

When the duplicated data is (1) reference data (product names, customer names), (2) slowly changing, (3) needed for query performance, (4) synced via events. Avoid duplicating frequently changing transactional data. Accept eventual consistency. The source of truth publishes events; consumers maintain local copies.

What's the difference between CQRS and Event Sourcing?

CQRS separates read and write models — they can use different databases and schemas. Event Sourcing stores events as the source of truth — state is derived by replay. They're often used together (events feed read model projections) but are independent patterns. You can use CQRS without Event Sourcing (events just sync the read model).