
06 · Read Replicas — Scale Read-Heavy Workloads Horizontally

Scaling Reads · Topic 6 of 7


The Problem

A single primary database becomes a bottleneck as read queries per second (QPS) grow. Vertical scaling (moving to a bigger machine) helps only up to the largest instance available, and it gets expensive.


Read Replicas

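A read replica is a copy of the primary database that serves read-only traffic. All writes still go to the primary, which streams its changes to each replica. Replication is typically asynchronous, so a replica can lag slightly behind the primary.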

graph LR
    App -->|Writes| Primary[(Primary)]
    App -->|Reads| R1[(Read Replica 1)]
    App -->|Reads| R2[(Read Replica 2)]
    Primary -->|Async replication| R1
    Primary -->|Async replication| R2
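
The topology in the diagram can be sketched from the application's side as a small router that sends writes to the primary and rotates reads across replicas. This is a minimal illustration, not a production client: the `ReplicaRouter` class and the `example.internal` connection strings are hypothetical, and round-robin is only one of several reasonable balancing policies.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ReplicaRouter:
    """Chooses a connection target: the primary for writes, a replica for reads."""

    primary: str
    replicas: list[str]
    _cycle: itertools.cycle = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Fall back to the primary when no replicas are configured.
        self._cycle = itertools.cycle(self.replicas or [self.primary])

    def for_write(self) -> str:
        # Every write goes to the single primary.
        return self.primary

    def for_read(self) -> str:
        # Round-robin spreads read load evenly across the replicas.
        return next(self._cycle)


# Hypothetical connection strings, for illustration only.
router = ReplicaRouter(
    primary="postgresql://primary.example.internal/app",
    replicas=[
        "postgresql://replica-1.example.internal/app",
        "postgresql://replica-2.example.internal/app",
    ],
)
```

Each call to `router.for_read()` returns the next replica in turn, while `router.for_write()` always returns the primary.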

When to Use

  • Reports and analytics queries that are expensive but don't need fresh data
  • Search / autocomplete hitting the same tables as transactional writes
  • Multi-region reads — serve users from the nearest replica

When NOT to Use

  • When the application requires read-your-own-writes consistency
  • Real-time dashboards that cannot tolerate any lag
  • Low-latency OLTP reads — a well-indexed primary handles this better
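
When only some sessions need read-your-own-writes, one common workaround is to pin a session's reads to the primary for a short window after it writes. The sketch below assumes a hypothetical `SessionPinner` helper and a 5-second window; the window has to exceed the worst replication lag you expect, so this is a heuristic rather than a guarantee.

```python
import time


class SessionPinner:
    """Routes a session's reads to the primary for a short window after it writes."""

    def __init__(self, pin_seconds: float = 5.0, clock=time.monotonic) -> None:
        self.pin_seconds = pin_seconds
        self._clock = clock  # injectable so the window can be tested without sleeping
        self._last_write: dict[str, float] = {}

    def record_write(self, session_id: str) -> None:
        # Remember when this session last wrote.
        self._last_write[session_id] = self._clock()

    def target_for_read(self, session_id: str) -> str:
        last = self._last_write.get(session_id)
        # Inside the pin window the replica may not have the write yet.
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"
        return "replica"
```

Sessions that have not written recently keep reading from replicas, so the primary only absorbs the reads that actually need fresh data.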

Cloud Implementations

PostgreSQL (AWS RDS)

  • Up to 15 read replicas
  • Streaming WAL-based replication
  • Route via a separate read endpoint or a connection pooler (PgBouncer)

Google Cloud Spanner

  • Read-only replicas placed in any region
  • Strong reads vs. stale reads (bounded staleness for lower latency)
  • max_staleness hint: READS WITH max_staleness = INTERVAL '15' SECOND (illustrative notation; the exact form depends on the client library)

Amazon DynamoDB

  • Global Tables = multi-region replicas
  • ConsistentRead: false for eventually consistent (cheaper) reads from any replica

Apache Cassandra

  • Every node can serve reads; the replication factor (RF) determines how many copies exist
  • ConsistencyLevel.ONE reads from the nearest replica

MongoDB

  • readPreference: secondary routes reads to secondaries
  • maxStalenessSeconds bounds the acceptable lag (minimum 90 seconds)
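
The staleness bounds above share one idea: a replica is eligible for a read only if its lag is within the bound. A minimal sketch of that selection rule follows. The `Replica` type, the `pick_read_target` function, and the fallback to the primary are assumptions made for illustration; real drivers measure lag in store-specific ways and usually also weigh network latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    lag_seconds: float  # measured replication lag; how it is measured is store-specific


def pick_read_target(replicas: list[Replica], max_staleness_seconds: float) -> str:
    """Return the least-lagged replica within the bound, or 'primary' if none qualifies."""
    eligible = [r for r in replicas if r.lag_seconds <= max_staleness_seconds]
    if not eligible:
        # No replica is fresh enough, so the read falls back to the primary.
        return "primary"
    return min(eligible, key=lambda r: r.lag_seconds).name


# Hypothetical lag measurements.
replicas = [Replica("replica-1", 2.5), Replica("replica-2", 40.0)]
print(pick_read_target(replicas, max_staleness_seconds=15))
print(pick_read_target(replicas, max_staleness_seconds=1))
```

With a 15-second bound only `replica-1` qualifies; with a 1-second bound neither does and the read goes to the primary.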

Connection Routing Patterns

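Pattern            How
Application-level  App chooses the primary or a replica connection string per operation
Proxy-level        ProxySQL routes by query rules; PgBouncer and RDS Proxy pool connections per endpoint and do not split reads from writes on their own
ORM-level          Django DATABASE_ROUTERS setting; Spring @Transactional(readOnly=true) paired with a routing DataSource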
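
As an example of the ORM-level pattern, a Django database router is a plain class with four methods that Django calls to pick a database alias. The sketch below assumes aliases named `default`, `replica1`, and `replica2` exist in `settings.DATABASES`; those names and the random replica choice are illustrative.

```python
import random


class PrimaryReplicaRouter:
    """Django-style database router: reads go to a replica alias, writes to 'default'."""

    # Hypothetical aliases; they must match keys defined in settings.DATABASES.
    replica_aliases = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread reads across the replicas.
        return random.choice(self.replica_aliases)

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations across them are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate only the primary; replicas receive schema changes via replication.
        return db == "default"
```

The router is activated by listing its dotted path in the `DATABASE_ROUTERS` setting. Reads issued right after a write can still hit a lagging replica, so code that needs fresh data can select the primary explicitly with `.using("default")`.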