18 · Change Data Capture (CDC) — Stream Database Changes Downstream

Distributed Transactions · Topic 18 of 20


What is CDC?

Change Data Capture tracks row-level changes (INSERT, UPDATE, DELETE) in a database and streams them to downstream consumers in real time.

CDC turns the database into an event stream.
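
To make "event stream" concrete, here is a minimal sketch of what a single change event might look like and how a consumer could read it. The field names (op, before, after, source, ts_ms) follow the Debezium event envelope; the shop.orders table, its columns, and the describeChange helper are hypothetical and exist only for illustration.

```javascript
// Hypothetical change event for an UPDATE on an "orders" table.
// Field names follow the Debezium envelope; all values are made up.
const updateEvent = {
  op: "u", // "c" = insert, "u" = update, "d" = delete, "r" = snapshot read
  before: { id: 42, status: "PENDING" },
  after: { id: 42, status: "SHIPPED" },
  source: { db: "shop", table: "orders" },
  ts_ms: 1700000000000,
};

// Turn an event into a one-line, human-readable summary.
function describeChange(event) {
  const table = `${event.source.db}.${event.source.table}`;
  switch (event.op) {
    case "c":
      return `INSERT into ${table}: ${JSON.stringify(event.after)}`;
    case "r":
      return `SNAPSHOT of ${table}: ${JSON.stringify(event.after)}`;
    case "u":
      return `UPDATE ${table}: ${JSON.stringify(event.before)} -> ${JSON.stringify(event.after)}`;
    case "d":
      return `DELETE from ${table}: ${JSON.stringify(event.before)}`;
    default:
      throw new Error(`unknown op: ${event.op}`);
  }
}

console.log(describeChange(updateEvent));
```

Carrying both the before and after row images lets a consumer react to what changed, not just to the fact that something changed.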


Use Cases

| Use Case | How CDC Helps |
| --- | --- |
| Search index sync | Sync Elasticsearch on every DB write |
| Cache invalidation | Invalidate Redis entries when data changes |
| Audit logging | Immutable log of all data changes |
| Data replication | Replicate to another DB or data warehouse |
| Event-driven pipelines | Trigger microservices on data changes |
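
As an illustration of the cache-invalidation use case, the sketch below drops cached entries when an update or delete event arrives. It assumes a simplified event shape (op, table, before, after) and uses an in-memory Map as a stand-in for Redis; the key format and the function names are hypothetical.

```javascript
// In-memory stand-in for Redis; keys look like "orders:42" (assumed format).
const orderCache = new Map([
  ["orders:42", { id: 42, status: "PENDING" }],
  ["orders:43", { id: 43, status: "PENDING" }],
]);

// Derive the cache key from whichever row image the event carries.
function cacheKeyFor(event) {
  const row = event.after ?? event.before;
  return `${event.table}:${row.id}`;
}

// Invalidate on update/delete; an insert has nothing cached yet, so skip it.
function invalidateOnChange(event) {
  if (event.op === "u" || event.op === "d") {
    return orderCache.delete(cacheKeyFor(event)); // true if an entry was removed
  }
  return false;
}

const changeEvents = [
  { op: "u", table: "orders", before: { id: 42, status: "PENDING" }, after: { id: 42, status: "SHIPPED" } },
  { op: "d", table: "orders", before: { id: 43, status: "PENDING" }, after: null },
  { op: "c", table: "orders", before: null, after: { id: 44, status: "PENDING" } },
];
const removedFlags = changeEvents.map(invalidateOnChange);
console.log(removedFlags, orderCache.size);
```

Deleting the entry rather than overwriting it keeps the consumer simple: the next read repopulates the cache from the database.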

How CDC Works

sequenceDiagram
    App->>Database: INSERT / UPDATE / DELETE
    Database->>WAL: Append change record
    CDC Tool->>WAL: Read changes (tail the log)
    CDC Tool->>Kafka: Publish change events
    Consumers->>Kafka: Read and process events

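Log-based CDC reads from the WAL (or an equivalent log, such as the MySQL binlog or the MongoDB oplog) rather than running polling queries. Because every committed change is appended to the log, the CDC tool also sees deletes and intermediate updates that a periodic query would miss.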
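
The flow in the diagram can be sketched with plain arrays. This is a toy model, not how any particular CDC tool is implemented: walRecords stands in for the WAL, kafkaTopic for a Kafka topic, and the tailer remembers the last log sequence number (LSN) it published. All names are hypothetical.

```javascript
// Append-only log standing in for the WAL; each record gets a sequence number.
const walRecords = [];
function appendChange(op, row) {
  walRecords.push({ lsn: walRecords.length + 1, op, row });
}

// Stand-in for a Kafka topic.
const kafkaTopic = [];

// The tailer remembers the last LSN it published and reads only newer records.
function createTailer() {
  let lastPublishedLsn = 0;
  return function publishNewRecords() {
    const fresh = walRecords.filter((record) => record.lsn > lastPublishedLsn);
    for (const record of fresh) {
      kafkaTopic.push(record);
      lastPublishedLsn = record.lsn;
    }
    return fresh.length; // number of records published in this pass
  };
}

const publishNewRecords = createTailer();
appendChange("INSERT", { id: 1, status: "PENDING" });
appendChange("UPDATE", { id: 1, status: "PAID" });
appendChange("DELETE", { id: 1 });
const firstPass = publishNewRecords(); // picks up all three records
appendChange("INSERT", { id: 2, status: "PENDING" });
const secondPass = publishNewRecords(); // picks up only the new record
console.log(firstPass, secondPass, kafkaTopic.map((r) => r.op));
```

Row 1 was inserted, updated, and deleted before the first pass. A query against the table at that point would find nothing, while the log still yields all three changes.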


CDC Tools

| Tool | Source DBs | Target |
| --- | --- | --- |
| Debezium | PostgreSQL, MySQL, MongoDB, SQL Server | Kafka, Kinesis |
| AWS DMS | Most relational DBs | S3, Kinesis, RDS |
| Striim | Multi-source | Kafka, GCS, BigQuery |
| Datastream (GCP) | PostgreSQL, MySQL, Oracle | BigQuery, GCS, Spanner |

Cloud Implementations

PostgreSQL:

  • Enable logical replication: wal_level = logical (set in postgresql.conf; requires a restart)
  • Create replication slot: pg_create_logical_replication_slot('debezium', 'pgoutput')
  • Debezium reads from the slot and publishes to Kafka

DynamoDB:

  • DynamoDB Streams: captures item-level changes (view types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, or NEW_AND_OLD_IMAGES)
  • Consume the stream with Lambda triggers, or send changes to Kinesis Data Streams
  • Retention: 24 hours (DynamoDB Streams)

Spanner:

  • Spanner Change Streams: monitor changes to tables/columns
  • Consumed via Dataflow pipelines
    CREATE CHANGE STREAM MyStream FOR ALL;

MongoDB:

  • Change Streams: resume token-based streaming from the oplog
    const changeStream = db.collection('orders').watch();
    changeStream.on('change', (change) => { console.log(change); });

Cassandra:

  • No native CDC targeting Kafka; use the Debezium Cassandra connector
  • Reads from the commit log; CDC is available in Cassandra 3.8+
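
For the DynamoDB Streams item above, a consumer receives batches of records whose images use DynamoDB's attribute-value encoding. The sketch below shows one way a Lambda-style consumer might flatten such a batch. The record fields (eventName, dynamodb.Keys, dynamodb.OldImage, dynamodb.NewImage) are part of the stream record format; the sample batch is hand-written, the table and attribute names are hypothetical, and the unmarshal helper handles only the S, N, and BOOL types.

```javascript
// Convert an attribute-value map such as { id: { S: "42" } } into a plain object.
// Only the S, N and BOOL types are handled in this sketch.
function unmarshal(image) {
  if (!image) return null;
  const out = {};
  for (const [name, value] of Object.entries(image)) {
    if ("S" in value) out[name] = value.S;
    else if ("N" in value) out[name] = Number(value.N);
    else if ("BOOL" in value) out[name] = value.BOOL;
  }
  return out;
}

// Flatten a batch of stream records; a Lambda handler could call this with its event.
function summarizeBatch(event) {
  return event.Records.map((record) => ({
    type: record.eventName, // INSERT, MODIFY, or REMOVE
    keys: unmarshal(record.dynamodb.Keys),
    before: unmarshal(record.dynamodb.OldImage),
    after: unmarshal(record.dynamodb.NewImage),
  }));
}

// Hand-written sample batch, assuming a stream that carries both old and new images.
const sampleBatch = {
  Records: [
    {
      eventName: "MODIFY",
      dynamodb: {
        Keys: { orderId: { S: "42" } },
        OldImage: { orderId: { S: "42" }, status: { S: "PENDING" }, total: { N: "19.99" } },
        NewImage: { orderId: { S: "42" }, status: { S: "SHIPPED" }, total: { N: "19.99" } },
      },
    },
    {
      eventName: "REMOVE",
      dynamodb: {
        Keys: { orderId: { S: "43" } },
        OldImage: { orderId: { S: "43" }, status: { S: "CANCELLED" }, total: { N: "5" } },
      },
    },
  ],
};

const changes = summarizeBatch(sampleBatch);
console.log(JSON.stringify(changes, null, 2));
```

A REMOVE record has no NewImage, so its after field comes back as null.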

At-Least-Once Delivery

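CDC systems typically guarantee at-least-once delivery: after a crash or restart, a connector resumes from its last recorded log position and may re-publish events it had already sent. Consumers must therefore be idempotent: processing the same event twice must have the same effect as processing it once.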
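
One common way to get idempotence is to record the highest log position applied per key and skip anything at or below it. The sketch below assumes each event carries a monotonically increasing lsn and a key; both field names, and the in-memory Maps, are hypothetical. A real consumer would persist the position together with the target state.

```javascript
const readModel = new Map();      // key -> latest row state
const lastAppliedLsn = new Map(); // key -> highest LSN already applied

// Apply an event unless it is a duplicate or a stale redelivery.
function applyChange(event) {
  const seen = lastAppliedLsn.get(event.key) ?? 0;
  if (event.lsn <= seen) return false; // already applied: skip
  if (event.op === "d") readModel.delete(event.key);
  else readModel.set(event.key, event.after);
  lastAppliedLsn.set(event.key, event.lsn);
  return true;
}

const created = { lsn: 101, key: "orders:42", op: "c", after: { status: "PENDING" } };
const shipped = { lsn: 102, key: "orders:42", op: "u", after: { status: "SHIPPED" } };

// Simulate a redelivery: both events arrive a second time after a restart.
const appliedFlags = [created, shipped, created, shipped].map(applyChange);
console.log(appliedFlags, readModel.get("orders:42"));
```

Without the position check, the redelivered created event would briefly roll the order back to PENDING, which is exactly the kind of effect idempotent handling is meant to prevent.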