18 · Change Data Capture (CDC) — Stream Database Changes Downstream
What is CDC?
Change Data Capture tracks row-level changes (INSERT, UPDATE, DELETE) in a database and streams them to downstream consumers in real time.
CDC turns the database into an event stream.
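To make the "database as an event stream" idea concrete, here is a minimal sketch of replaying change events onto a downstream copy. The event shape (`op`, `key`, `before`, `after`) is loosely modeled on Debezium's change-event envelope but simplified for illustration; the field layout and the `apply_event` helper are assumptions, not a real connector's output.

```python
# Hypothetical change events, one per row-level change.
# op: "c" = create (INSERT), "u" = update, "d" = delete.
events = [
    {"op": "c", "key": 1, "before": None,
     "after": {"id": 1, "email": "a@example.com"}},
    {"op": "u", "key": 1, "before": {"id": 1, "email": "a@example.com"},
     "after": {"id": 1, "email": "b@example.com"}},
    {"op": "c", "key": 2, "before": None,
     "after": {"id": 2, "email": "c@example.com"}},
    {"op": "d", "key": 2, "before": {"id": 2, "email": "c@example.com"},
     "after": None},
]


def apply_event(replica: dict, event: dict) -> None:
    """Apply one change event to an in-memory copy of the table."""
    if event["op"] == "d":
        # A delete carries no "after" image; drop the row by key.
        replica.pop(event["key"], None)
    else:
        # Creates and updates both carry the full new row image.
        replica[event["key"]] = event["after"]


replica: dict = {}
for event in events:
    apply_event(replica, event)

print(replica)  # {1: {'id': 1, 'email': 'b@example.com'}}
```

Replaying the events in order reproduces the current state of the source table, which is what lets a downstream system stay in sync without querying the database.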
Use Cases
| Use Case | How CDC Helps |
|---|---|
| Search index sync | Sync Elasticsearch on every DB write |
| Cache invalidation | Invalidate Redis entries when data changes |
| Audit logging | Immutable log of all data changes |
| Data replication | Replicate to another DB or data warehouse |
| Event-driven pipelines | Trigger microservices on data changes |
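Several of the use cases in the table can hang off the same stream. The sketch below routes one change event to three handlers: cache invalidation, search index sync, and audit logging. Plain dictionaries and a list stand in for Redis, Elasticsearch, and an audit store; the `on_change` decorator, the `dispatch` function, and the event fields are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]
handlers: dict = defaultdict(list)  # table name -> list of handlers


def on_change(table: str):
    """Register a handler for change events on the given table."""
    def register(fn: Handler) -> Handler:
        handlers[table].append(fn)
        return fn
    return register


cache = {"product:7": {"id": 7, "price": 10}}  # stand-in for Redis
search_index: dict = {}                         # stand-in for Elasticsearch
audit_log: list = []                            # stand-in for an audit store


@on_change("products")
def invalidate_cache(event: dict) -> None:
    # Drop the cached entry so the next read fetches fresh data.
    cache.pop(f"product:{event['key']}", None)


@on_change("products")
def sync_search(event: dict) -> None:
    if event["op"] == "d":
        search_index.pop(event["key"], None)
    else:
        search_index[event["key"]] = event["after"]


@on_change("products")
def audit(event: dict) -> None:
    # Append-only record of what changed.
    audit_log.append((event["op"], event["key"]))


def dispatch(event: dict) -> None:
    """Fan one change event out to every handler for its table."""
    for fn in handlers.get(event["table"], []):
        fn(event)


dispatch({"table": "products", "op": "u", "key": 7,
          "after": {"id": 7, "price": 12}})
```

After the dispatch, the stale cache entry is gone, the search index holds the new row image, and the audit log has one entry. The application code that performed the write did not need to know about any of these consumers.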
How CDC Works
sequenceDiagram
App->>Database: INSERT / UPDATE / DELETE
Database->>WAL: Append change record
CDC Tool->>WAL: Read changes (tail the log)
CDC Tool->>Kafka: Publish change events
Consumers->>Kafka: Read and process events
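Log-based CDC reads committed changes from the WAL (or the equivalent log, such as the MySQL binlog or MongoDB oplog) instead of running polling queries against the tables, so it adds little query load and does not miss deletes or intermediate updates.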
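The difference between tailing a log and polling a table can be shown with a toy model. This is a sketch under simplifying assumptions: `ChangeLog` is a hypothetical in-memory stand-in for a write-ahead log, and the integer position plays the role that a log sequence number plays in a real database.

```python
class ChangeLog:
    """Append-only log; a reader tracks its own position in it."""

    def __init__(self) -> None:
        self._records: list = []

    def append(self, record: dict) -> None:
        self._records.append(record)

    def read_from(self, position: int):
        """Return (new_position, records written after `position`)."""
        return len(self._records), self._records[position:]


log = ChangeLog()
table: dict = {}


def write(op: str, key: int, value=None) -> None:
    """Apply a write to the table and record it in the log."""
    if op == "d":
        table.pop(key, None)
    else:
        table[key] = value
    log.append({"op": op, "key": key, "value": value})


write("c", 1, "draft")
write("u", 1, "published")
write("c", 2, "temp")
write("d", 2)

# Polling the table only sees the final state: row 2 never appears,
# and the intermediate "draft" value of row 1 is lost.
polled = dict(table)

# Tailing the log sees every change, in commit order.
position = 0
position, batch = log.read_from(position)
ops = [(r["op"], r["key"]) for r in batch]

# A later read resumes from the saved position and gets only new changes.
write("u", 1, "archived")
position, batch2 = log.read_from(position)
```

Here `polled` contains only row 1 with its latest value, while `ops` lists all four changes including the insert and delete of row 2. Persisting `position` between reads is what lets a CDC tool resume after a restart.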
CDC Tools
| Tool | Source DBs | Target |
|---|---|---|
| Debezium | PostgreSQL, MySQL, MongoDB, SQL Server | Kafka, Kinesis |
| AWS DMS | Most relational DBs | S3, Kinesis, RDS |
| Striim | Multi-source | Kafka, GCS, BigQuery |
| Datastream (GCP) | PostgreSQL, MySQL, Oracle | BigQuery, GCS, Spanner |
Cloud Implementations
PostgreSQL
- Enable logical replication: wal_level = logical
- Create a replication slot: pg_create_logical_replication_slot('debezium', 'pgoutput')
- Debezium reads from the slot and publishes to Kafka
DynamoDB (AWS)
- DynamoDB Streams: captures item-level changes; the stream view type is KEYS_ONLY, OLD_IMAGE, NEW_IMAGE, or NEW_AND_OLD_IMAGES
- Lambda triggers consume the stream; changes can also be routed to Kinesis Data Streams
- Retention: 24 hours
Cloud Spanner (GCP)
- Spanner Change Streams: monitor changes to tables/columns
- Consumed via Dataflow pipelines
MongoDB
- Change Streams: resume token-based streaming from the oplog
Cassandra
- No native CDC targeting Kafka; use the Debezium Cassandra connector
- Reads from the commit log; Cassandra's commit-log CDC feature was introduced in version 3.8
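For the PostgreSQL setup listed above, a Debezium connector is usually registered by sending a JSON document to the Kafka Connect REST API (`POST /connectors`). The sketch below only builds that document. The host, database, credentials, table list, and the `postgres_connector_payload` helper are placeholder assumptions; the property names follow the Debezium PostgreSQL connector, where `topic.prefix` applies to Debezium 2.x and older releases used `database.server.name` instead.

```python
import json


def postgres_connector_payload(
    name: str,
    host: str,
    dbname: str,
    user: str,
    password: str,
    slot_name: str = "debezium",
    tables: tuple = ("public.orders",),
) -> dict:
    """Build a Kafka Connect registration payload for a Debezium
    PostgreSQL connector. All values passed in are illustrative."""
    return {
        "name": name,
        "config": {
            "connector.class":
                "io.debezium.connector.postgresql.PostgresConnector",
            # Logical decoding plugin; must match the replication slot.
            "plugin.name": "pgoutput",
            "slot.name": slot_name,
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Prefix for the Kafka topic names (Debezium 2.x property).
            "topic.prefix": "app",
            # Only capture changes from these tables.
            "table.include.list": ",".join(tables),
        },
    }


payload = postgres_connector_payload(
    name="orders-cdc",        # hypothetical connector name
    host="db.internal",       # hypothetical host
    dbname="shop",
    user="cdc_user",
    password="change-me",
)
print(json.dumps(payload, indent=2))
```

The printed JSON is what would be posted to a Kafka Connect worker. In a real deployment the password would come from a secret store rather than a literal.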
At-Least-Once Delivery
CDC systems typically guarantee at-least-once delivery: after a connector restart or a consumer retry, an event may be delivered again. Consumers must therefore be idempotent, meaning that processing the same event twice has the same effect as processing it once.
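One common way to get idempotence is to remember the log position of the last change applied per key and skip anything at or below it. The sketch below assumes each event carries a monotonically increasing `position` (comparable to a log sequence number); that field, the event shape, and the `IdempotentProjection` class are assumptions for illustration.

```python
class IdempotentProjection:
    """Downstream copy that tolerates redelivered change events."""

    def __init__(self) -> None:
        # key -> (position of last applied change, row image or None)
        self.rows: dict = {}
        self.applied = 0

    def handle(self, event: dict) -> bool:
        """Apply the event unless it was already applied.

        Returns True if state changed, False for a duplicate or stale event.
        """
        key, position = event["key"], event["position"]
        current = self.rows.get(key)
        if current is not None and current[0] >= position:
            return False  # already seen this change or a newer one
        # Deletes are stored as a tombstone (row image None) so that a
        # redelivered older update cannot resurrect the row.
        self.rows[key] = (position, event["after"])
        self.applied += 1
        return True


e1 = {"position": 1, "key": 1, "after": {"balance": 100}}
e2 = {"position": 2, "key": 1, "after": {"balance": 150}}
e3 = {"position": 3, "key": 1, "after": None}  # delete

projection = IdempotentProjection()
# Simulate at-least-once delivery: e1 and e2 are redelivered, e3 arrives twice.
results = [projection.handle(e) for e in (e1, e2, e1, e2, e3, e3)]
print(results)  # [True, True, False, False, True, False]
```

Only three of the six deliveries change state, and the final state is the same as if each event had arrived exactly once. Another option, not shown here, is to make the write itself idempotent, for example an upsert keyed on the primary key.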