Cross-Cutting – Replay & Disaster Recovery Drills
Last Updated: December 29, 2025
Purpose: Defines the replay mechanisms used to rebuild projections, and the disaster recovery (DR) drills and procedures that exercise system resilience.
Related Documentation:
Module References:
- Replay Service: bv-eCommerce-core/replay-service/
- Chaos Experiments: bv-chaos-experiments/
1. Replay Goals
Rebuild derived projections and recover from data corruption or missed events.
2. Replay Data Sources
- Archived domain events (Parquet)
- CDC change records in Parquet (Debezium output)
- Snapshot manifests (optional)
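The archive paths in the CLI example (section 4) suggest that both archives are partitioned by day (`.../YYYY/MM/DD`). Assuming that layout, the input set for a replay window can be derived as shown below. `daily_prefixes` is a hypothetical helper rather than part of the replay service. Listing the objects under each prefix, for example with an S3 client, is out of scope here.

```python
from datetime import datetime, timedelta
from typing import List


def daily_prefixes(base: str, start: datetime, end: datetime) -> List[str]:
    """List day-partition prefixes overlapping the half-open window [start, end).

    Assumes a hypothetical <base>/YYYY/MM/DD layout, inferred from the CLI paths.
    """
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    prefixes = []
    while day < end:
        prefixes.append(f"{base.rstrip('/')}/{day:%Y/%m/%d}")
        day += timedelta(days=1)
    return prefixes
```

For the window in the CLI example (`2025-02-01T00:00Z` to `2025-02-02T00:00Z`), this yields only the `2025/02/01` prefix, because the window end is exclusive.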
3. Replay Workflow
- Identify the target projection and time span
- Acquire the input set (list the relevant Parquet segments)
- Stream events sorted by timestamp/LSN
- Apply the transformation (same logic as the live stream code)
- Validate: sample checksums and compare row counts
- Emit a system.replay.completed.v1 event
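The workflow above can be sketched as a k-way merge over pre-sorted segments, followed by the shared transformation and a validation summary. This minimal illustration makes several assumptions: records are dicts carrying `ts` and `lsn`, every segment is already sorted by `(ts, lsn)`, and `transform_order` is a hypothetical stand-in for the live-stream logic of `orders_by_customer`. A full checksum over the sorted projection rows stands in for sampled checksums. The completion payload fields are illustrative; only the event name `system.replay.completed.v1` comes from this document.

```python
import hashlib
import heapq
import json
from typing import Dict, Iterable, Iterator, List


def merged_stream(segments: List[Iterable[dict]]) -> Iterator[dict]:
    # Lazy k-way merge of pre-sorted segments; ties on ts are broken by LSN
    return heapq.merge(*segments, key=lambda r: (r["ts"], r["lsn"]))


def transform_order(state: Dict[str, set], record: dict) -> None:
    # Hypothetical projection logic: orders_by_customer maps customer -> order ids
    state.setdefault(record["customer_id"], set()).add(record["order_id"])


def replay(segments: List[Iterable[dict]]) -> dict:
    state: Dict[str, set] = {}
    processed = 0
    for record in merged_stream(segments):
        transform_order(state, record)
        processed += 1
    # Checksum over rows sorted by key, so it does not depend on insertion order
    rows = sorted((customer, sorted(ids)) for customer, ids in state.items())
    digest = hashlib.sha256(json.dumps(rows).encode()).hexdigest()
    return {"events_processed": processed, "row_count": len(state),
            "checksum": digest, "projection": state}


def completion_event(projection: str, result: dict) -> dict:
    # Payload shape is hypothetical; only the event type name is from the doc
    return {"type": "system.replay.completed.v1", "projection": projection,
            "events_processed": result["events_processed"],
            "row_count": result["row_count"], "checksum": result["checksum"]}
```

`heapq.merge` holds one pending record per segment rather than the whole input, which keeps memory flat when a replay spans many Parquet segments.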
4. CLI (Conceptual)
```bash
replay \
  --projection orders_by_customer \
  --from 2025-02-01T00:00Z \
  --to 2025-02-02T00:00Z \
  --events s3://archive/events/ecommerce.order/2025/02/01 \
  --cdc s3://archive/cdc/orders/2025/02/01 \
  --dry-run
```
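One way to make the conceptual CLI concrete is an `argparse` front end. The flag names mirror the example above. The validation rules (`--from` must precede `--to`, and at least one of `--events` / `--cdc` must be given) are assumptions, not documented behavior of the replay service.

```python
import argparse
from datetime import datetime


def iso_utc(value: str) -> datetime:
    # Accept "2025-02-01T00:00Z"; fromisoformat() before Python 3.11 rejects a bare "Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="replay")
    p.add_argument("--projection", required=True)
    # "from" is a Python keyword, so store it under a different attribute name
    p.add_argument("--from", dest="start", type=iso_utc, required=True)
    p.add_argument("--to", dest="end", type=iso_utc, required=True)
    p.add_argument("--events", help="archived domain-event Parquet prefix")
    p.add_argument("--cdc", help="archived CDC Parquet prefix")
    p.add_argument("--dry-run", action="store_true")
    return p


def parse(argv: list) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation rules; parser.error() exits with status 2
    if args.start >= args.end:
        parser.error("--from must be earlier than --to")
    if not (args.events or args.cdc):
        parser.error("at least one of --events or --cdc is required")
    return args
```

With this parser, `--dry-run` surfaces as `args.dry_run`, which a caller could use to run the merge and validation without writing to the projection store.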
5. DR Drill Types
| Drill | Scenario | Goal |
|---|---|---|
| DB Failover | East-region Postgres primary down | Promote replica |
| Kafka Partition Loss | Partial topic outage | Replay missing partition events |
| Cache Flush | Redis cleared | Rehydrate cache from events |
| Cross-Region Latency Spike | Simulate 500 ms RTT | Validate fallbacks |
| Replay Integrity | Random projection deletion | Full rebuild reaches parity with the pre-deletion state |
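For the Replay Integrity drill, parity can be checked by diffing the projection captured before deletion against the rebuilt one, keyed by primary key. This minimal sketch assumes both snapshots fit in memory as `{key: row}` dicts with string keys. `parity_report` and `has_parity` are hypothetical helpers. For large projections, comparing per-partition checksums first and diffing only the mismatched partitions would likely be cheaper.

```python
from typing import Dict, Mapping


def parity_report(expected: Mapping[str, dict],
                  rebuilt: Mapping[str, dict]) -> Dict[str, list]:
    # Rows present before deletion but absent after the rebuild
    missing = sorted(k for k in expected if k not in rebuilt)
    # Rows the rebuild produced that did not exist before
    extra = sorted(k for k in rebuilt if k not in expected)
    # Rows present in both snapshots whose contents differ
    mismatched = sorted(k for k in expected
                        if k in rebuilt and expected[k] != rebuilt[k])
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


def has_parity(report: Dict[str, list]) -> bool:
    return not any(report.values())
```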
6. Metrics
- replay_duration_seconds
- replay_events_processed_total
- replay_validation_failures_total
- dr_rto_seconds
- dr_rpo_seconds
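`dr_rto_seconds` and `dr_rpo_seconds` can be derived from a drill timeline. The sketch below assumes RTO is measured from the failure to verified recovery, and RPO from the newest write present on the promoted replica to the failure. The `DrillTimeline` fields are hypothetical. The default targets are the learning targets from the exit criteria (RTO < 10 min, RPO < 2 min).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillTimeline:
    failure_at: datetime                # primary became unavailable (assumed start point)
    last_replicated_write_at: datetime  # newest write present on the promoted replica
    recovered_at: datetime              # service health and user flows verified


def dr_rto_seconds(t: DrillTimeline) -> float:
    return (t.recovered_at - t.failure_at).total_seconds()


def dr_rpo_seconds(t: DrillTimeline) -> float:
    # Writes newer than the last replicated one are lost unless replayed
    return max(0.0, (t.failure_at - t.last_replicated_write_at).total_seconds())


def meets_targets(t: DrillTimeline,
                  rto_target: timedelta = timedelta(minutes=10),
                  rpo_target: timedelta = timedelta(minutes=2)) -> bool:
    return (dr_rto_seconds(t) < rto_target.total_seconds()
            and dr_rpo_seconds(t) < rpo_target.total_seconds())
```

For example, a drill where the replica lagged by 45 s and recovery took 7 min reports RTO = 420 s and RPO = 45 s, which is within both targets.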
7. DR Runbook (High-Level)
- Detect the incident (alerts)
- Declare severity
- Promote the replica (scripted)
- Update endpoints / DNS
- Replay the data gap (if RPO > 0)
- Verify service health and user flows
- Record the post-mortem and track follow-up improvements
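The "replay the data gap" step can be scoped by LSN. The promoted replica has applied everything up to some LSN, and the archived CDC stream covers what the old primary committed after that. This minimal sketch assumes each archived CDC record carries a numeric `lsn` and that both positions are known. `gap_events` and `parse_pg_lsn` are hypothetical helpers. Postgres prints an LSN as two hexadecimal halves (`X/Y`, the high and low 32 bits), and `parse_pg_lsn` converts that form to an integer so that positions compare correctly.

```python
from typing import Iterable, List


def parse_pg_lsn(text: str) -> int:
    # "X/Y" -> (X << 32) | Y, both halves hexadecimal
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def gap_events(archived: Iterable[dict], replica_lsn: int,
               primary_last_lsn: int) -> List[dict]:
    """Select archived CDC records the promoted replica never applied."""
    if primary_last_lsn <= replica_lsn:
        return []  # RPO == 0: nothing to replay
    gap = [r for r in archived if replica_lsn < r["lsn"] <= primary_last_lsn]
    # Re-apply strictly in commit (LSN) order
    return sorted(gap, key=lambda r: r["lsn"])
```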
8. Exit Criteria
- At least one successful replay of each major projection (orders, feed)
- DR drill executed with RTO < 10 min and RPO < 2 min (learning targets)
- Replay job is idempotent and documented
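One way to make a replay job idempotent is to use versioned upserts. A row is overwritten only by a strictly newer version, so re-running the same or an overlapping window, or receiving events out of order, leaves the projection unchanged. This sketch uses a hypothetical per-order `version` field, and `apply_idempotent` is an illustrative name. The replay service's actual mechanism may differ.

```python
from typing import Dict, Iterable


def apply_idempotent(state: Dict[str, dict], events: Iterable[dict]) -> Dict[str, dict]:
    """Upsert order events into the projection, keyed by order_id (mutates state).

    Only a strictly newer version replaces a row, so re-applying a batch is a no-op.
    """
    for e in events:
        current = state.get(e["order_id"])
        if current is None or e["version"] > current["version"]:
            state[e["order_id"]] = {"status": e["status"], "version": e["version"]}
    return state
```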
Related Documentation
Architecture References
Module READMEs
- bv-eCommerce-core/replay-service/README.md
- bv-chaos-experiments/README.md (DR drill procedures)
ADRs
Document Status: Active Reference ✅
Last Review: December 29, 2025