Weekly Progress Checklist
Use this checklist to track your weekly progress through the implementation roadmap.
Week 1: Foundation Setup ✅
Infrastructure
- [ ] Docker Desktop installed and running
- [ ] Java 17 (Temurin) installed
- [ ] Maven installed and configured
- [ ] Local infra running (Postgres, Redis, Kafka)
Product Service
- [ ] Product entity created
- [ ] ProductRepository implemented
- [ ] REST API endpoints working (CRUD)
- [ ] Validation added
- [ ] Unit tests passing
- [ ] Integration tests with Testcontainers
- [ ] OpenAPI/Swagger documentation accessible
Validation Checks
- [ ] `docker ps` shows 3 containers
- [ ] `mvn test` passes
- [ ] Can create/read/update/delete products via Postman
- [ ] Swagger UI accessible at localhost:8080/swagger-ui.html
- [ ] Test coverage > 80%
Learning Verification
- [ ] Understand Spring Boot basics
- [ ] Can explain JPA entities and repositories
- [ ] Know how to write REST controllers
- [ ] Comfortable with Docker Compose
- [ ] Can write unit and integration tests
Week 2: Cart Service + Basic Observability ✅
Cart Service
- [ ] Cart and CartItem entities created
- [ ] Cart CRUD operations working
- [ ] Redis caching implemented
- [ ] Cart-to-Product integration working
Observability
- [ ] Actuator endpoints enabled
- [ ] Health checks working
- [ ] Structured JSON logging configured
- [ ] Prometheus metrics exposed
- [ ] Custom business metrics added
Integration
- [ ] Cart service calls Product service
- [ ] Retry logic implemented (Spring Retry)
- [ ] Errors handled gracefully
Validation Checks
- [ ] Can add product to cart
- [ ] Cache hit/miss logged correctly
- [ ] `/actuator/health` returns 200
- [ ] `/actuator/prometheus` shows metrics
- [ ] Retry happens on failure
Learning Verification
- [ ] Understand service-to-service communication
- [ ] Know Redis caching strategies
- [ ] Can configure Actuator
- [ ] Understand structured logging
- [ ] Can create custom metrics
Week 3: Kafka Integration ✅
Kafka Setup
- [ ] Kafka (Redpanda) verified running
- [ ] Topics created
- [ ] Can produce/consume manually
Event Publishing
- [ ] ProductCreatedEvent defined
- [ ] Event published on product creation
- [ ] Event visible in Kafka topic
Event Consumption
- [ ] Inventory service created
- [ ] Consumes ProductCreatedEvent
- [ ] Creates inventory record
Event Patterns
- [ ] Event schemas defined
- [ ] Schema validation implemented
- [ ] Idempotency keys added
- [ ] Duplicate detection working
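The idempotency and duplicate-detection items can be sketched roughly as follows. This is a minimal in-memory illustration, not the roadmap's prescribed implementation: the `eventId` and `productId` fields and the `IdempotentHandler` class are hypothetical names, and a real consumer would likely persist processed keys (for example behind a unique constraint in Postgres) so they survive restarts.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical event shape: eventId doubles as the idempotency key.
record ProductCreatedEvent(String eventId, String productId) {}

// Wraps a handler so that each eventId is successfully processed at most once.
class IdempotentHandler {
    // In-memory only (assumption for the sketch); lost on restart.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final Consumer<ProductCreatedEvent> delegate;

    IdempotentHandler(Consumer<ProductCreatedEvent> delegate) {
        this.delegate = delegate;
    }

    /** Returns true if the event was processed, false if it was a duplicate. */
    boolean handle(ProductCreatedEvent event) {
        // add() returns false when the key is already present.
        if (!processed.add(event.eventId())) {
            return false;
        }
        try {
            delegate.accept(event);
            return true;
        } catch (RuntimeException e) {
            // Forget the key so a redelivery can retry the failed event.
            processed.remove(event.eventId());
            throw e;
        }
    }
}
```

Delivering the same event twice should then create only one inventory record, which is what the "Duplicate events don't create duplicates" validation check looks for.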
Validation Checks
- [ ] Event appears in Kafka after product creation
- [ ] Inventory record created on event
- [ ] Duplicate events don't create duplicates
- [ ] Schema violations fail
Learning Verification
- [ ] Understand Kafka concepts (topics, partitions, offsets)
- [ ] Know event-driven architecture
- [ ] Can implement idempotency
- [ ] Understand schema evolution
Week 4: Order Service + Saga ✅
Order Service
- [ ] Order and OrderItem entities
- [ ] State machine implemented (PENDING → CONFIRMED → PAID → FULFILLED)
- [ ] State transitions validated
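One way to validate the PENDING → CONFIRMED → PAID → FULFILLED transitions is an enum that knows its allowed successors. This is a sketch under assumptions: only the four states named above are modeled (no cancelled or failed state for compensation), and the `Order` class here is a simplified stand-in, not the JPA entity.

```java
import java.util.EnumSet;
import java.util.Set;

enum OrderStatus {
    PENDING, CONFIRMED, PAID, FULFILLED;

    // Each state allows exactly one forward transition; FULFILLED is terminal.
    Set<OrderStatus> allowedNext() {
        return switch (this) {
            case PENDING -> EnumSet.of(CONFIRMED);
            case CONFIRMED -> EnumSet.of(PAID);
            case PAID -> EnumSet.of(FULFILLED);
            case FULFILLED -> EnumSet.noneOf(OrderStatus.class);
        };
    }

    boolean canTransitionTo(OrderStatus next) {
        return allowedNext().contains(next);
    }
}

// Simplified stand-in for the Order entity (hypothetical, no persistence).
class Order {
    private OrderStatus status = OrderStatus.PENDING;

    OrderStatus status() {
        return status;
    }

    // Rejects any transition that is not listed in allowedNext().
    void transitionTo(OrderStatus next) {
        if (!status.canTransitionTo(next)) {
            throw new IllegalStateException("Illegal transition: " + status + " -> " + next);
        }
        status = next;
    }
}
```

Keeping the transition table inside the enum means skipped or backward transitions fail in one place, regardless of which service method attempts them.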
Saga Pattern
- [ ] Inventory reservation working
- [ ] Order creation saga functional
- [ ] OrderCreatedEvent published
Compensation
- [ ] Rollback on inventory failure
- [ ] Compensation events published
- [ ] Saga timeout configured
- [ ] Resource cleanup working
Validation Checks
- [ ] Happy path: order created successfully
- [ ] Failure path: inventory released on failure
- [ ] Saga times out after 30 seconds
- [ ] End-to-end order flow works
Learning Verification
- [ ] Understand saga pattern
- [ ] Know compensation logic
- [ ] Can implement state machines
- [ ] Understand eventual consistency
Week 5: OpenTelemetry & Tracing ✅
Distributed Tracing
- [ ] OpenTelemetry agent added
- [ ] OTLP exporter configured
- [ ] Jaeger deployed locally
- [ ] Traces visible in Jaeger
Custom Spans
- [ ] Business operation spans added
- [ ] Span attributes configured
- [ ] Spans linked across services
Prometheus + Grafana
- [ ] Observability stack deployed
- [ ] Prometheus scraping metrics
- [ ] Grafana accessible
- [ ] First dashboard created
Alerting
- [ ] Alert rules applied
- [ ] Alertmanager configured
- [ ] Test alert fired
Validation Checks
- [ ] Traces visible in Jaeger UI
- [ ] Spans have business context
- [ ] Grafana shows live metrics
- [ ] Alert fires on simulated error
Learning Verification
- [ ] Understand distributed tracing
- [ ] Can create custom spans
- [ ] Know Prometheus/Grafana basics
- [ ] Can configure alerts
Week 6: Advanced Observability ✅
SLO Monitoring
- [ ] SLOs defined in YAML
- [ ] SLO dashboard created
- [ ] Error budget tracked
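The error budget arithmetic behind the last item can be illustrated with a small sketch. It assumes a request-based availability SLI and an SLO target expressed as a fraction (for example 0.999); the `ErrorBudget` record and its method names are hypothetical, and in practice the same ratio would usually be computed in a Prometheus query rather than in application code.

```java
// Request-based error budget: the SLO target leaves (1 - target) of requests
// as the share that is allowed to fail within the measurement window.
record ErrorBudget(double sloTarget, long totalRequests, long failedRequests) {

    ErrorBudget {
        if (sloTarget <= 0.0 || sloTarget >= 1.0) {
            throw new IllegalArgumentException("sloTarget must be between 0 and 1 (exclusive)");
        }
    }

    // Number of failures the window can absorb before the SLO is breached.
    double allowedFailures() {
        return (1.0 - sloTarget) * totalRequests;
    }

    // 1.0 means the budget is untouched, 0.0 means it is spent,
    // and a negative value means the SLO has been breached.
    double remainingFraction() {
        if (totalRequests == 0) {
            return 1.0; // no traffic, nothing spent
        }
        return 1.0 - failedRequests / allowedFailures();
    }
}
```

For instance, with a 99.9% target and 1,000,000 requests, about 1,000 failures are allowed, so 250 failures leave roughly three quarters of the budget.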
Latency Monitoring
- [ ] Histogram buckets configured
- [ ] Latency distribution graphs
- [ ] p95/p99 alerts set
Log Aggregation (Optional)
- [ ] Loki installed
- [ ] Logs searchable
Correlation
- [ ] Trace IDs in logs
- [ ] Logs linked to traces
- [ ] Exemplars in metrics
Validation Checks
- [ ] SLO compliance visible
- [ ] Latency percentiles charted
- [ ] Can jump from metric → trace → logs
Learning Verification
- [ ] Understand SLI/SLO/SLA
- [ ] Know percentiles and histograms
- [ ] Can correlate signals
Week 7: Resilience Patterns ✅
Resilience4j
- [ ] Circuit breaker configured
- [ ] Circuit opens on failures
- [ ] Retry with exponential backoff
- [ ] Jitter added
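To make the backoff and jitter items concrete, here is a plain-Java sketch of the delay calculation. It deliberately does not use the Resilience4j API; the `Backoff` class and its method names are hypothetical, and "full jitter" (a uniform random delay between zero and the exponential cap) is just one of several common jitter strategies.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class Backoff {
    private Backoff() {}

    /** Delay ceiling for a 1-based attempt: base * 2^(attempt - 1), capped at max. */
    static Duration exponentialCap(int attempt, Duration base, Duration max) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        long cap = max.toMillis();
        long delay = base.toMillis();
        // Double once per previous attempt, stopping early once the cap is reached.
        for (int i = 1; i < attempt && delay < cap; i++) {
            delay *= 2;
        }
        return Duration.ofMillis(Math.min(delay, cap));
    }

    /** Full jitter: a uniformly random delay in [0, exponentialCap]. */
    static Duration withFullJitter(int attempt, Duration base, Duration max) {
        long cap = exponentialCap(attempt, base, max).toMillis();
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(cap + 1));
    }
}
```

Jitter matters because clients that fail at the same moment would otherwise all retry at the same moment too, which can keep a recovering service overloaded.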
Bulkhead & Rate Limiting
- [ ] Thread pool isolation
- [ ] Rate limiter on APIs
- [ ] 429 responses returned
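The rate-limiting items can be pictured with a small token bucket. This is an illustrative sketch rather than the Resilience4j rate limiter: the `TokenBucket` class, its constructor parameters, and the injected nanosecond clock are assumptions made so the behavior is easy to test.

```java
import java.util.function.LongSupplier;

// Token bucket: each request consumes one token; tokens refill at a steady rate.
class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock; // injected so tests can control time
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        // Refill for the elapsed time, never exceeding the bucket capacity.
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    // HTTP status a controller or filter could return for this request.
    int statusForRequest() {
        return tryAcquire() ? 200 : 429;
    }
}
```

In a running service the clock would typically be `System::nanoTime`, with one bucket per client or API key.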
Fallbacks
- [ ] Fallback to cached data
- [ ] Default responses
- [ ] Graceful degradation
Validation Checks
- [ ] Circuit breaker opens/closes correctly
- [ ] Retries visible in logs
- [ ] Rate limiting throttles requests
- [ ] Fallbacks work
Learning Verification
- [ ] Understand circuit breaker pattern
- [ ] Know retry strategies
- [ ] Can implement bulkheads
- [ ] Understand graceful degradation
Week 8: Chaos Engineering ✅
Chaos Mesh
- [ ] Chaos Mesh installed
- [ ] Dashboard accessible
Experiments
- [ ] Pod kill experiment run
- [ ] Network latency injected
- [ ] Packet loss tested
Game Day
- [ ] Runbook completed
- [ ] Roles assigned
- [ ] Game day scheduled
Validation Checks
- [ ] Service recovers from pod kill
- [ ] Circuit breaker activates on latency
- [ ] No cascading failures
- [ ] Metrics show impact
Learning Verification
- [ ] Understand chaos engineering principles
- [ ] Can run chaos experiments
- [ ] Know how to document findings
- [ ] Can prepare runbooks
Week 9: Load Testing Setup ✅
Gatling
- [ ] Gatling project built
- [ ] Order flow simulation created
- [ ] Load profile configured
Baseline
- [ ] Baseline tests run
- [ ] Latencies recorded
- [ ] Bottlenecks identified
k6 CI
- [ ] k6 smoke test created
- [ ] Added to GitHub Actions
- [ ] Performance gates set
Validation Checks
- [ ] Gatling simulation runs successfully
- [ ] Baseline in sli-targets.yaml
- [ ] k6 runs in CI
- [ ] Bottlenecks documented
Learning Verification
- [ ] Understand load testing concepts
- [ ] Can model realistic user behavior
- [ ] Know how to analyze results
- [ ] Can identify bottlenecks
Week 10: Performance Optimization ✅
Database
- [ ] Indexes added
- [ ] Connection pool tuned
- [ ] Query times improved 50%+
Caching
- [ ] Cache warming implemented
- [ ] Cache hit ratio > 80%
- [ ] HTTP caching (ETags)
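The ETag item boils down to hashing the response body and comparing it with the client's `If-None-Match` header. The sketch below is a simplified illustration: the `ETags` class and its methods are hypothetical, and it ignores weak validators, `*`, and comma-separated lists. In a Spring application, the built-in `ShallowEtagHeaderFilter` is one way to get comparable behavior without hand-written code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class ETags {
    private ETags() {}

    /** Strong ETag: quoted SHA-256 hex digest of the response body. */
    static String strongETag(String body) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(body.getBytes(StandardCharsets.UTF_8));
            return "\"" + HexFormat.of().formatHex(hash) + "\"";
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** 304 when the client's cached ETag matches the current body, else 200. */
    static int statusFor(String ifNoneMatch, String currentBody) {
        return strongETag(currentBody).equals(ifNoneMatch) ? 304 : 200;
    }
}
```

A 304 response carries no body, so unchanged resources cost only a header round trip.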
Validation
- [ ] Re-run load tests
- [ ] 30%+ improvement in p95
- [ ] Improvements documented
Validation Checks
- [ ] No connection timeouts
- [ ] Cache hit ratio metric
- [ ] 304 responses for unchanged resources
- [ ] Performance improved
Learning Verification
- [ ] Understand database optimization
- [ ] Know caching strategies
- [ ] Can measure performance improvements
- [ ] Understand HTTP caching
Week 11: Security Hardening ✅
Dependency Scanning
- [ ] OWASP Dependency Check run
- [ ] Vulnerable deps updated
- [ ] No CRITICAL CVEs
Container Security
- [ ] Trivy scan clean
- [ ] Distroless images used
Authentication
- [ ] JWT generation working
- [ ] JWT validation filter
- [ ] Endpoints secured
OWASP ZAP
- [ ] ZAP baseline scan run
- [ ] Issues fixed
- [ ] No HIGH findings
Secrets
- [ ] Secrets externalized
- [ ] No secrets in code
Validation Checks
- [ ] Protected endpoints require auth
- [ ] Clean security scans
- [ ] Secrets in env vars
Learning Verification
- [ ] Understand common vulnerabilities
- [ ] Know JWT authentication
- [ ] Can use security scanning tools
- [ ] Understand secrets management
Week 12: CI/CD Pipeline ✅
Build Pipeline
- [ ] GitHub Actions configured
- [ ] Build runs on push
- [ ] Dependencies cached
Quality Gates
- [ ] Test coverage gate (80%)
- [ ] Security scan gate
- [ ] Build fails on issues
Contract Testing
- [ ] Pact consumer tests
- [ ] Contracts published
- [ ] Contract stage in CI
Docker
- [ ] Images built in CI
- [ ] Tagged with SHA
- [ ] Pushed to registry
Validation Checks
- [ ] Build passes on push
- [ ] Poor quality blocked
- [ ] Contract breaking changes detected
- [ ] Images available in registry
Learning Verification
- [ ] Understand CI/CD concepts
- [ ] Know quality gates
- [ ] Can write contract tests
- [ ] Understand image tagging
Week 13: gRPC Implementation ✅
gRPC Service
- [ ] Proto file defined
- [ ] Java code generated
- [ ] gRPC service implemented
- [ ] gRPC server running
gRPC Client
- [ ] Client stub created
- [ ] Order service calls inventory
- [ ] gRPC errors handled
Observability
- [ ] gRPC interceptors added
- [ ] gRPC metrics exposed
- [ ] Traces in Jaeger
Resilience
- [ ] Retry policy configured
- [ ] Deadlines set
- [ ] Failure scenarios tested
Validation Checks
- [ ] gRPC calls work via grpcurl
- [ ] Order → Inventory via gRPC
- [ ] Traces show gRPC spans
- [ ] Retries work
Learning Verification
- [ ] Understand Protocol Buffers
- [ ] Know gRPC vs REST tradeoffs
- [ ] Can implement gRPC services
- [ ] Understand gRPC error handling
Week 14: WebSocket & SSE ✅
WebSocket
- [ ] Notification service created
- [ ] WebSocket endpoint configured
- [ ] Order status notifications
- [ ] JWT authentication
SSE
- [ ] SSE endpoint for flash sales
- [ ] Events broadcast
- [ ] Reconnection handled
Load Testing
- [ ] Gatling WebSocket scenario
- [ ] 1000+ concurrent connections
- [ ] Resource usage monitored
Validation Checks
- [ ] WebSocket connection works
- [ ] Real-time updates received
- [ ] SSE events delivered
- [ ] Handle 1000+ connections
Learning Verification
- [ ] Understand WebSocket protocol
- [ ] Know SSE use cases
- [ ] Can secure WebSocket
- [ ] Understand real-time scaling
Week 15: CDC & Projections ✅
Debezium
- [ ] Debezium connector deployed
- [ ] CDC for orders table
- [ ] CDC events in Kafka
Projections
- [ ] orders_by_customer projection
- [ ] Built from CDC events
- [ ] Stored in database
Kafka Streams
- [ ] Streams app created
- [ ] Order totals aggregated
- [ ] Windowing configured
Replay
- [ ] Replay service skeleton
- [ ] Events replayed
- [ ] Projection validated
Validation Checks
- [ ] DB changes in Kafka
- [ ] Projection stays in sync
- [ ] Aggregations correct
- [ ] Replay rebuilds projection
Learning Verification
- [ ] Understand CDC concepts
- [ ] Know projection patterns
- [ ] Can use Kafka Streams
- [ ] Understand event replay
Week 16: Production Readiness ✅
Documentation
- [ ] READMEs updated
- [ ] ADRs documented
- [ ] Runbooks created
Kubernetes
- [ ] Helm charts created
- [ ] Resource limits set
- [ ] Health probes added
- [ ] Services deployed to K8s
Game Day
- [ ] Multi-service chaos
- [ ] Team participated
- [ ] Incident documented
Retrospective
- [ ] Retro completed
- [ ] Learnings documented
- [ ] Phase 2 planned
Validation Checks
- [ ] Docs peer-reviewed
- [ ] Services run in K8s
- [ ] Chaos handled gracefully
- [ ] Retro notes captured
Learning Verification
- [ ] Can document architecture
- [ ] Understand Kubernetes
- [ ] Know incident management
- [ ] Can reflect and improve
Progress Tracking
| Week | Status | Completion % | Notes |
|---|---|---|---|
| 1 | 🔄 | 0% | |
| 2 | ⏳ | 0% | |
| 3 | ⏳ | 0% | |
| 4 | ⏳ | 0% | |
| 5 | ⏳ | 0% | |
| 6 | ⏳ | 0% | |
| 7 | ⏳ | 0% | |
| 8 | ⏳ | 0% | |
| 9 | ⏳ | 0% | |
| 10 | ⏳ | 0% | |
| 11 | ⏳ | 0% | |
| 12 | ⏳ | 0% | |
| 13 | ⏳ | 0% | |
| 14 | ⏳ | 0% | |
| 15 | ⏳ | 0% | |
| 16 | ⏳ | 0% | |
Legend: ✅ Complete | 🔄 In Progress | ⏳ Not Started
Tips for Success
- Check off items as you complete them - Immediate satisfaction!
- Don't skip validation checks - They ensure quality
- Learning verification is crucial - Understanding > completion
- Document blockers in notes - Helps with troubleshooting later
- Celebrate weekly wins - Track your progress!
When Blocked
- [ ] Checked existing documentation
- [ ] Searched official docs
- [ ] Reviewed similar code
- [ ] Asked AI assistant with context
- [ ] Took break and came back
- [ ] Documented issue for help
Start Date: ____
Target Completion: ____
Actual Completion: ____