Phase 5 — Observability Baseline
🏷️ Domain Focus
Primary: 🔧 Infrastructure - Observability and monitoring
Cross-Domain: 🏪 eCommerce, 🔒 Security - Service instrumentation
Protocol Focus: 📡 gRPC bidirectional streaming
Epic Integration: Observability Foundation (16 Story Points)
Epic Objective: Implement comprehensive observability with metrics collection, structured logging, and monitoring
Success Criteria: Prometheus metrics collection, structured JSON logging with correlation IDs, health check endpoints, Grafana dashboards, basic alerting rules, log aggregation and searching
Objectives
- Implement OpenTelemetry instrumentation and monitoring infrastructure
- Build comprehensive observability for distributed microservices
- Learn gRPC patterns through service-to-service communication
Deliverables
- OpenTelemetry configuration and instrumentation
- Grafana dashboards and Prometheus metrics
- gRPC service contracts and streaming examples
Tasks (acceptance)
1) Instrumentation with gRPC Learning (Story Points: 7)
- [ ] Trace IDs flow across calls; custom metrics exposed
- [ ] gRPC Protocol Lab: Implement bidirectional streaming between services
- Design .proto files with service definitions
- Implement client/server streaming for real-time data feeds
- Add gRPC interceptors for tracing and metrics
- Compare performance vs REST endpoints
- [ ] Service Mesh Integration: Configure gRPC load balancing and service discovery
- [ ] Implementation Tasks:
- Implement Prometheus metrics with Micrometer (4 pts)
- Configure structured logging with Logback (3 pts)
- [ ] Modules: auth, product (as exemplars)
2) Dashboards & alerts with Protocol Observability (Story Points: 9)
- [ ] Grafana dashboards (latency, error rate, throughput)
- [ ] Protocol-specific metrics: Track gRPC vs REST vs async message performance
- [ ] Distributed tracing: Visualize request flow across protocol boundaries
- [ ] Basic alert rules committed with protocol-aware thresholds
- [ ] Implementation Tasks:
- Add health check endpoints (3 pts)
- Create Grafana dashboards (3 pts)
- Setup alerting rules and notifications (3 pts)
- [ ] Files: grafana/ dashboards JSON; docs/01-ARCHITECTURE/CROSS_OBSERVABILITY_AND_TESTING.md
- [ ] Acceptance: Dashboards for latency/error rate/throughput, basic alerts on error spikes
Dependencies
Learning & References - Reference Topics (Protocols & Concurrency)
Next Phase: Phase 6