Skip to content

Phase 5 — Observability Baseline

🏷️ Domain Focus

Primary: 🔧 Infrastructure - Observability and monitoring
Cross-Domain: 🏪 eCommerce, 🔒 Security - Service instrumentation
Protocol Focus: 📡 gRPC bidirectional streaming

Epic Integration: Observability Foundation (16 Story Points)

Epic Objective: Implement comprehensive observability with metrics collection, structured logging, and monitoring
Success Criteria: Prometheus metrics collection, structured JSON logging with correlation IDs, health check endpoints, Grafana dashboards, basic alerting rules, log aggregation and searching

Objectives

  • Implement OpenTelemetry instrumentation and monitoring infrastructure
  • Build comprehensive observability for distributed microservices
  • Learn gRPC patterns through service-to-service communication

Deliverables

  • OpenTelemetry configuration and instrumentation
  • Grafana dashboards and Prometheus metrics
  • gRPC service contracts and streaming examples

Tasks (acceptance)

1) Instrumentation with gRPC Learning (Story Points: 7) - [ ] Trace IDs flow across calls; custom metrics exposed - [ ] gRPC Protocol Lab: Implement bidirectional streaming between services - Design .proto files with service definitions - Implement client/server streaming for real-time data feeds - Add gRPC interceptors for tracing and metrics - Compare performance vs REST endpoints - [ ] Service Mesh Integration: Configure gRPC load balancing and service discovery - [ ] Implementation Tasks: - Implement Prometheus metrics with Micrometer (4 pts) - Configure structured logging with Logback (3 pts) - [ ] Modules: auth, product (as exemplars)

2) Dashboards & alerts with Protocol Observability (Story Points: 9) - [ ] Grafana dashboards (latency, error rate, throughput) - [ ] Protocol-specific metrics: Track gRPC vs REST vs async message performance - [ ] Distributed tracing: Visualize request flow across protocol boundaries - [ ] Basic alert rules committed with protocol-aware thresholds - [ ] Implementation Tasks: - Add health check endpoints (3 pts) - Create Grafana dashboards (3 pts) - Setup alerting rules and notifications (3 pts) - [ ] Files: grafana/ dashboards JSON; docs/01-ARCHITECTURE/CROSS_OBSERVABILITY_AND_TESTING.md - [ ] Acceptance: Dashboards for latency/error rate/throughput, basic alerts on error spikes

Dependencies

Learning & References - Reference Topics (Protocols & Concurrency)

Next Phase: Phase 6