ADR-015: Load Testing and Performance Engineering Strategy
Status
Accepted
Context
BitVelocity requires comprehensive performance testing to: - Establish performance baselines for all services - Detect performance regressions early in CI/CD - Practice capacity planning and scalability analysis - Validate SLI/SLO targets - Learn hands-on performance engineering patterns
The system spans multiple protocols (REST, gRPC, WebSocket, Kafka) requiring varied testing approaches.
Decision
Testing Framework Stack
- Gatling (Java): Primary load testing framework for complex scenarios
- k6: Lightweight performance testing for CI/CD integration
- JMeter: Optional for specific protocol testing (SOAP, MQTT)
Test Categories
1. Smoke Tests
- Purpose: Quick validation, regression detection
- Duration: < 1 minute
- Load: 10 VUs
- Frequency: Every PR
- Tool: k6
- Location:
.github/workflows/performance-smoke.yml
2. Load Tests
- Purpose: Validate performance under expected load
- Duration: 5-15 minutes
- Load: Ramp to 100-500 users
- Frequency: Nightly
- Tool: Gatling
3. Stress Tests
- Purpose: Find breaking points
- Duration: 15-30 minutes
- Load: Incremental until failure
- Frequency: Weekly
- Tool: Gatling
4. Spike Tests
- Purpose: Test behavior under sudden load increases
- Duration: 5-10 minutes
- Load: Immediate spike to 200+ users
- Frequency: Weekly
- Tool: Gatling
5. Endurance Tests
- Purpose: Detect memory leaks, resource exhaustion
- Duration: 2-4 hours
- Load: Sustained moderate load
- Frequency: Monthly
Performance Targets (Initial Baselines)
| Service/Flow | p50 | p95 | p99 | Throughput |
|---|---|---|---|---|
| POST /orders | 50ms | 200ms | 500ms | 100 req/s |
| GET /products/{id} | 15ms | 50ms | 100ms | 500 req/s |
| gRPC ReserveStock | 30ms | 100ms | 200ms | 200 req/s |
| WebSocket fan-out | 50ms | 150ms | 300ms | 1000 msg/s |
CI/CD Integration
# Performance gates in pipeline
performance_smoke:
thresholds:
- p95 < 200ms
- error_rate < 1%
- degradation < 10% (compared to baseline)
Metrics Collection
All tests must collect: - Response time distributions (p50, p90, p95, p99) - Error rates by type - Throughput (requests/second) - Resource utilization (CPU, memory, network) - JVM metrics (if applicable)
Test Data Management
- Synthetic data generators per domain
- Repeatable test data sets for deterministic results
- PII-free data for compliance
Result Storage
- JSON results stored in git (trending analysis)
- Grafana dashboards for visualization
- Automated comparison with baselines
Consequences
Positive
- Early detection of performance regressions
- Hands-on learning of performance engineering
- Data-driven capacity planning
- Confidence in scalability
- Better understanding of system behavior under load
Negative
- Additional CI/CD time (mitigated by parallel execution)
- Infrastructure costs for load testing environments
- Requires discipline to maintain test scenarios
- Initial learning curve for Gatling/k6
Trade-offs
- Chose Gatling (Java) over Scala DSL for consistency with codebase
- k6 for CI due to speed, Gatling for comprehensive scenarios
- Accepted longer test times for production-like load profiles
Implementation Plan
- Phase 1 (Week 1-2): Setup infrastructure
- Create
bv-performance-testingmodule - Set up Gatling project structure
-
Create baseline k6 smoke tests
-
Phase 2 (Week 3): Implement core scenarios
- Order creation flow (Gatling)
- Product browsing (k6)
-
Inventory operations (Gatling)
-
Phase 3 (Week 4): CI/CD integration
- Add performance-smoke workflow
- Configure thresholds and gates
-
Set up result reporting
-
Phase 4 (Ongoing): Expand coverage
- Add WebSocket/SSE scenarios
- Implement stress and endurance tests
- Build Grafana dashboards
References
bv-performance-testing/README.mdbv-performance-testing/performance-baselines/sli-targets.yaml.github/workflows/performance-smoke.yml- Gatling Documentation
- k6 Documentation