Common Pitfalls & Best Practices
Pitfalls: What NOT to Do
❌ Pitfall 1: Not Using Realistic Think-Time
Problem: Firing requests as fast as possible without pauses between them.
// BAD - No think time
scenario("Bad User Journey")
.exec(http("Step 1").get("/api/products"))
.exec(http("Step 2").get("/api/product/123"))
.exec(http("Step 3").post("/api/cart"))
.exec(http("Step 4").post("/api/checkout"))
// All executed instantly, one after another!
Why it's bad: - Real users don't fire requests back-to-back - A real user reads the product description (10-30 sec), reviews options (5-15 sec), and decides whether to buy (30-60 sec) - Your test doesn't reflect reality - You're exercising a code path, not user behavior
Real flow:
GET /api/products → pause 10-30s → GET /api/product/123
→ pause 15s → POST /api/cart → pause 20s → POST /api/checkout
Simulated flow (bad):
GET /api/products → GET /api/product/123 → POST /api/cart → POST /api/checkout
(All in <100ms, unrealistic spike)
Fix:
// GOOD - Realistic think-time
scenario("Realistic User Journey")
.exec(http("Browse Products").get("/api/products"))
.pause(10, 30) // 10-30 seconds think time
.exec(http("View Product Detail").get("/api/product/#{productId}"))
.pause(5, 15)
.exec(http("Add to Cart").post("/api/cart"))
.pause(20, 40)
.exec(http("Checkout").post("/api/checkout"))
Lesson: Your test should simulate real user behavior, not optimal network conditions.
❌ Pitfall 2: Load Testing from a Single Machine (Becoming the Bottleneck)
Problem: Running 10,000 simulated users from your laptop.
├── Your laptop (1 machine)
│   ├── 10,000 simulated users
│   ├── CPU: 100% (maxed out)
│   ├── Network: saturated
│   └── Garbage-collection pauses
│
└── API server being tested
    ├── CPU: 20%
    └── "The API is fast!" (not true: your load generator is the bottleneck)
Why it's bad: - Single machine has CPU/memory/network limits - Gatling JVM becomes bottleneck, not your API - Results are meaningless (measuring load generator, not API) - Can't simulate realistic request patterns
Fix:
Option 1: Gatling Enterprise (cloud)
└── Distributes load across multiple cloud instances
Option 2: Self-hosted distribution
├── Run Gatling from multiple machines
├── Coordinate the runs
└── Aggregate results into a shared report
Option 3: Start small, scale gradually
├── 100 users on 1 machine (safe)
├── 1,000 users: still 1 machine (getting risky)
├── 10,000 users: use 10 machines (1,000 users each)
└── Rule of thumb: ~1,000 users max per machine
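The sizing rule above can be turned into a quick back-of-the-envelope calculation. This is a plain-Java sketch, not Gatling code; the 1,000-users-per-machine figure is the rule of thumb from this section, not a hard limit:

```java
// Capacity sketch: how many load generators are needed for a target
// user count, using the ~1,000 users per machine rule of thumb.
public class LoadGeneratorSizing {
    static final int USERS_PER_MACHINE = 1_000; // rule of thumb; tune per hardware

    static int machinesNeeded(int totalUsers) {
        // Round up so the last partial batch still gets a machine.
        return (totalUsers + USERS_PER_MACHINE - 1) / USERS_PER_MACHINE;
    }

    public static void main(String[] args) {
        System.out.println(machinesNeeded(100));    // 1 machine
        System.out.println(machinesNeeded(10_000)); // 10 machines
    }
}
```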
Lesson: As load increases, distribute it across machines.
❌ Pitfall 3: Testing Production Without Permission
Problem: Running load test against production database/API.
Why it's bad: - Real users experience your test (the site slows down for actual customers) - Logs get polluted with fake data - Alerts fire (like calling the fire department for a practice drill) - Load-test metrics get mixed with real user metrics - Possible compliance violation (using production for testing)
Fix: Always test in staging environment.
Environment hierarchy:
Local Dev
├── Fast iteration
├── Unrealistic scale
└── Solo testing
Staging/QA
├── Production-like setup (data, scale, infrastructure)
├── Safe to overload
├── Load testing here ✅ CORRECT
└── Reliable results
Production
├── Real users
├── Real data
├── Real revenue impact
└── Load testing here ❌ NEVER
Lesson: Load test only in staging.
❌ Pitfall 4: Ignoring the Warm-Up Phase
Problem: Starting test immediately without system warm-up.
Test results (no warm-up):
├── First minute: p95 = 5000ms (JVM warming up, caches cold)
├── Second minute: p95 = 2000ms (caches warming)
├── Third minute: p95 = 500ms (steady state)
├── ...10 minutes...
├── p95 = 500ms (stable)
│
└── Average: 1000ms (misleadingly high!)
    (Skewed by the cold-start phase)
Correct approach (with warm-up):
├── Warm-up (5 min) → skip in results
├── Actual test (15 min) → measure
└── p95 = 500ms (real, stable performance)
Why it's important: - JVM just-in-time compiler needs warm-up - Database query caches need population - Connection pools need initialization - First few minutes unrealistically slow
Fix:
setUp(
scenario.injectOpen(
constantUsersPerSec(100).during(300), // Warm-up: 5 min
constantUsersPerSec(100).during(900) // Actual test: 15 min
)
)
Then analyze only the second phase (after warm-up).
Lesson: Add 5-10 minute warm-up before measuring actual metrics.
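When post-processing raw results yourself, the same idea can be sketched in plain Java: drop every sample recorded during the warm-up window before computing statistics. The `Sample` shape and field names here are illustrative, not a Gatling API:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: discard all samples recorded during the warm-up window
// before computing statistics, so cold-start latency doesn't skew them.
public class WarmupFilter {
    // Illustrative sample shape: seconds since test start, plus latency.
    record Sample(long elapsedSeconds, double latencyMs) {}

    static List<Double> steadyStateLatencies(List<Sample> samples, long warmupSeconds) {
        return samples.stream()
                .filter(s -> s.elapsedSeconds() >= warmupSeconds) // skip warm-up
                .map(Sample::latencyMs)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Sample> run = List.of(
                new Sample(10, 5000),   // cold JVM
                new Sample(120, 2000),  // caches warming
                new Sample(400, 500),   // steady state
                new Sample(700, 480));
        // With a 300-second warm-up, only steady-state samples remain.
        System.out.println(steadyStateLatencies(run, 300)); // [500.0, 480.0]
    }
}
```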
❌ Pitfall 5: Only Tracking Mean Latency
Problem: Using mean latency to judge performance.
Test results:
├── Mean: 200ms ✅ Looks good!
└── p99: 5000ms ❌ But 1% of users wait 5 seconds!
Mean is misleading because:
├── 100 requests: 99 at 100ms, 1 at 10,000ms
└── Mean = (99×100 + 1×10,000) ÷ 100 = 199ms
    (The mean hides the outlier!)
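The arithmetic above is easy to verify in plain Java. It also exposes a second subtlety: with only 100 samples, even a nearest-rank p99 can miss a single outlier, which is why high percentiles and max matter too:

```java
import java.util.Arrays;

// Sketch: reproduce the "mean hides the outlier" arithmetic.
public class MeanVsPercentile {
    // Nearest-rank percentile over a sorted array.
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] latencies = new double[100];
        Arrays.fill(latencies, 100);  // 99 fast requests...
        latencies[99] = 10_000;       // ...and one 10-second outlier
        Arrays.sort(latencies);

        double mean = Arrays.stream(latencies).average().orElse(0);
        System.out.println(mean);                       // 199.0  -- looks fine
        System.out.println(percentile(latencies, 99));  // 100.0  -- still hides it
        System.out.println(percentile(latencies, 100)); // 10000.0 -- max reveals it
    }
}
```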
Fix: Always track percentiles.
.assertions(
    global().responseTime().percentile(50.0).lt(100),   // Median
    global().responseTime().percentile(95.0).lt(500),   // 95th percentile
    global().responseTime().percentile(99.0).lt(1000),  // 99th percentile
    global().responseTime().percentile(99.9).lt(3000)   // 99.9th percentile
)
Lesson: Percentiles tell the real story; the mean is misleading.
❌ Pitfall 6: Not Testing Error Scenarios
Problem: Only testing the happy path (everything succeeds).
Why it's bad: - Real world has failures (database down, 5xx errors) - Cascading failures hidden - Circuit breaker behavior untested - Error handling code untested
Fix: Include error injection in tests.
scenario("Realistic Journey with Errors")
    .exec(http("Get products").get("/api/products"))
    .pause(5)
    // Route ~10% of users through a failure path. Gatling has no built-in
    // server-error injection; trigger faults server-side (e.g., a chaos
    // proxy, or a fault-injection flag your staging API understands).
    .randomSwitch().on(
        Choice.withWeight(10.0, exec(
            http("Get details - injected fault")
                .get("/api/product/#{productId}?inject_fault=true") // hypothetical flag
                .check(status().in(200, 500)))),  // tolerate the injected 500s
        Choice.withWeight(90.0, exec(
            http("Get details").get("/api/product/#{productId}"))))
    .pause(5)
    .exec(http("Add to cart").post("/api/cart"))
    // ... rest of journey
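The 10% probability itself is simple to reason about. A plain-Java sketch of the routing decision (illustrative only; in a real test the fault routing lives server-side or in the scenario DSL):

```java
import java.util.Random;

// Sketch: the 10% error-injection decision as plain code.
public class FaultInjection {
    static boolean injectFault(Random rng, double errorRate) {
        return rng.nextDouble() < errorRate; // true for ~errorRate of calls
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed for reproducibility
        int faults = 0;
        for (int i = 0; i < 10_000; i++) {
            if (injectFault(rng, 0.10)) faults++;
        }
        // Roughly 1,000 of the 10,000 calls take the fault path.
        System.out.println(faults);
    }
}
```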
Lesson: Test realistic failure scenarios.
❌ Pitfall 7: One Test and Done
Problem: Running single load test and assuming you're done.
Bad approach:
Run one constant-load test → metrics look OK → deploy ✅
Missing:
├── Stress test: where's the breaking point?
├── Soak test: stable overnight, or memory leak?
├── Spike test: can we survive a viral moment?
└── Re-test after optimization: did the fix work?
Fix: Run multiple test types.
Complete test suite:
1. Load Test (baseline)
   └── Does it meet SLAs?
2. Ramp Test (capacity)
   └── Where does it break?
3. Soak Test (stability)
   └── Is it stable overnight?
4. Spike Test (resilience)
   └── Does it survive a viral moment?
Timeline: ~2-3 hours of testing (scheduled)
Lesson: One test reveals one thing. Use multiple tests for comprehensive understanding.
❌ Pitfall 8: Not Monitoring the System During the Test
Problem: Only looking at Gatling report, ignoring server metrics.
Scenario: Test shows latency increasing, but why?
❌ Bad: Gatling report only
├── Conclusion: "The API is slow"
└── Action: Scale the API (wastes money)
✅ Good: Gatling + Datadog
├── Gatling says: p95 = 1000ms (and climbing)
├── Datadog shows: database CPU = 95% (the real bottleneck)
└── Action: Optimize the database (the cheaper fix)
Fix: Monitor both client (Gatling) and server (Datadog).
During the test, watch:
├── Gatling console (RPS, latency, errors)
├── Datadog APM (traces, slow operations)
├── Datadog metrics (CPU, memory, disk)
├── Application logs (errors, exceptions)
└── Database monitoring (slow queries, locks)
Tools:
├── Gatling: response times (client-side)
├── Datadog APM: where time is spent (server-side)
├── Datadog metrics: resource utilization
└── Logs: error details
Lesson: Monitor both sides to find real bottlenecks.
Best Practices
✅ Best Practice 1: Define SLAs First
Before writing test code:
1. Engage stakeholders
   ├── Product: What do users tolerate?
   ├── Ops: What can the infrastructure handle?
   └── Eng: What's reasonable to achieve?
2. Document SLAs
   ├── p95 < 300ms
   ├── p99 < 800ms
   ├── Success rate > 99.9%
   └── Uptime > 99.95%
3. Build tests to validate SLAs
   ├── Tests configured with assertions
   ├── Fail if an SLA is not met
   └── Clear pass/fail result
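The pass/fail gate described above can be sketched as a standalone check (plain Java; in real Gatling tests the assertions DSL plays this role, and the thresholds below are the example SLAs from this section):

```java
// Sketch: a standalone SLA gate with a clear pass/fail result,
// mirroring what Gatling assertions do natively.
public class SlaGate {
    static boolean meetsSla(double p95Ms, double p99Ms, double successRate) {
        return p95Ms < 300          // p95 < 300ms
            && p99Ms < 800          // p99 < 800ms
            && successRate > 0.999; // success rate > 99.9%
    }

    public static void main(String[] args) {
        System.out.println(meetsSla(250, 700, 0.9995)); // true: within SLA
        System.out.println(meetsSla(450, 700, 0.9995)); // false: p95 blown
    }
}
```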
Lesson: SLAs should come first, tests second.
✅ Best Practice 2: Test in Realistic Environments
Testing checklist:
Infrastructure:
✅ Staging matches production (same config, scale)
✅ Database is production-sized (at least)
✅ Network latency simulated (not zero)
✅ External services accessible (or mocked realistically)
Data:
✅ Real data patterns (not synthetic nonsense)
✅ Production-sized dataset (not just 100 rows)
✅ Realistic skew (80% of users browse, 20% purchase)
Load:
✅ Peak traffic levels (not undersized)
✅ Realistic user behavior (think-time, variety)
✅ Distribution across features (not a single endpoint)
Lesson: Realistic testing = reliable results.
✅ Best Practice 3: Establish a Baseline, Then Optimize
Workflow:
1. Baseline test
   ├── Run the current code/infrastructure
   ├── Record: p95 = 800ms, CPU = 70%
   └── This is your baseline (it might be bad, but now you know it)
2. Optimization attempt
   ├── Change code: add caching, optimize a query, etc.
   └── Re-test with the SAME load
3. Compare
   ├── p95: 800ms → 300ms → 62% improvement
   ├── CPU: 70% → 30% → much better
   └── Declare success!
Why the baseline matters:
├── You have a target to beat
├── Optimization benefits are measurable
├── Without a baseline, "faster" is subjective
└── Good for reporting improvement to the business
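The comparison step can be made mechanical rather than subjective. A minimal sketch using the example numbers above (lower is better for both metrics):

```java
// Sketch: quantify an optimization against a recorded baseline.
public class BaselineCompare {
    // Percentage improvement for a metric where lower is better.
    static double improvementPct(double baseline, double current) {
        return (baseline - current) / baseline * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(improvementPct(800, 300)); // 62.5 (% latency improvement)
        System.out.println(improvementPct(70, 30));   // about 57 (% CPU reduction)
    }
}
```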
Lesson: Baseline first, optimize second, measure improvement.
✅ Best Practice 4: Version Control Your Tests
Git repo structure:
gatling-tests/
├── README.md (how to run the tests)
├── src/test/java/simulations/
│   ├── BaselineScenario.java
│   ├── PeakLoadScenario.java
│   ├── SpikeScenario.java
│   └── SoakScenario.java
├── src/test/resources/
│   ├── data/
│   └── bodies/
├── reports/
│   ├── 2024-01-15-baseline.html
│   ├── 2024-01-15-peak.html
│   └── ... (historical results)
└── pom.xml
Benefits:
├── Track changes to scenarios over time
├── Reproduce exact test conditions
├── Share tests with the team
└── Easy CI/CD integration
Lesson: Treat tests like code: version control them.
✅ Best Practice 5: Document Results
Test Report Template:
Date: 2024-01-15
Simulation: Peak Load Test
Environment: Staging
Duration: 30 minutes
Objectives:
- Verify system handles Black Friday traffic
- Find breaking point at 10,000 concurrent users
Scenario:
- 100 new users per second for 30 minutes
- Realistic think-time (5-30 second pauses)
- CSV feeder with 10,000 product IDs
- Weighted distribution: 70% browse, 20% purchase, 10% support
Results:
- Total requests: 180,000
- Successful: 179,640 (99.8%) ✅
- Failed: 360 (0.2%)
- p95 latency: 450ms ✅ (target: <500ms)
- p99 latency: 950ms ✅ (target: <1000ms)
- Max RPS achieved: 1,850 (during ramp-down)
- Peak CPU (server): 78%
- Peak memory (server): 4.2GB (stable)
Bottlenecks:
- Database query time increasing with load
  ├── Root cause: missing index on the product.category column
  ├── Evidence: Datadog shows 300ms spent in the SELECT query
  └── Fix: add the index (planned for next sprint)
Recommendation:
✅ PASS: meets SLA at current load levels
✅ Ready for 2x traffic growth
⚠️ Recommend index optimization before 3x growth
Next Steps:
1. Add database index
2. Re-test to confirm improvement
3. Plan capacity for Q1
Lesson: Document findings for future reference and accountability.
✅ Best Practice 6: Automate Tests in CI/CD
GitHub Actions example:
name: Load Tests
on:
  schedule:
    - cron: '0 2 * * *'  # Nightly at 2am
jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Gatling baseline test
        # gatling:test fails the build when Gatling assertions fail,
        # so SLA violations fail this job automatically
        run: mvn gatling:test -Dgatling.simulationClass=BaselineTest
      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: gatling-report
          path: target/gatling/*/
      - name: Notify Slack
        if: always()
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          curl -X POST -H 'Content-Type: application/json' \
            -d '{"text":"Load test completed"}' "$SLACK_WEBHOOK"
Benefits: - Tests run automatically - SLA violations caught immediately - Historical trends tracked - Prevent regression
Lesson: Automate tests; don't rely on manual execution.
✅ Best Practice 7: Communicate Results
Stakeholder Communication:
For Product/Business:
├── "System handles 10,000 concurrent users ✅"
├── "p95 latency is 300ms (target: 300ms) ✅"
└── "Safe to launch the feature on schedule"
For Ops/DevOps:
├── "Peak CPU: 75%, headroom: 25%"
├── "Database connections: 45/50 (nearing the pool limit)"
└── "Infrastructure recommendation: add 1 more instance"
For Engineering:
├── "The database query is the bottleneck (400ms of the 500ms latency)"
├── "Adding an index on users.email should improve this by ~60%"
└── "Priority: optimize the slow query, then re-test"
For Executives:
├── "We can handle 3x current traffic"
├── "No infrastructure scaling needed yet"
└── "Ready for the holiday-season traffic surge"
Lesson: Tailor the message to the audience; each group speaks a different language.
Summary: Do's and Don'ts
✅ DO:
- Define clear SLAs before testing
- Test in realistic staging environment
- Use realistic think-time in scenarios
- Monitor both client (Gatling) and server (Datadog)
- Test multiple scenarios (load, ramp, soak, spike)
- Establish baseline before optimizing
- Document findings and communicate results
- Re-test after any changes
- Automate tests in CI/CD pipeline
- Use percentiles (p95, p99), not mean
❌ DON'T:
- Fire all requests instantly (no think-time)
- Load test from single overloaded machine
- Test against production
- Start test without warm-up
- Track only mean latency
- Test only happy path (ignore errors)
- Run one test and assume you're done
- Monitor only Gatling, ignore server metrics
- Deploy without load test validation
- Test at same load level each time (no learning)
Next Steps
→ Read next: Gatling Concepts: Architecture - Understand the Gatling framework