07.02 · Playwright RCA & Auto-Fix — Deep Dive (Case 2)
Level: Advanced
Pre-reading: 07 · Use Cases · 06.03 · CI/CD Integration
The Problem
End-to-end tests written in Playwright are expensive to maintain. When they fail in CI, the investigation cycle is:
1. Developer sees the failure in GitHub Actions
2. Clicks through to the report
3. Tries to reproduce locally
4. Traces back to the root cause
This cycle takes 30–120 minutes per failure. An AI agent can do the analysis in < 2 minutes.
Full Architecture
```mermaid
graph TD
    A[CI: Playwright tests fail] --> B[GitHub Actions: trigger AI agent workflow]
    B --> C[Download: test report + screenshots + network log]
    C --> D[LangGraph agent starts]
    D --> E[Parse test report: extract failing tests]
    E --> F[For each failing test: classify failure type]
    F --> G{Failure type?}
    G -->|Selector| H[Find + fix selector in test file]
    G -->|API contract| I[Compare expected vs actual, post RCA]
    G -->|Flaky timing| J[Add waitFor or retry assertion]
    G -->|Environment| K[Post environment RCA, no code fix]
    H --> L[Push fix commit to PR branch]
    I --> M[Post RCA comment on PR]
    J --> L
    K --> M
    L --> N[Re-trigger CI checks]
```
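The trigger and download steps above can be sketched as a GitHub Actions workflow that runs when the test workflow finishes with a failure. All names here — the `e2e-tests` workflow, the `playwright-report` artifact, and the agent entry point — are assumptions about a particular repo's setup, not a prescribed layout:

```yaml
name: ai-test-rca
on:
  workflow_run:
    workflows: ["e2e-tests"]   # hypothetical name of the Playwright CI workflow
    types: [completed]

jobs:
  analyze:
    # Only run the agent when the test workflow actually failed
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    runs-on: ubuntu-latest
    steps:
      # Pull the report + screenshots + network log uploaded by the test job
      - uses: actions/download-artifact@v4
        with:
          name: playwright-report
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
      # Hypothetical entry point that starts the LangGraph agent
      - run: node agent/run-rca.js --report playwright-report
```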
Failure Classification
The agent reads the failure message and classifies before choosing a response:
| Failure Pattern | Classification | Agent Action |
|---|---|---|
| `locator.click: Timeout 30000ms exceeded` | Flaky / timing | Add explicit waitFor, then push fix |
| `expect(locator).toHaveText('...')` | Assertion failure | Check if feature regressed or spec changed |
| `net::ERR_CONNECTION_REFUSED` | Environment | Post env issue RCA, alert DevOps |
| `Response status: 404` | API change | Check endpoint, update test or report broken API |
| `Response status: 500` | Backend error | Pull service logs, generate RCA |
| `strict mode violation: ... resolved to 3 elements` | Selector ambiguous | Fix selector to be more specific |
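The classification step above can be sketched as an ordered list of regex rules over the raw failure message. The category names and rule order are illustrative, not a fixed taxonomy; more specific patterns are checked before the generic timeout rule:

```typescript
// Map a raw Playwright failure message to a failure category.
type FailureType =
  | "ambiguous-selector"
  | "assertion"
  | "environment"
  | "api-change"
  | "backend-error"
  | "flaky-timing"
  | "unknown";

// Ordered rules: first match wins, so specific patterns come
// before the catch-all timeout rule (assertion timeouts would
// otherwise be misclassified as flaky).
const RULES: Array<[RegExp, FailureType]> = [
  [/strict mode violation/i, "ambiguous-selector"],
  [/expect\(.+\)\.toHave/i, "assertion"],
  [/net::ERR_CONNECTION_REFUSED/, "environment"],
  [/Response status: 404/, "api-change"],
  [/Response status: 500/, "backend-error"],
  [/Timeout \d+ms exceeded/i, "flaky-timing"],
];

function classifyFailure(message: string): FailureType {
  for (const [pattern, type] of RULES) {
    if (pattern.test(message)) return type;
  }
  return "unknown";
}
```

An `unknown` result is itself useful: it routes the failure to a human instead of letting the agent guess.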
Playwright Test Code Analysis
The agent reads the failing test to understand intent:
```mermaid
sequenceDiagram
    participant Agent
    participant MCP as Playwright MCP
    participant GitHub
    Agent->>MCP: get_failed_tests(report_path)
    MCP-->>Agent: [{ name: "should complete checkout", error: "...", file: "checkout.spec.ts" }]
    Agent->>GitHub: read_file("tests/checkout.spec.ts")
    GitHub-->>Agent: Full test file
    Agent->>Agent: Understand: test opens /checkout, fills form, clicks submit, asserts confirmation
    Agent->>Agent: Error: selector '#submit-btn' not found
    Agent->>GitHub: search_code("submit-btn", repo="storefront")
    GitHub-->>Agent: No results — selector was renamed to 'data-testid=checkout-submit'
    Agent->>Agent: Fix: replace '#submit-btn' with '[data-testid="checkout-submit"]'
```
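The final fix step can be sketched as a plain string substitution over the test source, with a guard so the agent fails loudly rather than pushing a no-op commit. The function name and the failure behaviour are illustrative choices, not part of any Playwright API:

```typescript
// Replace every occurrence of a stale selector in a test file's source.
// Throws if the selector is absent, so the agent never pushes a commit
// that silently changed nothing.
function fixSelector(source: string, stale: string, replacement: string): string {
  if (!source.includes(stale)) {
    throw new Error(`Stale selector ${stale} not found in test source`);
  }
  // split/join replaces all occurrences without regex-escaping issues
  return source.split(stale).join(replacement);
}

const before = `await page.locator('#submit-btn').click();`;
const after = fixSelector(before, "#submit-btn", '[data-testid="checkout-submit"]');
// after === `await page.locator('[data-testid="checkout-submit"]').click();`
```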
RCA Document Structure
When the agent can identify but not automatically fix the root cause, it generates an RCA document posted as a PR comment:
```markdown
## 🤖 AI Test Failure Analysis — checkout.spec.ts

**Failing Test:** `should complete checkout flow`
**Failure Type:** API Contract Break
**Confidence:** High

### Root Cause
The `/api/v2/orders` endpoint now returns `orderReference` in the response body
instead of `orderId`. The Playwright test asserts `response.orderId` which is undefined.

### Evidence
- Network log shows `POST /api/v2/orders` → 201 response: `{ "orderReference": "ORD-123" }`
- Test assertion: `expect(response.orderId).toBeDefined()` → fails

### Contributing Factor
No contract test between checkout-ui and order-service. API change in PR #847 was not
reflected in the E2E test.

### Recommended Fix
1. Update Playwright test to use `orderReference` instead of `orderId`
2. Add a Pact contract test for this API response schema to prevent future regressions

### Linked PR
The API change was introduced in: #847
```
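A comment like this is easiest to keep consistent when the agent fills a template from structured findings rather than free-writing markdown. A minimal sketch — the interface and field names are assumptions, and only a subset of the sections is shown:

```typescript
// Structured output of the analysis step (shape is an assumption).
interface RcaFindings {
  testName: string;
  failureType: string;
  confidence: "High" | "Medium" | "Low";
  rootCause: string;
  evidence: string[];
  recommendedFixes: string[];
}

// Render the findings as the markdown body of the PR comment.
function renderRca(f: RcaFindings): string {
  return [
    `## 🤖 AI Test Failure Analysis — ${f.testName}`,
    `**Failure Type:** ${f.failureType}`,
    `**Confidence:** ${f.confidence}`,
    `### Root Cause`,
    f.rootCause,
    `### Evidence`,
    ...f.evidence.map((e) => `- ${e}`),
    `### Recommended Fix`,
    ...f.recommendedFixes.map((r, i) => `${i + 1}. ${r}`),
  ].join("\n");
}
```

Templating also gives the validator a stable structure to check (e.g. that every RCA includes an Evidence section) before the comment is posted.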
Metrics and Continuous Improvement
Track agent performance over time:
| Metric | Target |
|---|---|
| Failures correctly classified | > 90% |
| Auto-fixes that pass CI | > 70% |
| RCA documents rated useful by developer | > 80% |
| Average time from CI failure to agent response | < 3 minutes |
| Reduction in developer investigation time | > 60% |
**How do you prevent the agent from masking real bugs by just updating test assertions?**
Add a rule to the system prompt: "Never change an assertion to match a broken application behaviour. Only fix test selectors, wait conditions, and test data issues. If an assertion mismatch suggests a potential regression, classify as 'assertion failure' and generate an RCA instead." Enforce this with an output validator that flags assertion changes for human review.
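The validator half of that rule can be sketched as a check over the agent's proposed diff: if any added or removed line touches an `expect(...)` assertion, the change is routed to human review instead of being pushed. The function name is illustrative:

```typescript
// Return true when a unified diff adds or removes a line containing
// an expect(...) assertion — the case that needs human sign-off.
function touchesAssertion(diff: string): boolean {
  return diff
    .split("\n")
    // keep only +/- change lines, skipping the ---/+++ file headers
    .filter((line) => /^[+-](?![+-])/.test(line))
    .some((line) => /\bexpect\s*\(/.test(line));
}
```

Selector and wait-condition fixes pass through untouched, so the common auto-fix path stays fast while assertion edits are gated.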
**Can the agent replay a failed Playwright test to gather more evidence?**
Yes — using the Playwright MCP server, the agent can navigate to the URL, interact with the page, and capture screenshots and network logs in a sandboxed environment. This is more expensive (requires a browser environment) but provides much richer evidence, especially for timing-sensitive failures.