07.02 · Playwright RCA & Auto-Fix — Deep Dive (Case 2)
Level: Advanced
Pre-reading: 07 · Use Cases · 06.03 · CI/CD Integration
The Problem
End-to-end tests written in Playwright are expensive to maintain. When they fail in CI, the investigation cycle is:
1. Developer sees the failure in GitHub Actions
2. Clicks through to the report
3. Tries to reproduce locally
4. Traces back to the root cause
This cycle takes 30–120 minutes per failure. An AI agent can do the analysis in < 2 minutes.
Full Architecture
```mermaid
graph TD
    A[CI: Playwright tests fail] --> B[GitHub Actions: trigger AI agent workflow]
    B --> C[Download: test report + screenshots + network log]
    C --> D[LangGraph agent starts]
    D --> E[Parse test report: extract failing tests]
    E --> F[For each failing test: classify failure type]
    F --> G{Failure type?}
    G -->|Selector| H[Find + fix selector in test file]
    G -->|API contract| I[Compare expected vs actual, post RCA]
    G -->|Flaky timing| J[Add waitFor or retry assertion]
    G -->|Environment| K[Post environment RCA, no code fix]
    H --> L[Push fix commit to PR branch]
    I --> M[Post RCA comment on PR]
    J --> L
    K --> M
    L --> N[Re-trigger CI checks]
```
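The trigger and download steps above can be sketched as a GitHub Actions workflow that runs when the test workflow finishes with a failure. All names here — the `e2e-tests` workflow, the `playwright-report` artifact, and the agent entry point — are assumptions about a particular repo's setup, not a prescribed layout:

```yaml
name: ai-test-rca
on:
  workflow_run:
    workflows: ["e2e-tests"]   # hypothetical name of the Playwright CI workflow
    types: [completed]

jobs:
  analyze:
    # Only run the agent when the test workflow actually failed
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    runs-on: ubuntu-latest
    steps:
      # Pull the report + screenshots + network log uploaded by the test job
      - uses: actions/download-artifact@v4
        with:
          name: playwright-report
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
      # Hypothetical entry point that starts the LangGraph agent
      - run: node agent/run-rca.js --report playwright-report
```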
Failure Classification
The agent reads the failure message and classifies before choosing a response:
| Failure Pattern | Classification | Agent Action |
|---|---|---|
| `locator.click: Timeout 30000ms exceeded` | Flaky / timing | Add explicit waitFor, then push fix |
| `expect(locator).toHaveText('...')` | Assertion failure | Check if feature regressed or spec changed |
| `net::ERR_CONNECTION_REFUSED` | Environment | Post env issue RCA, alert DevOps |
| `Response status: 404` | API change | Check endpoint, update test or report broken API |
| `Response status: 500` | Backend error | Pull service logs, generate RCA |
| `strict mode violation: ... resolved to 3 elements` | Selector ambiguous | Fix selector to be more specific |
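The classification step above can be sketched as an ordered list of regex rules over the raw failure message. The category names and rule order are illustrative, not a fixed taxonomy; more specific patterns are checked before the generic timeout rule:

```typescript
// Map a raw Playwright failure message to a failure category.
type FailureType =
  | "ambiguous-selector"
  | "assertion"
  | "environment"
  | "api-change"
  | "backend-error"
  | "flaky-timing"
  | "unknown";

// Ordered rules: first match wins, so specific patterns come
// before the catch-all timeout rule (assertion timeouts would
// otherwise be misclassified as flaky).
const RULES: Array<[RegExp, FailureType]> = [
  [/strict mode violation/i, "ambiguous-selector"],
  [/expect\(.+\)\.toHave/i, "assertion"],
  [/net::ERR_CONNECTION_REFUSED/, "environment"],
  [/Response status: 404/, "api-change"],
  [/Response status: 500/, "backend-error"],
  [/Timeout \d+ms exceeded/i, "flaky-timing"],
];

function classifyFailure(message: string): FailureType {
  for (const [pattern, type] of RULES) {
    if (pattern.test(message)) return type;
  }
  return "unknown";
}
```

An `unknown` result is itself useful: it routes the failure to a human instead of letting the agent guess.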
Playwright Test Code Analysis
The agent reads the failing test to understand intent:
```mermaid
sequenceDiagram
    participant Agent
    participant MCP as Playwright MCP
    participant GitHub
    Agent->>MCP: get_failed_tests(report_path)
    MCP-->>Agent: [{ name: "should complete checkout", error: "...", file: "checkout.spec.ts" }]
    Agent->>GitHub: read_file("tests/checkout.spec.ts")
    GitHub-->>Agent: Full test file
    Agent->>Agent: Understand: test opens /checkout, fills form, clicks submit, asserts confirmation
    Agent->>Agent: Error: selector '#submit-btn' not found
    Agent->>GitHub: search_code("submit-btn", repo="storefront")
    GitHub-->>Agent: No results — selector was renamed to 'data-testid=checkout-submit'
    Agent->>Agent: Fix: replace '#submit-btn' with '[data-testid="checkout-submit"]'
```
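The final fix step can be sketched as a plain string substitution over the test source, with a guard so the agent fails loudly rather than pushing a no-op commit. The function name and the failure behaviour are illustrative choices, not part of any Playwright API:

```typescript
// Replace every occurrence of a stale selector in a test file's source.
// Throws if the selector is absent, so the agent never pushes a commit
// that silently changed nothing.
function fixSelector(source: string, stale: string, replacement: string): string {
  if (!source.includes(stale)) {
    throw new Error(`Stale selector ${stale} not found in test source`);
  }
  // split/join replaces all occurrences without regex-escaping issues
  return source.split(stale).join(replacement);
}

const before = `await page.locator('#submit-btn').click();`;
const after = fixSelector(before, "#submit-btn", '[data-testid="checkout-submit"]');
// after === `await page.locator('[data-testid="checkout-submit"]').click();`
```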
RCA Document Structure
When the agent can identify but not automatically fix the root cause, it generates an RCA document posted as a PR comment:
```markdown
## 🤖 AI Test Failure Analysis — checkout.spec.ts

**Failing Test:** `should complete checkout flow`
**Failure Type:** API Contract Break
**Confidence:** High

### Root Cause
The `/api/v2/orders` endpoint now returns `orderReference` in the response body
instead of `orderId`. The Playwright test asserts `response.orderId` which is undefined.

### Evidence
- Network log shows `POST /api/v2/orders` → 201 response: `{ "orderReference": "ORD-123" }`
- Test assertion: `expect(response.orderId).toBeDefined()` → fails

### Contributing Factor
No contract test between checkout-ui and order-service. API change in PR #847 was not
reflected in the E2E test.

### Recommended Fix
1. Update Playwright test to use `orderReference` instead of `orderId`
2. Add a Pact contract test for this API response schema to prevent future regressions

### Linked PR
The API change was introduced in: #847
```
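A comment like this is easiest to keep consistent when the agent fills a template from structured findings rather than free-writing markdown. A minimal sketch — the interface and field names are assumptions, and only a subset of the sections is shown:

```typescript
// Structured output of the analysis step (shape is an assumption).
interface RcaFindings {
  testName: string;
  failureType: string;
  confidence: "High" | "Medium" | "Low";
  rootCause: string;
  evidence: string[];
  recommendedFixes: string[];
}

// Render the findings as the markdown body of the PR comment.
function renderRca(f: RcaFindings): string {
  return [
    `## 🤖 AI Test Failure Analysis — ${f.testName}`,
    `**Failure Type:** ${f.failureType}`,
    `**Confidence:** ${f.confidence}`,
    `### Root Cause`,
    f.rootCause,
    `### Evidence`,
    ...f.evidence.map((e) => `- ${e}`),
    `### Recommended Fix`,
    ...f.recommendedFixes.map((r, i) => `${i + 1}. ${r}`),
  ].join("\n");
}
```

Templating also gives the validator a stable structure to check (e.g. that every RCA includes an Evidence section) before the comment is posted.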
Metrics and Continuous Improvement
Track agent performance over time:
| Metric | Target |
|---|---|
| Failures correctly classified | > 90% |
| Auto-fixes that pass CI | > 70% |
| RCA documents rated useful by developer | > 80% |
| Average time from CI failure to agent response | < 3 minutes |
| Reduction in developer investigation time | > 60% |
**How do you prevent the agent from masking real bugs by just updating test assertions?**
Add a rule to the system prompt: "Never change an assertion to match a broken application behaviour. Only fix test selectors, wait conditions, and test data issues. If an assertion mismatch suggests a potential regression, classify as 'assertion failure' and generate an RCA instead." Enforce this with an output validator that flags assertion changes for human review.
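The validator half of that rule can be sketched as a check over the agent's proposed diff: if any added or removed line touches an `expect(...)` assertion, the change is routed to human review instead of being pushed. The function name is illustrative:

```typescript
// Return true when a unified diff adds or removes a line containing
// an expect(...) assertion — the case that needs human sign-off.
function touchesAssertion(diff: string): boolean {
  return diff
    .split("\n")
    // keep only +/- change lines, skipping the ---/+++ file headers
    .filter((line) => /^[+-](?![+-])/.test(line))
    .some((line) => /\bexpect\s*\(/.test(line));
}
```

Selector and wait-condition fixes pass through untouched, so the common auto-fix path stays fast while assertion edits are gated.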
**Can the agent replay a failed Playwright test to gather more evidence?**
Yes — using the Playwright MCP server, the agent can navigate to the URL, interact with the page, and capture screenshots and network logs in a sandboxed environment. This is more expensive (requires a browser environment) but provides much richer evidence, especially for timing-sensitive failures.