Module 11: Guardrails and Thresholds
What's the big idea?
Your model outputs a probability, say 0.72 for intent A.
Now what?
Guardrails turn that number into an action. They're like traffic lights:
- 🟢 Green: AUTO (system handles it)
- 🟡 Yellow: REVIEW (human takes a look)
- 🔴 Red: ABSTAIN (don't touch it, escalate)
Without guardrails, teams either auto-route everything and fail spectacularly, or escalate everything and burn out analysts.
With guardrails, you get controlled automation with safe fallback.
The math in plain terms
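Pick two thresholds, \(\tau_{low} < \tau_{high}\), which split the confidence score into three bands:
- At or above \(\tau_{high}\) (the high-confidence boundary): AUTO
- Below \(\tau_{low}\) (the low-confidence boundary): ABSTAIN
- Everything in between goes to a human: REVIEW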
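The same rule written as a piecewise function, where \(p\) is the model's confidence for its predicted intent. The boundary convention is an assumption chosen to match the scenario below: a score exactly at \(\tau_{high}\) is automated, and a score exactly at \(\tau_{low}\) goes to review.

\[
\text{action}(p) =
\begin{cases}
\text{AUTO} & \text{if } p \ge \tau_{high} \\
\text{REVIEW} & \text{if } \tau_{low} \le p < \tau_{high} \\
\text{ABSTAIN} & \text{if } p < \tau_{low}
\end{cases}
\]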
Real-world scenario: Auto-close policy
You handle three ticket categories with different risk profiles:
Low-risk (e.g., "How do I reset my password?"):
- Auto-close if confidence ≥ 0.90
- Human review if 0.70 ≤ confidence < 0.90
- Abstain if confidence < 0.70
High-risk (e.g., "Potential security breach"):
- Auto-close if confidence ≥ 0.98 (a very high bar)
- Human review if 0.60 ≤ confidence < 0.98 (a wide review band: anything short of near-certainty gets a human look)
- Abstain if confidence < 0.60
Different risk levels, different thresholds: the costlier a wrong automatic action, the higher the bar for AUTO.
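A minimal Python sketch of per-risk-class routing, assuming each risk class maps to one (τ_low, τ_high) pair. `Thresholds`, `POLICIES`, and `route` are hypothetical names for illustration, not something the notebook defines; the numbers are the ones from the auto-close scenario above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Confidence bands for one risk class (hypothetical helper)."""
    tau_low: float   # below this: ABSTAIN
    tau_high: float  # at or above this: AUTO

    def __post_init__(self) -> None:
        # Reject bands that overlap or fall outside [0, 1].
        if not 0.0 <= self.tau_low < self.tau_high <= 1.0:
            raise ValueError("need 0 <= tau_low < tau_high <= 1")


# Bands from the auto-close scenario, keyed by (hypothetical) risk-class name.
POLICIES = {
    "low_risk": Thresholds(tau_low=0.70, tau_high=0.90),
    "high_risk": Thresholds(tau_low=0.60, tau_high=0.98),
}


def route(confidence: float, risk_class: str) -> str:
    """Map a model confidence to AUTO / REVIEW / ABSTAIN for one risk class."""
    t = POLICIES[risk_class]  # unknown class raises KeyError instead of guessing
    if confidence >= t.tau_high:
        return "AUTO"
    if confidence >= t.tau_low:
        return "REVIEW"
    return "ABSTAIN"


for risk in POLICIES:
    print(risk, route(0.95, risk))
```

The same 0.95 score is auto-closed as a low-risk ticket but sent to review as a high-risk one. Raising on an unknown risk class, rather than silently falling back to a global default, is one way to keep thresholds defined per risk class.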
How to try this
Run notebooks/math-foundations/11_guardrails_thresholds.ipynb and:
- Simulate different threshold bands on real confidence scores
- Measure auto-rate, review-rate, abstain-rate
- Compare outcomes under strict vs relaxed policies
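A rough sketch of the rate measurement, assuming the confidence scores are a plain list of floats. The scores here are synthetic (drawn from a Beta(5, 2) distribution purely for illustration), and the "strict" and "relaxed" bands are hypothetical values; `band_rates` is an illustrative helper, not a function from the notebook.

```python
import random
from collections import Counter

BANDS = ("AUTO", "REVIEW", "ABSTAIN")


def band_rates(scores, tau_low, tau_high):
    """Fraction of scores routed to each band under (tau_low, tau_high)."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter()
    for p in scores:
        if p >= tau_high:
            counts["AUTO"] += 1
        elif p >= tau_low:
            counts["REVIEW"] += 1
        else:
            counts["ABSTAIN"] += 1
    # Counter returns 0 for bands that never occurred.
    return {band: counts[band] / len(scores) for band in BANDS}


# Synthetic confidences for illustration only; use your model's real scores instead.
rng = random.Random(0)
scores = [rng.betavariate(5, 2) for _ in range(10_000)]

# Hypothetical strict vs relaxed bands as (tau_low, tau_high).
policies = {"strict": (0.80, 0.97), "relaxed": (0.60, 0.85)}
for name, (lo, hi) in policies.items():
    rates = band_rates(scores, lo, hi)
    print(name, {band: round(rate, 3) for band, rate in rates.items()})
```

Raising τ_high can only shrink the auto-rate, and raising τ_low can only grow the abstain-rate, so the strict policy trades automation for more human and escalation work on the same scores.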
The traps to avoid
❌ The trap: Tuning only for automation rate (the share of tickets handled without a human)
✅ The smart move: Tuning for business impact (quality, SLA compliance, incidents, analyst load)
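One way to make "business impact" concrete is to put a cost on each outcome and search for the thresholds with the lowest mean cost, subject to a cap on analyst load. This is a sketch under loudly stated assumptions: the costs, the review cap, the labeled data (which is synthetic), and the helper `evaluate` are all hypothetical, and a real setup needs outcome labels from your own tickets.

```python
import itertools
import random

# Hypothetical costs in arbitrary units; replace with your own estimates.
COST_WRONG_AUTO = 50.0   # incorrect automatic action: incident, reopen, SLA miss
COST_REVIEW = 2.0        # analyst time for one human review
COST_ABSTAIN = 5.0       # escalation path
MAX_REVIEW_SHARE = 0.30  # hypothetical cap on the share of tickets sent to review


def evaluate(samples, tau_low, tau_high):
    """Return (mean cost per ticket, review share) for (confidence, correct) pairs."""
    total, reviews = 0.0, 0
    for p, correct in samples:
        if p >= tau_high:
            total += 0.0 if correct else COST_WRONG_AUTO
        elif p >= tau_low:
            total += COST_REVIEW
            reviews += 1
        else:
            total += COST_ABSTAIN
    n = len(samples)
    return total / n, reviews / n


# Synthetic labeled data: the model is right with probability equal to its confidence.
rng = random.Random(1)
samples = []
for _ in range(5_000):
    p = rng.random()
    samples.append((p, rng.random() < p))

grid = [round(0.05 * i, 2) for i in range(10, 21)]  # 0.50, 0.55, ..., 1.00
feasible = []
for lo, hi in itertools.product(grid, grid):
    if lo >= hi:
        continue  # bands must not overlap
    cost, review_share = evaluate(samples, lo, hi)
    if review_share <= MAX_REVIEW_SHARE:
        feasible.append((cost, lo, hi))

best_cost, best_lo, best_hi = min(feasible)
print(f"tau_low={best_lo}, tau_high={best_hi}, mean cost={best_cost:.3f}")
```

Because a wrong automatic action is priced far above a review, the search pushes τ_high up rather than maximizing automation; the review cap stops it from solving the problem by sending everything to analysts.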
Checklist for you
- [ ] Are thresholds defined per risk class, not globally?
- [ ] Are fallback paths actually implemented and tested?
- [ ] Is abstention logged and monitored?
- [ ] Do threshold changes go through a review/approval process?