The anchor scenario¶

It's Monday morning. A ticket lands in your service desk queue:

Subject: Can't log in — urgent

Body: Hi, I've been locked out of my account since this morning. Also my VPN keeps dropping every few minutes. Not sure if these are related but I need access ASAP for a client call at 10am. This happened right after the security team sent out that patch notice last Friday.

Every module connects back to the same Monday morning ticket:

One ticket. Three possible things going on:

Intent	What it would mean	Resolution time	Who handles it
`account_unlock`	User is locked out — password reset or re-enablement needed	~4 min	Tier-1 agent
`vpn_issue`	VPN client broken after the Friday patch	~18 min	Tier-2 network team
`security_incident`	The "patch notice" was a phishing email; account may be compromised	~45 min	Senior engineer + audit trail + possibly HR

The stakes are not equal. Misrouting a security incident as a password reset doesn't just waste time — it leaves an active attacker inside the network while a tier-1 agent cheerfully sends a password reset link.

This is the problem your AI copilot has to solve correctly, every time, at scale.

Every module adds one more layer of precision to that solution.

modules¶

#	File	Topic	What it teaches
00	00-story.md	The Monday Ticket	The scenario, the stakes, the three possible intents
01	01-tokenization.md	Tokenization	How the model reads the ticket; token budgets; what gets cut
02	02-probability.md	Probability	Classifier confidence; expected handle time; routing thresholds
03	03-logits-softmax.md	Logits & Softmax	Where probabilities come from; why margin matters
04	04-entropy.md	Entropy	Measuring spread of uncertainty; the three-tier routing policy
05	05-variance-stddev.md	Variance & Std Dev	System stability; the go/no-go deployment gate
06	06-determinism.md	Determinism vs Stochastic	Which steps need reproducibility; which benefit from variation
07	07-regression.md	Regression	Predicting resolution time; residual tracking
08	08-classification-calibration.md	Calibration	Are the probabilities actually trustworthy
09	09-correlation-causation.md	Correlation vs Causation	Are you fixing the right thing; confounder detection; pilot design
10	10-sampling-controls.md	Sampling Controls	Temperature, top-p, top-k; when to constrain generation
11	11-guardrails-thresholds.md	Guardrails & Thresholds	Hard rules layered on top of probabilistic outputs
12	12-evaluation-in-production.md	Evaluation in Production	How to measure a live system; metrics that matter
13	13-bias-fairness.md	Bias & Fairness	Disaggregated accuracy; FNR by segment; segment-specific thresholds; proxy discrimination
14	14-interpretability.md	Interpretability	Which words drove the classification; why the night-shift gap exists at the token level
15	15-adversarial-testing.md	Adversarial Testing	What if a user crafts a ticket to trick the classifier; red-teaming your pipeline
16	16-human-in-the-loop.md	Human-in-the-loop	When the model defers; how analyst decisions feed back; override logging
17	17-fine-tuning.md	Fine-Tuning	When prompt engineering stops being enough; what fine-tuning actually changes
18	18-rag.md	Retrieval-Augmented Generation	Giving the model access to your knowledge base at inference time
19	19-cost-latency.md	Cost & Latency	Token costs at scale; latency budgets; the accuracy vs speed trade-off
20	20-drift-retraining.md	Drift & Retraining	How production data shifts over time; when and how to retrain
21	21-incident-postmortem.md	Incident & Postmortem	When the pipeline fails; how to investigate, document, and fix it

How the modules connect¶

FOUNDATION (what the model does)
01 Tokenization → 02 Probability → 03 Logits & Softmax → 04 Entropy

STABILITY (can you trust the system)
05 Variance → 06 Determinism → 08 Calibration → 09 Correlation vs Causation

CAPABILITY (what the system produces)
07 Regression → 10 Sampling Controls → 11 Guardrails → 17 Fine-Tuning → 18 RAG

EVALUATION (is it working)
12 Evaluation in Production → 13 Bias & Fairness → 14 Interpretability → 20 Drift & Retraining

SAFETY (what can go wrong)
15 Adversarial Testing → 16 Human-in-the-loop → 21 Incident & Postmortem

OPERATIONS (real-world constraints)
19 Cost & Latency → runs as a thread through all the above