Notebook Code Explanations and Math Reference¶
Lab 00: Warmup¶
Basic Probability Calculation¶
# Normalize the shifted exponentials by dividing each value by their total
total = sum(exp_shifted)
probs = [v / total for v in exp_shifted]
Math: normalization creates a valid distribution where probabilities sum to 1. This makes outputs comparable across different runs and inputs.
Softmax Formula¶
Also check Pre-Activation
Softmax converts raw scores (logits) into probabilities in [0, 1] that sum to 1.
It also magnifies relative score differences so the model can rank choices clearly.
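For reference, the standard softmax definition over logits $z_1, \dots, z_n$ can be written as follows. The symbols $z$ and $n$ are notation assumed here rather than taken from the notebook.

```latex
\[
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \qquad i = 1, \dots, n
\]
```

Because each score passes through the exponential, a fixed gap between two logits becomes a fixed ratio between their probabilities.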
Lab 01: Tokenization¶
Token Counting¶
def pseudo_token_count(text):
    return len(text.split())
This notebook uses a simple approximation. Real models use specialized tokenizers. Use this approximation to reason about budget pressure before adding production tokenizers.
Budget Constraint Equation¶
Constraint:
Operational decisions:
- Short tickets: keep full history.
- Long tickets: summarize or trim history.
The key idea is that token budget is a capacity constraint, similar to memory limits.
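One way to turn the capacity idea into a check is sketched below. It assumes the constraint has the form prompt tokens + history tokens + reserved output tokens <= context limit; the names `fits_budget`, `trim_history`, `max_context_tokens`, and `reserved_output_tokens` are hypothetical and not taken from the notebook.

```python
def pseudo_token_count(text):
    # Same whitespace approximation used in this lab.
    return len(text.split())


def fits_budget(prompt, history, max_context_tokens, reserved_output_tokens):
    # Assumed constraint: prompt + history + reserved output <= context limit.
    used = pseudo_token_count(prompt) + sum(pseudo_token_count(m) for m in history)
    return used + reserved_output_tokens <= max_context_tokens, used


def trim_history(prompt, history, max_context_tokens, reserved_output_tokens):
    # Drop the oldest messages until the assumed constraint holds.
    kept = list(history)
    while kept and not fits_budget(
        prompt, kept, max_context_tokens, reserved_output_tokens
    )[0]:
        kept.pop(0)
    return kept


history = ["a b c", "d e", "f g h i"]
fits, used = fits_budget("x y", history, 12, 4)
trimmed = trim_history("x y", history, 12, 4)
```

Trimming oldest-first is only one possible policy; summarizing long tickets, as the list above suggests, would replace the `pop(0)` step.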
Lab 02: Probability¶
Normalization Step¶
total = sum(raw_scores.values())
probs = {k: v / total for k, v in raw_scores.items()}
This turns arbitrary non-negative scores (with a positive total) into valid confidence values for policy thresholds.
Expected Value Formula¶
Example workload estimate:
6 * 0.6 + 14 * 0.25 + 32 * 0.15 = 11.9 minutes per ticket.
Expected value helps estimate staffing load from predicted intent mix.
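The workload estimate above is a probability-weighted sum, which can be sketched as follows. The function name `expected_value` and the variable names are illustrative assumptions; the numbers are the ones from the example.

```python
import math


def expected_value(values, probs):
    # Guard the assumption that the intent mix is a valid distribution.
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in zip(values, probs))


handle_minutes = [6, 14, 32]    # minutes per ticket for each intent
intent_mix = [0.6, 0.25, 0.15]  # predicted share of each intent
minutes_per_ticket = expected_value(handle_minutes, intent_mix)
```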
Lab 03: Logits and Softmax¶
Numerical Stability¶
import math

shift = max(scaled)
exps = [math.exp(x - shift) for x in scaled]
Equivalent identity:
Subtracting a constant avoids overflow but keeps the same probability ratios.
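A small check of this shift invariance is sketched below. The function name `softmax` and the sample logits are assumptions made for illustration.

```python
import math


def softmax(scores):
    # Subtract the maximum so the largest exponent is exp(0) = 1.
    shift = max(scores)
    exps = [math.exp(x - shift) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
# Adding 1000 to every logit would overflow a naive math.exp(x) call,
# but the shifted version still works and gives the same probabilities.
shifted_logits = [x + 1000.0 for x in logits]
p = softmax(logits)
q = softmax(shifted_logits)
```

Comparing `p` and `q` element by element shows that they agree up to floating-point rounding.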
Temperature Scaling¶
- T < 1: sharper, more deterministic.
- T = 1: baseline.
- T > 1: flatter, more stochastic.
Use lower temperature for routing and higher temperature for exploratory drafting.
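The effect of T can be seen in a minimal sketch that assumes the usual convention of dividing each logit by T before softmax. The function name `softmax_with_temperature` and the sample logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Assumed convention: scale logits by 1 / T, then apply stable softmax.
    scaled = [x / temperature for x in logits]
    shift = max(scaled)
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)
base = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=2.0)
```

With these inputs the top class receives the most mass in `sharp` and the least in `flat`, matching the list above.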
Lab 04: Entropy¶
Shannon Entropy¶
Higher entropy means more uncertainty spread across classes.
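A minimal sketch of the entropy computation follows, assuming base-2 logarithms (bits); the notebook may use a different base. The function name `shannon_entropy` is illustrative.

```python
import math


def shannon_entropy(probs, base=2):
    # Zero-probability terms are skipped: p * log(p) tends to 0 as p -> 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)


uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
peaked = shannon_entropy([0.97, 0.01, 0.01, 0.01])
```

The uniform distribution over four classes gives the maximum of 2 bits, while the peaked one scores much lower.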
Normalized Entropy¶
Policy example:
- Auto-route only if top probability is high and entropy is low.
Combining both signals reduces overconfident automation mistakes.
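The policy can be sketched as below, assuming normalized entropy is defined as entropy divided by its maximum, log(n) for n classes. The function names and the cutoffs `min_top=0.80` and `max_norm_entropy=0.50` are hypothetical placeholders, not values from the notebook.

```python
import math


def normalized_entropy(probs):
    # Assumed definition: H(p) / log(n), which lies in [0, 1].
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)


def should_auto_route(probs, min_top=0.80, max_norm_entropy=0.50):
    # Require both a confident top class and a concentrated distribution.
    return max(probs) >= min_top and normalized_entropy(probs) <= max_norm_entropy


confident = should_auto_route([0.9, 0.05, 0.05])
uncertain = should_auto_route([0.5, 0.3, 0.2])
```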
Lab 05: Variance and Standard Deviation¶
Variance Formula¶
Mean shows center; variance and standard deviation show stability.
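A minimal sketch of these statistics is given below. It assumes the population form of variance (dividing by n); the notebook may use the sample form (dividing by n - 1). Function names and data are illustrative.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    # Population variance: average squared deviation from the mean.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def std_dev(xs):
    return math.sqrt(variance(xs))


samples = [2, 4, 4, 4, 5, 5, 7, 9]
center = mean(samples)
spread = std_dev(samples)
```

Standard deviation is in the same units as the data, which makes it easier to compare against thresholds than variance.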
Stability Policy¶
def stability_decision(std, good=0.20, watch=0.35):
    if std <= good:
        return "go"
    if std <= watch:
        return "monitor"
    return "hold"
- Low spread: stable for production.
- High spread: investigate before rollout.
This keeps quality predictable across repeated runs.
Lab 06: Determinism¶
Argmax Operation¶
idx = max(range(len(p)), key=lambda i: p[i])
choice = labels[idx]
Argmax is deterministic for fixed inputs. It is preferred in high-control workflows where reproducibility matters.
Sampling Operation¶
import random

def sample_choice(labels, p, seed=None):
    rng = random.Random(seed)
    ...
Sampling is stochastic; seeding enables reproducibility. This is useful when you want diversity but still need auditability.
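One possible way to write a seeded sampler is sketched below using `random.Random.choices` from the standard library. This is a separate illustration under a hypothetical name, `sample_choice_sketch`, and is not necessarily how the notebook's `sample_choice` is implemented; the labels and probabilities are made up.

```python
import random


def sample_choice_sketch(labels, p, seed=None):
    # A private generator keeps the draw independent of global random state.
    rng = random.Random(seed)
    # choices() draws with replacement according to the given weights.
    return rng.choices(labels, weights=p, k=1)[0]


labels = ["billing", "technical", "account"]
p = [0.6, 0.3, 0.1]
# The same seed always yields the same draw, which supports auditing.
repeated = [sample_choice_sketch(labels, p, seed=42) for _ in range(3)]
```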
Lab 07: Regression¶
Linear Regression Model¶
- b0: intercept.
- b1: slope.
Interpret slope as "expected output change per one-unit input increase."
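To make b0 and b1 concrete, the sketch below fits a line with the closed-form least-squares estimates, assuming the model has the usual form y = b0 + b1 * x. The function name `fit_line` and the data are illustrative.

```python
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # slope: change in y per one-unit change in x
    b0 = mean_y - b1 * mean_x  # intercept: predicted y at x = 0
    return b0, b1


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # generated from y = 1 + 2x
b0, b1 = fit_line(xs, ys)
```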
Residual Analysis¶
Persistent residual patterns indicate bias. Residual shape often reveals missing features or non-linear behavior.
Error Metrics¶
RMSE penalizes large misses more heavily than MAE.
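The difference shows up when two sets of predictions have the same total absolute error but distribute it differently. The sketch below uses the standard definitions of MAE and RMSE; the function names and data are illustrative.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


y_true = [10, 10, 10, 10]
even_misses = [12, 8, 12, 8]     # four errors of size 2
one_big_miss = [10, 10, 10, 18]  # a single error of size 8

mae_even, rmse_even = mae(y_true, even_misses), rmse(y_true, even_misses)
mae_big, rmse_big = mae(y_true, one_big_miss), rmse(y_true, one_big_miss)
```

Both cases have the same MAE, but the single large miss doubles the RMSE.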
Lab 08: Classification and Calibration¶
Precision and Recall¶
Threshold tuning is a business trade-off between false positives and false negatives.
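The trade-off can be seen by computing both metrics at two thresholds. The sketch below uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN); the function name and data are illustrative.

```python
def precision_recall(y_true, scores, threshold):
    preds = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y and p)
    fp = sum(1 for y, p in zip(y_true, preds) if not y and p)
    fn = sum(1 for y, p in zip(y_true, preds) if y and not p)
    # Return 0.0 when a denominator is empty to avoid division by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.6, 0.7, 0.3, 0.4]
loose = precision_recall(y_true, scores, threshold=0.5)
strict = precision_recall(y_true, scores, threshold=0.75)
```

Raising the threshold here removes the false positive but misses more true cases, so precision rises while recall falls.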
Calibration Check¶
Compare predicted probabilities vs observed frequencies per bin.
- Well-calibrated model: predicted and observed rates are close.
If they diverge, confidence thresholds should be recalibrated.
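A per-bin comparison could look like the sketch below, which assumes equal-width bins over [0, 1]. The function name `calibration_table`, the bin count, and the data are hypothetical.

```python
def calibration_table(probs, outcomes, n_bins=5):
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Clamp so that p == 1.0 lands in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for idx, items in enumerate(bins):
        if not items:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in items) / len(items)
        observed = sum(y for _, y in items) / len(items)
        rows.append((idx, len(items), mean_pred, observed))
    return rows


probs = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 1, 1, 1, 0]
rows = calibration_table(probs, outcomes)
```

Each row holds the bin index, the count, the mean predicted probability, and the observed positive rate; a large gap between the last two suggests miscalibration in that bin.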
Lab 09: Correlation vs Causation¶
Correlation Coefficient¶
r measures association, not causality.
Treat correlation as an investigation signal, not a policy proof.
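For reference, a minimal sketch of the Pearson coefficient follows, assuming that is the r used in this lab. The function name `pearson_r` and the data are illustrative.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

A value near +1 or -1 indicates a strong linear association, but says nothing about which variable, if either, drives the other.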
Confounder Segmentation¶
- Segment by severity/queue/time before making policy conclusions.
- Use controlled experiments for causal claims.
Segmentation helps reveal confounders before rollout decisions.
Lab 10: Sampling Controls¶
Top-k Filtering¶
pairs = sorted(zip(labels, probs), key=lambda x: x[1], reverse=True)[:k]
renormalize(pairs)
Top-k caps candidate count; top-p caps cumulative probability mass.
Top-p Filtering¶
keep = []
total = 0.0
for label, pr in sorted_pairs:
    keep.append((label, pr))
    total += pr
    if total >= p:
        break
Renormalization¶
Renormalization is required so filtered probabilities still sum to 1.
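The two filters and the renormalization step can be combined as in the sketch below. The helper `renormalize` here is a hypothetical version that returns a new list; the notebook's own helper may behave differently. The function names `top_k` and `top_p` and the data are also assumptions.

```python
def renormalize(pairs):
    # Rescale the kept probabilities so they sum to 1 again.
    total = sum(pr for _, pr in pairs)
    if total <= 0:
        raise ValueError("kept probability mass must be positive")
    return [(label, pr / total) for label, pr in pairs]


def top_k(labels, probs, k):
    pairs = sorted(zip(labels, probs), key=lambda x: x[1], reverse=True)[:k]
    return renormalize(pairs)


def top_p(labels, probs, p):
    sorted_pairs = sorted(zip(labels, probs), key=lambda x: x[1], reverse=True)
    keep, total = [], 0.0
    for label, pr in sorted_pairs:
        keep.append((label, pr))
        total += pr
        if total >= p:  # stop once the cumulative mass reaches p
            break
    return renormalize(keep)


labels = ["a", "b", "c", "d"]
probs = [0.5, 0.25, 0.125, 0.125]
k_result = top_k(labels, probs, k=2)
p_result = top_p(labels, probs, p=0.7)
```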
Lab 11: Guardrails and Thresholds¶
Three-Band Policy¶
def action(p, low=0.55, high=0.80):
    if p >= high:
        return 'auto'
    if p <= low:
        return 'abstain'
    return 'human-review'
- Auto: p >= high.
- Review: low < p < high.
- Abstain: p <= low.
This three-band policy maps confidence directly to operational action.
Threshold Tuning¶
- More auto: faster, riskier.
- More review: safer, costlier.
Tune by risk tier rather than using one global threshold.
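Per-tier thresholds could be wired in as sketched below. The tier names and threshold values in `THRESHOLDS_BY_TIER` are invented for illustration only, and `action_for_tier` is a hypothetical helper; `action` repeats the three-band policy from this lab so the block is self-contained.

```python
# Hypothetical tiers and thresholds, for illustration only.
THRESHOLDS_BY_TIER = {
    "low_risk": {"low": 0.45, "high": 0.70},
    "standard": {"low": 0.55, "high": 0.80},
    "high_risk": {"low": 0.65, "high": 0.95},
}


def action(p, low=0.55, high=0.80):
    # Three-band policy: auto, abstain, or human review.
    if p >= high:
        return "auto"
    if p <= low:
        return "abstain"
    return "human-review"


def action_for_tier(p, tier):
    bands = THRESHOLDS_BY_TIER[tier]
    return action(p, low=bands["low"], high=bands["high"])


same_score = {tier: action_for_tier(0.75, tier) for tier in THRESHOLDS_BY_TIER}
```

The same confidence score of 0.75 maps to a different action in each tier, which is the point of tuning per tier.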
Lab 12: Evaluation in Production¶
Weighted KPI¶
KPI = 0.4 * precision + 0.4 * sla_hit_rate + 0.2 * inverse_override_rate
Weights should reflect business priorities and compliance risk.
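A minimal sketch of the weighted KPI follows. It assumes `inverse_override_rate` means 1 - override_rate, which the source does not state; the function name `weighted_kpi` and the input values are hypothetical.

```python
def weighted_kpi(precision, sla_hit_rate, override_rate, weights=(0.4, 0.4, 0.2)):
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Assumption: fewer human overrides is better, so invert the rate.
    inverse_override_rate = 1.0 - override_rate
    w_precision, w_sla, w_override = weights
    return (
        w_precision * precision
        + w_sla * sla_hit_rate
        + w_override * inverse_override_rate
    )


kpi = weighted_kpi(precision=0.9, sla_hit_rate=0.8, override_rate=0.1)
```

Keeping the weights summing to 1 keeps the KPI on the same 0 to 1 scale as its inputs.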
Rollout Gate¶
delta = candidate_kpi - baseline_kpi
if delta >= 0.01:
    decision = "rollout"
elif delta <= -0.01:
    decision = "rollback"
else:
    decision = "hold"
Decision bands:
- delta >= 0.01: expand rollout.
- -0.01 < delta < 0.01: hold and monitor.
- delta <= -0.01: rollback.
Predefined gates prevent subjective rollout decisions under pressure.
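Wrapping the gate in a function makes it easy to test ahead of time. In the sketch below, the function name `rollout_decision` and the `margin` parameter are assumptions; the 0.01 default matches the bands described in this lab.

```python
def rollout_decision(candidate_kpi, baseline_kpi, margin=0.01):
    delta = candidate_kpi - baseline_kpi
    if delta >= margin:
        return "rollout"
    if delta <= -margin:
        return "rollback"
    return "hold"


better = rollout_decision(0.88, 0.85)
same = rollout_decision(0.85, 0.85)
worse = rollout_decision(0.80, 0.85)
```

Differences that land exactly on the margin are sensitive to floating-point rounding, so it may be worth rounding KPIs to a fixed precision before comparing.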
General Tips for Code and Math¶
- Validate assumptions (probabilities sum to 1, thresholds documented).
- Write formulas first, then code.
- Test edge cases.
- Monitor operational metrics, not only prediction outputs.