Experiment 1: Temperature Sweep¶
The Question¶
How does temperature affect the distribution of token probabilities?
- When temperature is low, does the model "lock in" to one answer?
- When temperature is high, does it explore more options?
- Is there a sweet spot?
What You'll Do¶
You'll run the same logits through the sampling pipeline with different temperatures, then watch how the probability distribution changes.
The Setup¶
Open experiments/local/sampling_simulator.py and look for this section:
if __name__ == '__main__':
tokens = ['approve', 'reject', 'review', 'escalate', 'delay', 'audit', 'optimize', 'notify', 'assign', 'close']
logits = np.array([2.2, 1.8, 1.4, 0.9, 0.2, 0.1, -0.3, -0.6, -0.8, -1.0], dtype=float)
These fake logits represent a case where the model leans toward 'approve' but has uncertainty below that.
Your Experiment: Isolate Temperature¶
Run the simulator 6 times with these settings (only changing temperature):
| Run | Temperature | Top-p | Top-k | What to observe |
|---|---|---|---|---|
| 1 | 0.2 | 1.0 | 0 | Very narrow, peaked distribution |
| 2 | 0.5 | 1.0 | 0 | Narrower than baseline |
| 3 | 1.0 | 1.0 | 0 | "Baseline" distribution (unchanged logits) |
| 4 | 1.5 | 1.0 | 0 | Flatter; more tokens have viable probability |
| 5 | 2.0 | 1.0 | 0 | Very flat; even low-prob tokens get sampled |
| 6 | 0.1 | 1.0 | 0 | Extreme: almost always 'approve' |
How to Run¶
Modify the settings list in the simulator to include these temperatures:
settings = [
dict(name='T=0.1 (extreme determinist)', temperature=0.1, top_k=0, top_p=1.0),
dict(name='T=0.2 (very det.)', temperature=0.2, top_k=0, top_p=1.0),
dict(name='T=0.5 (det.)', temperature=0.5, top_k=0, top_p=1.0),
dict(name='T=1.0 (baseline)', temperature=1.0, top_k=0, top_p=1.0),
dict(name='T=1.5 (creative)', temperature=1.5, top_k=0, top_p=1.0),
dict(name='T=2.0 (very creative)', temperature=2.0, top_k=0, top_p=1.0),
]
Then run:
python3 experiments/local/sampling_simulator.py
What to Watch For¶
As you increase temperature from 0.1 โ 2.0, observe:
-
Entropy: Does it increase? (It shouldโhigher temp = flatter distribution = more uncertainty)
-
Top token probability:
- At T=0.1, 'approve' should get ~95%+
-
At T=2.0, 'approve' might drop to ~30-40%
-
Number of viable tokens:
- At T=0.1, maybe only 2-3 tokens have non-trivial probability
- At T=2.0, maybe 7-8 tokens all get sampled regularly
Example output:
=== T=0.1 (extreme determinist) ===
Entropy: 0.412
approve 0.967
reject 0.023
review 0.005
escalate 0.002
...
=== T=2.0 (very creative) ===
Entropy: 2.891
optimize 0.152
approve 0.138
notify 0.134
assign 0.130
reject 0.129
...
The Insight¶
Temperature is a probability distribution stretcher.
- Low T: Squeezes the distribution โ a few tokens dominate
- High T: Spreads the distribution โ many tokens compete equally
- The order (which token is most likely) doesn't change, only the gap between them
Math Behind It¶
Temperature scales the logits before softmax:
- Divide by small T (e.g., 0.1) โ logits get bigger spread โ softmax peaks sharply
- Divide by large T (e.g., 2.0) โ logits get smaller spread โ softmax flattens
Real-World Implication¶
- Summarization (wants consistency): Use T=0.1 to 0.3
- Brainstorming (wants variety): Use T=0.8 to 1.5
- Classification (wants correctness): Use T=0.0 to 0.2
Next Step¶
Once you've run this experiment and seen the pattern, you're ready for:
๐ Experiment 2: Top-p (Nucleus Sampling) โ
Optional: Dive Deeper¶
Want to add visualization? After running the simulator, you can plot entropy vs. temperature:
import matplotlib.pyplot as plt
temps = [0.1, 0.2, 0.5, 1.0, 1.5, 2.0]
entropies = []
for t in temps:
out = run_experiment(tokens, logits, temperature=t, n_samples=10000)
entropies.append(out['entropy'])
plt.plot(temps, entropies, marker='o', label='Entropy')
plt.xlabel('Temperature')
plt.ylabel('Entropy (bits)')
plt.title('How Temperature Affects Distribution Spread')
plt.grid(True)
plt.legend()
plt.show()
This visually shows the relationship.