LLM Sampling Parameters: A Deep Intuition Guide¶
What Happens During Generation?¶
When an LLM generates a token, it:
- Runs a forward pass and outputs raw logits (one value per vocabulary token)
- Applies temperature scaling to reshape the distribution
- Applies top-k and/or top-p filters to prune candidates
- Applies repetition controls to discourage loops
- Samples from the final distribution
Each parameter acts at a different stage, so understanding where it acts matters as much as understanding what it does.
Core Mental Model
Parameters do not all act at the same time. They intervene at specific stages of decoding.
Raw Logits -> [Temperature] -> [Top-k] -> [Top-p] -> [Repetition Penalty] -> Sample
Why This Learning Path Exists¶
You have probably seen statements like:
- Temperature: "Higher = more creative"
- Top-p: "Dynamic nucleus sampling"
- Top-k: "Keep top-k tokens"
- Repetition penalty: "Reduce repetition"
Those are directionally true, but not enough for reliable tuning. The way to build intuition is to test parameters systematically.
How to use this guide
Treat each experiment as a controlled lab: isolate one variable first, then combine.
6-Experiment Learning Path¶
| # | Experiment | Focus | Takeaway |
|---|---|---|---|
| 1 | Temperature Sweep | How does randomness change with temperature? | Temperature directly reshapes probability distributions |
| 2 | Top-p (Nucleus) | How does top-p restrict candidates? | Top-p selects a dynamic set of tokens |
| 3 | Top-k | How does top-k differ from top-p? | Top-k is a fixed-size ranked cutoff |
| 4 | Combined Filters | What if you mix temperature + top-p + top-k? | Order matters; interactions can surprise you |
| 5 | Repetition Penalties | How do penalties discourage repeated tokens? | Penalties reduce repetition pressure before sampling |
| 6 | Real Use Cases | How do you tune for classification vs. summarization? | Different tasks need different strategies |
Total time: ~45 minutes for practical parameter intuition.
Outcome
You will leave this path with a reusable tuning workflow, not just one-off settings.
How the Experiments Work¶
Each experiment uses a local Python simulator that:
- Starts from fake logits (simulated next-token odds)
- Applies parameter transformations
- Samples many times to estimate the resulting distribution
- Reports metrics like entropy and top-token probability
Why this still works: the decoding math is the same regardless of whether you use a toy simulator or a production API.
Why local-first works
The simulator is not a language model, but the sampling math is the same math used by production models.
What You Will Learn¶
After the path, you should be able to:
- Explain what each parameter does (intuition + math)
- Predict interactions (for example, temperature with top-p)
- Diagnose common failures (loops, over-randomness, over-determinism)
- Choose task-appropriate presets (classification vs. brainstorming)
- Transfer the same logic to cloud/local model providers
Quick Start¶
Environment note
Run commands from the project root so relative paths resolve correctly.
# Optional virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the local simulator
python3 notebooks/sampling_simulator.py
Example output:
=== Very deterministic (temp=0.2) ===
Entropy: 1.234
approve 0.851
reject 0.095
review 0.032
...
Ready to Begin?¶
๐ Start with Experiment 1: Temperature Sweep
Optional: Move to Cloud Experiments¶
Transfer principle
Learn mechanisms locally, then reuse the same reasoning with cloud-specific parameter names.
Once you understand fundamentals locally, test against real APIs:
- Compare platforms first: Multi-Cloud Overview
- Azure OpenAI: Setup
- OpenAI: Setup
- Google Vertex AI: Setup
- Amazon Bedrock: Setup
Quick Reference Card¶
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ LLM PARAMETER CHEAT SHEET โ
โ โโโโโโโโโโโโโโโฆโโโโโโโโโโโโโโโโฆโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฃ
โ Parameter โ Range โ Rule of Thumb โ
โ โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฃ
โ Temperature โ 0.0 - 2.0 โ 0.2=precise, 0.7=balanced, โ
โ โ โ 1.0=creative โ
โ โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฃ
โ Top-p โ 0.0 - 1.0 โ 0.9 default; lower=focused โ
โ โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฃ
โ Top-k โ 1 - inf โ 40-50 precise; 100+ diverse โ
โ โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฃ
โ Rep. Penalty โ 1.0 - 2.0 โ 1.0=off; 1.2=light; 1.5=hard โ
โโโโโโโโโโโโโโโโฉโโโโโโโโโโโโโโโโฉโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ