Length, stops & cost controls¶
max_tokens / max_completion_tokens¶
max_tokenslimits the number of generated tokens and is supported by Azure endpoints.- OpenAIโs reference also documents
max_completion_tokensas the preferred modern parameter, and notesmax_tokensmay be deprecated / incompatible with some reasoning models.
Use these to:
- bound cost
- bound latency
- enforce concise answers
stop¶
- Up to a small number of stop sequences can end generation.
- Useful for code blocks, list truncation control, or multi-part outputs.
Useful executive takeaway¶
Length limits are one of the strongest cost controls you have, independent of decoding randomness.
Next: Determinism & Debugging.