Temperature, Top-p & Sampling Parameters
What it is
Think of sampling parameters like the knobs on a music mixer: the model has already learned to play every instrument (the probability distribution over tokens), and these controls decide how strictly it follows its own instincts, either picking the safe, obvious note every time or occasionally reaching for something surprising.
Sampling parameters are the set of numerical controls that govern how a language model converts its internal probability distribution over vocabulary tokens into a single output token at each generation step. They sit between the model's raw logits and the final chosen token, shaping the trade-off between predictability and variety.
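The most common of these controls, temperature, can be written as a worked equation. This assumes the standard temperature-scaled softmax; the symbols are introduced here for illustration: z_i is the logit for token i, V is the vocabulary size, T > 0 is the temperature, and p_i is the resulting probability of token i.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{V} \exp(z_j / T)}, \qquad T > 0
```

With T = 1 the model's own distribution is unchanged; T < 1 sharpens it toward the highest logit (more predictable), and T > 1 flattens it toward uniform (more varied). As T approaches 0, nearly all probability mass lands on the single highest-scoring token, which is why T = 0 is often treated as greedy decoding.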
Sampling parameters do not make the model smarter or change what it knows; they only change how it selects among the tokens it already considers plausible. For output quality, a well-designed prompt usually matters far more than any sampling setting. → see Prompt engineering for the fuller picture on output quality levers.
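A small sketch of that claim, assuming a hypothetical four-token vocabulary with made-up logits (the helper names `softmax_with_temperature` and `ranking` are illustrative, not from any library): rescaling by any positive temperature changes how peaked the distribution is, but never the order of the candidates.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ranking(probs):
    """Token indices ordered from most to least likely."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits, not from a real model
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    # The top token's share shrinks as t grows, but the ranking stays [0, 1, 2, 3].
    print(t, round(probs[0], 3), ranking(probs))
```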
How it works
Every generated token passes through the same pipeline. The model's final linear layer produces one unnormalized score, called a logit, for each vocabulary entry (roughly 32,000 to 128,000 entries for many modern models). Sampling parameters then reshape and filter those scores and sample a single token from what remains. The whole process repeats at every generation step: the chosen token is appended to the context, and the next step starts again from fresh logits.
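The pipeline can be sketched in plain Python for a single generation step. This is a minimal illustration, not any particular library's implementation: the function names `softmax` and `sample_next_token` are hypothetical, the logits are made up, and the order used here (temperature, then top-k, then top-p, then sampling) is a common convention that not every library follows. Treating `temperature=0` as greedy decoding is also an assumption of this sketch.

```python
import math
import random


def softmax(scores):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Pick one token index from `logits` for a single generation step.

    Assumptions of this sketch: temperature == 0 means greedy decoding;
    top_k == 0 disables top-k; top_p == 1.0 effectively disables top-p.
    """
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # 1. Reshape: dividing by T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [z / temperature for z in logits]

    # Candidate token indices, most likely first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)

    # 2a. Top-k filter: keep only the k highest-scoring candidates.
    if top_k > 0:
        order = order[:top_k]

    # 2b. Top-p (nucleus) filter: keep the smallest prefix of candidates
    #     whose cumulative probability reaches top_p.
    probs = softmax([scaled[i] for i in order])
    kept, cumulative = [], 0.0
    for idx, p in zip(order, probs):
        kept.append(idx)
        cumulative += p
        if cumulative >= top_p:
            break

    # 3. Renormalize over the surviving tokens and draw one at random.
    weights = softmax([scaled[i] for i in kept])
    return rng.choices(kept, weights=weights, k=1)[0]


# Usage with a hypothetical 4-token vocabulary and made-up logits.
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(42)
print(sample_next_token(logits, temperature=0))  # 0 (greedy)
print(sample_next_token(logits, temperature=0.7, top_p=0.9, rng=rng))
```

Because the filters run after temperature scaling, the same `top_p` can keep a different number of tokens at different temperatures: a lower temperature concentrates probability mass on the leading candidates, so the nucleus contains fewer tokens.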
