Decoder-only generation produces one token at a time: previous tokens become context for the next token.
Generation starts from `<BOS>` or a user prompt and stops at `<EOS>` or a maximum length.

| Method | Behavior | Trade-off |
|---|---|---|
| Greedy | Pick best token each step | Fast, deterministic, can be myopic |
| Beam search | Track top candidate sequences | Better global quality, slower |
| Top-k sampling | Sample from the k most probable tokens | Controlled diversity |
| Top-p sampling | Sample from the smallest set of tokens whose cumulative probability reaches p | Adaptive diversity |
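
The loop and three of the decoding methods in the table can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the four-token vocabulary, `TOY_LOGITS`, `toy_next_probs`, and the token ids for `<BOS>` (0) and `<EOS>` (3) are all hypothetical stand-ins for a real model, and beam search is left out for brevity.

```python
import math
import random

BOS_ID, EOS_ID = 0, 3  # hypothetical token ids for this toy vocabulary

# Hypothetical "model": next-token logits depend only on the last token.
TOY_LOGITS = {
    0: [0.0, 2.0, 1.0, -1.0],
    1: [-1.0, 0.0, 2.0, 1.0],
    2: [-1.0, 0.5, 0.0, 3.0],
}

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_next_probs(tokens):
    """Stand-in for a model forward pass over the current context."""
    return softmax(TOY_LOGITS[tokens[-1]])

def greedy(probs):
    """Greedy: index of the single most probable token."""
    return max(range(len(probs)), key=probs.__getitem__)

def top_k_candidates(probs, k):
    """Top-k: the k most probable token ids, most probable first."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return ranked[:k]

def top_p_candidates(probs, p):
    """Top-p: smallest prefix of the ranking whose total mass reaches p."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept

def sample_from(probs, candidates, rng):
    """Sample one candidate, weighted by its (unnormalized) probability."""
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def generate(next_probs, prompt, eos_id, max_len, pick):
    """Append one token per step until EOS is produced or max_len is hit."""
    tokens = list(prompt)
    while len(tokens) < max_len:
        tok = pick(next_probs(tokens))
        tokens.append(tok)
        if tok == eos_id:
            break
    return tokens

greedy_out = generate(toy_next_probs, [BOS_ID], EOS_ID, 10, greedy)
print(greedy_out)  # [0, 1, 2, 3]

rng = random.Random(0)
top_p_out = generate(
    toy_next_probs, [BOS_ID], EOS_ID, 10,
    lambda probs: sample_from(probs, top_p_candidates(probs, 0.9), rng),
)
print(top_p_out)
```

The only thing that changes between methods is the `pick` function passed to `generate`. The loop itself stays the same: each chosen token is appended to the context before the next step.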