| Parameter | What it controls | Typical usage |
|---|---|---|
| temperature | randomness of sampling | 0.2-0.4 factual, 0.8-1.2 creative |
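| top_k | restricts sampling to the k most likely tokens | fixed cap on candidates, conservative diversity |
| top_p | restricts sampling to the smallest token set whose cumulative probability reaches p (the nucleus) | adaptive diversity that tracks model confidence |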
| repetition/frequency penalty | reduce repeated tokens | long outputs, summaries |
| max_tokens | hard output length cap | cost and latency control |
| stop sequences | strings that end generation as soon as they are produced | structured response boundaries |

Avoid setting both top_k and top_p to extreme values at the same time. Tune one primary sampling control (top_k or top_p) first, then adjust temperature.
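
To make the interaction between temperature, top_k, and top_p concrete, here is a minimal sketch in plain Python. The function names `filter_probs` and `sample` are hypothetical, and the order of operations (temperature, then top_k, then top_p measured on the post-temperature probabilities) is an assumption; real inference libraries may apply or renormalize these filters differently.

```python
import math
import random


def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the distribution that would be sampled from (hypothetical helper).

    Assumed order: temperature scaling, then top_k, then top_p.
    top_k=0 and top_p=1.0 disable the respective filter.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep at most k candidates.
    if top_k > 0:
        order = order[:top_k]

    # top_p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens; everything else gets probability 0.
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample(logits, rng=random, **controls):
    """Draw one token index from the filtered distribution."""
    probs = filter_probs(logits, **controls)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
    print(filter_probs(toy_logits, temperature=0.3))
    print(filter_probs(toy_logits, top_k=2))
    print(filter_probs(toy_logits, top_p=0.8))
    print(sample(toy_logits, temperature=0.8, top_p=0.9))
```

In this sketch a small top_k combined with a small top_p leaves very few candidates, which is one way to see why stacking extreme values on both controls tends to collapse diversity.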