Temperature is a parameter that controls the randomness of a language model's output by adjusting the probability distribution over possible next tokens. Lower temperatures produce more deterministic, focused responses, while higher temperatures yield more diverse and creative outputs.
When an LLM generates text, it predicts a probability distribution over its entire vocabulary for each next token. Temperature modifies this distribution before sampling occurs. Mathematically, the logits (raw model outputs) are divided by the temperature value before being converted to probabilities through the softmax function.
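As a minimal sketch of that mechanism (assuming NumPy and a toy three-token vocabulary, not a real model's logits), temperature scaling is just a division applied before softmax:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing them by the temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    # Subtract the max before exponentiating for numerical stability.
    exp = np.exp(scaled - np.max(scaled))
    return exp / exp.sum()

# Toy logits for a three-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # sharper: probability mass concentrates on the top token
print(softmax_with_temperature(logits, 1.0))  # the model's original distribution, unchanged
```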
At temperature 0 (or values very close to 0), the model becomes effectively deterministic, almost always selecting the highest-probability token; most implementations treat a temperature of exactly 0 as greedy decoding, since dividing by zero is undefined. This produces consistent, predictable outputs that closely follow the most likely completion. This setting is ideal for factual questions, data extraction, and tasks where consistency and accuracy matter most.
At temperature 1.0, the original probability distribution is used unchanged. As temperature increases above 1.0, the distribution becomes flatter, giving lower-probability tokens a greater chance of being selected. This introduces more variety and unpredictability, which can be beneficial for creative writing, brainstorming, or generating diverse alternatives.
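To make the sharpening and flattening concrete, here is a small self-contained comparison (again using NumPy and made-up logits; the exact numbers are only illustrative):

```python
import numpy as np

# Toy logits for a three-token vocabulary.
logits = np.array([2.0, 1.0, 0.1])

for temperature in (0.2, 1.0, 2.0):
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    print(f"T={temperature}: {np.round(probs, 3)}")

# T=0.2 -> roughly [0.993, 0.007, 0.000]  (nearly deterministic)
# T=1.0 -> roughly [0.659, 0.242, 0.099]  (original distribution)
# T=2.0 -> roughly [0.502, 0.304, 0.194]  (flatter, more diverse sampling)
```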
Temperature interacts with other sampling parameters like top-p (nucleus sampling) and top-k. In practice, most applications use temperature values between 0 and 1.0. Finding the right temperature requires experimentation, as the optimal setting depends on the specific task, the model being used, and the desired balance between creativity and reliability.
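In practice these knobs are set together on each request. Here is a hedged sketch using the OpenAI Python SDK; the model name, prompt, and parameter values are placeholders, and most other provider and open-source APIs expose similarly named temperature and top_p parameters:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Low temperature for an extraction-style task where consistency matters most.
response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    messages=[{"role": "user", "content": "List three use cases for low-temperature sampling."}],
    temperature=0.2,       # sharpen the distribution toward the most likely tokens
    top_p=1.0,             # leave nucleus sampling effectively disabled here
)
print(response.choices[0].message.content)
```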
1. Logit computation: The model processes the input and produces raw logit scores for every token in its vocabulary, representing the unnormalized likelihood of each token being the next in the sequence.
2. Temperature scaling: Each logit score is divided by the temperature value. Low temperatures amplify differences between high- and low-probability tokens; high temperatures reduce these differences.
3. Softmax normalization: The scaled logits are passed through the softmax function to produce a valid probability distribution, which now reflects the temperature-adjusted likelihoods.
4. Sampling: A token is sampled from the adjusted probability distribution. At low temperatures, the top token is almost always chosen; at high temperatures, a wider variety of tokens have meaningful selection probability (the sketch below ties these steps together).
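Putting the four steps together, here is a compact end-to-end sketch (NumPy only, with made-up logits standing in for step 1, which a real model would produce):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, temperature=1.0):
    """Steps 2-4: scale logits by temperature, apply softmax, then sample a token id."""
    if temperature <= 0:
        # Common convention: treat temperature 0 as greedy decoding.
        return int(np.argmax(logits))
    scaled = np.asarray(logits, dtype=float) / temperature  # step 2: temperature scaling
    probs = np.exp(scaled - scaled.max())                    # step 3: softmax (stable form)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))              # step 4: sample from the distribution

# Step 1 normally comes from the model; here we fake logits for a five-token vocabulary.
fake_logits = [3.1, 2.4, 0.7, 0.2, -1.0]
print([sample_next_token(fake_logits, temperature=0.1) for _ in range(5)])  # almost always token 0
print([sample_next_token(fake_logits, temperature=1.5) for _ in range(5)])  # more varied choices
```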
A customer support bot uses temperature 0 to ensure consistent, accurate answers to factual questions about product features and policies, minimizing the chance of generating incorrect information.
A story writing tool uses temperature 0.8 to generate varied and imaginative plot suggestions, character descriptions, and dialogue, producing outputs that feel more natural and less repetitive.
A coding assistant uses temperature 0.2 for standard code generation to ensure correctness, but switches to temperature 0.7 when the user asks for alternative approaches to a problem, exploring different solutions.
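One simple way to implement this kind of switching is a lookup from task type to temperature. The task names and values below are purely illustrative assumptions, not settings from any particular product:

```python
# Illustrative mapping from task type to sampling temperature (assumed values, not vendor recommendations).
TEMPERATURE_BY_TASK = {
    "factual_qa": 0.0,         # consistent answers to policy and product questions
    "code_generation": 0.2,    # favor correctness over variety
    "code_alternatives": 0.7,  # explore different approaches to the same problem
    "creative_writing": 0.8,   # varied plots, characters, dialogue
}

def temperature_for(task: str, default: float = 0.3) -> float:
    """Pick a temperature for the given task, falling back to a conservative default."""
    return TEMPERATURE_BY_TASK.get(task, default)

print(temperature_for("code_alternatives"))  # 0.7
print(temperature_for("unknown_task"))       # 0.3
```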
Temperature is one of the most important parameters for controlling LLM behavior. Choosing the right temperature directly impacts output quality: set it too low and responses become repetitive and generic; set it too high and they become incoherent or unreliable. Understanding temperature is essential for tuning any LLM application.
Respan lets you analyze how temperature settings affect your LLM outputs across different use cases. Compare response quality metrics at different temperature values, track consistency scores, and identify the optimal temperature settings for each prompt template in your application.
Try Respan free