Reasoning and thinking
StreamOptions.Reasoning is a provider-neutral effort level. Each adapter maps
it to the target provider's native form—Anthropic adaptive or budget thinking,
or OpenAI-compatible reasoning fields—and clamps it to the levels supported by
the selected model. Non-reasoning models ignore it, so the same option is safe
to set on any model.
```go
options := llm.StreamOptions{Reasoning: llm.ModelThinkingHigh}
response, err := llm.Complete(ctx, model, llm.Prompt("..."), options)
```
At a glance
| Task | API |
|---|---|
| Set the effort level | StreamOptions.Reasoning (ModelThinkingLevel) |
| Available levels | ModelThinkingOff / Minimal / Low / Medium / High / XHigh |
| Check what a model supports | SupportedThinkingLevels(model), ClampThinkingLevel(model, level) |
| Whether a model can reason | Model.Reasoning (bool) |
| Read thinking while streaming | EventThinkingStart / Delta / End |
| Read thinking from the final message | ThinkingContent (Thinking, ThinkingSignature, Redacted) |
| Control how thinking is returned (Anthropic) | AnthropicStreamOptions.ThinkingDisplay |
Effort only decides how much the model thinks. Whether the thinking text is
returned with the response is a separate setting: on Anthropic it is
controlled by ThinkingDisplay (see Anthropic thinking display).
Effort levels
A higher level lets the model spend more tokens thinking before it answers,
trading latency and cost for quality on hard problems. Leaving Reasoning
empty uses the model's own default.
| Level | Effect | When to use |
|---|---|---|
| ModelThinkingOff | Disable thinking entirely | Simple tasks; latency- or cost-sensitive paths |
| ModelThinkingMinimal | Smallest thinking budget | A light nudge to reason |
| ModelThinkingLow | Light reasoning | Everyday tasks |
| ModelThinkingMedium | Balanced reasoning | A safe default |
| ModelThinkingHigh | Extended reasoning for hard tasks | Math, planning, multi-step problems |
| ModelThinkingXHigh | Maximum thinking budget | The hardest problems, cost aside |
Under the hood the level maps to each provider's own controls: on Anthropic a
thinking-token budget (or adaptive thinking), on OpenAI-compatible providers a
reasoning_effort field. The neutral level keeps your code the same across both.
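The mapping described above can be pictured with a small standalone sketch. It does not use the library: the Level type, the budget numbers, and the rule that the target provider accepts only low, medium, and high are all assumptions chosen for illustration, not the adapters' real values.

```go
package main

import "fmt"

// Level is a local stand-in for the library's ModelThinkingLevel.
// The string values are assumptions made for this sketch.
type Level string

const (
	Off     Level = "off"
	Minimal Level = "minimal"
	Low     Level = "low"
	Medium  Level = "medium"
	High    Level = "high"
	XHigh   Level = "xhigh"
)

// budgetTokens maps a level to a hypothetical thinking-token budget,
// the shape a budget-based provider expects. The numbers are placeholders.
func budgetTokens(l Level) int {
	switch l {
	case Minimal:
		return 1024
	case Low:
		return 2048
	case Medium:
		return 8192
	case High:
		return 16384
	case XHigh:
		return 32768
	default:
		return 0 // Off or unknown: no thinking budget
	}
}

// reasoningEffort maps a level to an effort string for a hypothetical
// provider that accepts only "low", "medium", and "high". Levels outside
// that range are folded into the nearest accepted value; an empty string
// means the field is left out of the request.
func reasoningEffort(l Level) string {
	switch l {
	case Minimal, Low:
		return "low"
	case Medium:
		return "medium"
	case High, XHigh:
		return "high"
	default:
		return ""
	}
}

func main() {
	for _, l := range []Level{Off, Minimal, Low, Medium, High, XHigh} {
		fmt.Printf("%-8s budget=%-6d effort=%q\n", l, budgetTokens(l), reasoningEffort(l))
	}
}
```

The calling code only ever names the neutral level; the two functions stand in for what each adapter would do with it.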
Thinking tokens count toward Usage.Output and bill at the same output rate as
generated text, so a higher level makes each request cost more. See
Reading responses for usage and cost.
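As a rough worked example of that billing rule, the sketch below compares one request with and without thinking. The token counts and the price of 10 USD per million output tokens are made-up figures, and outputCost is a hypothetical helper, not a library function.

```go
package main

import "fmt"

// outputCost returns the cost in USD of outputTokens billed at
// pricePerMillion USD per one million output tokens.
func outputCost(outputTokens int, pricePerMillion float64) float64 {
	return float64(outputTokens) * pricePerMillion / 1_000_000
}

func main() {
	const price = 10.0 // hypothetical output price, USD per million tokens

	answerTokens := 500    // tokens in the visible answer (made-up)
	thinkingTokens := 4000 // tokens spent thinking (made-up)

	// Thinking tokens are added to the output count before pricing.
	fmt.Printf("without thinking: $%.4f\n", outputCost(answerTokens, price))
	fmt.Printf("with thinking:    $%.4f\n", outputCost(answerTokens+thinkingTokens, price))
}
```

With these figures the thinking request is billed for 4,500 output tokens instead of 500, nine times the output cost of the same answer without thinking.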
Check what a model supports
Not every model accepts every level. SupportedThinkingLevels reports the
levels a model accepts, and ClampThinkingLevel snaps a requested level to the
nearest supported one. Stream and Complete clamp automatically, but calling
ClampThinkingLevel yourself is useful for driving a UI, and checking
SupportedThinkingLevels lets you skip the option when a model cannot reason.
```go
levels := llm.SupportedThinkingLevels(model)
if len(levels) == 0 {
	// Model has no reasoning support; do not offer the control.
}

// Snap a user's choice to something the model accepts.
requested := llm.ModelThinkingXHigh
effective := llm.ClampThinkingLevel(model, requested)
if effective != requested {
	log.Printf("model caps thinking at %s", effective)
}
response, err := llm.Complete(ctx, model, input, llm.StreamOptions{
	Reasoning: effective,
})
```
Model.Reasoning is a quick boolean check for whether a model reasons at all.
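To make "nearest supported level" concrete, here is a standalone sketch of one way such snapping could work. It is not the library's implementation: the Level type is a local stand-in, and the two rules it relies on (ties resolve to the lower level, an empty supported list yields Off) are assumptions of this sketch.

```go
package main

import "fmt"

// Level is a local, ordered stand-in for the library's thinking levels.
type Level int

const (
	Off Level = iota
	Minimal
	Low
	Medium
	High
	XHigh
)

var names = [...]string{"off", "minimal", "low", "medium", "high", "xhigh"}

func (l Level) String() string { return names[l] }

// dist is the distance in rank between two levels.
func dist(a, b Level) int {
	if a > b {
		return int(a - b)
	}
	return int(b - a)
}

// clamp returns the supported level closest in rank to requested.
// Assumptions: ties resolve to the lower level, and a model with no
// supported levels gets Off.
func clamp(supported []Level, requested Level) Level {
	if len(supported) == 0 {
		return Off
	}
	best := supported[0]
	for _, s := range supported[1:] {
		ds, db := dist(s, requested), dist(best, requested)
		if ds < db || (ds == db && s < best) {
			best = s
		}
	}
	return best
}

func main() {
	supported := []Level{Low, Medium, High} // hypothetical model capabilities
	fmt.Println(clamp(supported, XHigh))    // request above the model's cap
	fmt.Println(clamp(supported, Minimal))  // request below the model's floor
	fmt.Println(clamp(nil, High))           // model that cannot reason
}
```

A UI could use the same idea to grey out unsupported levels or to show which level a request will actually run at.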
Read the thinking back
While streaming, reasoning arrives in its own block—EventThinkingStart,
EventThinkingDelta, EventThinkingEnd—before the answer text, so you can
render it separately from the final reply.
```go
for event := range events {
	switch event.Type {
	case llm.EventThinkingDelta:
		fmt.Fprint(thinkingPane, event.Delta)
	case llm.EventTextDelta:
		fmt.Fprint(answerPane, event.Delta)
	}
}
```
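If you want the complete reasoning and answer as strings rather than rendering deltas as they arrive, the same loop can accumulate them. The sketch below is standalone and simulates the stream with a channel; its Event and EventType definitions are local stand-ins for the library's types, not the real ones.

```go
package main

import (
	"fmt"
	"strings"
)

// EventType and Event are local stand-ins for the library's stream events.
type EventType int

const (
	ThinkingStart EventType = iota
	ThinkingDelta
	ThinkingEnd
	TextDelta
)

type Event struct {
	Type  EventType
	Delta string
}

// collect drains the channel and returns the thinking text and the answer
// text as two separate strings. Start and End events carry no text here
// and are ignored.
func collect(events <-chan Event) (thinking, answer string) {
	var t, a strings.Builder
	for ev := range events {
		switch ev.Type {
		case ThinkingDelta:
			t.WriteString(ev.Delta)
		case TextDelta:
			a.WriteString(ev.Delta)
		}
	}
	return t.String(), a.String()
}

func main() {
	// Simulated stream: a thinking block followed by the answer text.
	events := make(chan Event, 6)
	events <- Event{Type: ThinkingStart}
	events <- Event{Type: ThinkingDelta, Delta: "2 + 2 "}
	events <- Event{Type: ThinkingDelta, Delta: "is 4."}
	events <- Event{Type: ThinkingEnd}
	events <- Event{Type: TextDelta, Delta: "The answer is 4."}
	close(events)

	thinking, answer := collect(events)
	fmt.Printf("thinking: %q\nanswer:   %q\n", thinking, answer)
}
```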
From a completed message, the reasoning is a ThinkingContent block in
response.Content. Thinking holds the text; ThinkingSignature carries the
provider signature replayed on later turns; Redacted marks thinking the
provider withheld.
```go
for _, block := range response.Content {
	if t, ok := block.(*llm.ThinkingContent); ok && !t.Redacted {
		fmt.Println("reasoning:", t.Thinking)
	}
}
```
Anthropic thinking display
On the Anthropic protocol, ThinkingDisplay controls how reasoning is returned
without changing whether the model reasons. An empty value defaults to
summarized thinking.
```go
options := llm.StreamOptions{
	Reasoning: llm.ModelThinkingHigh,
	ProtocolOptions: &llm.AnthropicStreamOptions{
		ThinkingDisplay: llm.ThinkingDisplaySummarized,
	},
}
```
ThinkingDisplayOmitted withholds the thinking text while retaining the
signature needed for multi-turn tool use. Use it when the application must not
display reasoning content but still needs valid history for follow-up requests.
```go
options := llm.StreamOptions{
	Reasoning: llm.ModelThinkingHigh,
	ProtocolOptions: &llm.AnthropicStreamOptions{
		ThinkingDisplay: llm.ThinkingDisplayOmitted,
	},
}
```
With ThinkingDisplayOmitted, no EventThinkingDelta events arrive and the
ThinkingContent block is marked Redacted.
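In practice this means display code and history code treat the same message differently: the display skips withheld thinking, while the history keeps every block. A minimal standalone sketch of that split follows; ThinkingBlock, TextBlock, and visibleText are hypothetical local names, and the signature value is a placeholder.

```go
package main

import "fmt"

// ThinkingBlock and TextBlock are local stand-ins for the library's
// content types; the field names mirror the ones described above.
type ThinkingBlock struct {
	Thinking  string
	Signature string
	Redacted  bool
}

type TextBlock struct{ Text string }

// visibleText returns only what may be shown to a user: answer text always,
// thinking text only when it was not withheld.
func visibleText(content []any) []string {
	var out []string
	for _, block := range content {
		switch b := block.(type) {
		case *ThinkingBlock:
			if !b.Redacted && b.Thinking != "" {
				out = append(out, "[thinking] "+b.Thinking)
			}
		case *TextBlock:
			out = append(out, b.Text)
		}
	}
	return out
}

func main() {
	// A reply as it might look with thinking omitted: no thinking text,
	// but the signature is still present.
	reply := []any{
		&ThinkingBlock{Signature: "sig-placeholder", Redacted: true},
		&TextBlock{Text: "The answer is 4."},
	}

	// History keeps every block, including the redacted one.
	history := append([]any{}, reply...)

	fmt.Println("shown to user:", visibleText(reply))
	fmt.Println("blocks kept in history:", len(history))
}
```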
Conversation continuity
Reasoning metadata needed by a provider—such as Anthropic signatures and
OpenRouter encrypted reasoning—is retained in assistant messages and replayed
when required by later tool calls. This matters most for tool use with thinking:
some providers require the signed thinking block to be sent back verbatim before
they will accept the next tool call, so dropping it can make the turn fail. The
library keeps the block (even when ThinkingDisplayOmitted hides its text) so
the history stays valid. When the target model changes, it preserves, downgrades,
or omits reasoning content according to compatibility. See
Conversations for model switching and persistence.
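The preserve, downgrade, or omit choice can be pictured as a small decision function. The sketch below is purely illustrative: the Provider field, the Action type, and the three rules are assumptions made for this example, and the library's actual compatibility rules may differ.

```go
package main

import "fmt"

// ThinkingBlock is a local stand-in for a stored thinking block.
// Provider is a hypothetical field recording which provider produced it.
type ThinkingBlock struct {
	Thinking  string
	Signature string
	Redacted  bool
	Provider  string
}

// Action says how a stored thinking block is carried into the next request.
type Action int

const (
	Preserve  Action = iota // replay the block verbatim, signature included
	Downgrade               // keep the readable text, drop the signature
	Omit                    // leave the block out entirely
)

var actionNames = [...]string{"preserve", "downgrade", "omit"}

func (a Action) String() string { return actionNames[a] }

// decide applies three assumed rules:
//  1. same provider: the signature is still meaningful, so preserve;
//  2. different provider but readable text: downgrade to plain text;
//  3. otherwise nothing portable remains, so omit.
func decide(b ThinkingBlock, targetProvider string) Action {
	switch {
	case b.Provider == targetProvider:
		return Preserve
	case !b.Redacted && b.Thinking != "":
		return Downgrade
	default:
		return Omit
	}
}

func main() {
	signed := ThinkingBlock{Thinking: "plan the steps", Signature: "sig-placeholder", Provider: "anthropic"}
	hidden := ThinkingBlock{Signature: "sig-placeholder", Redacted: true, Provider: "anthropic"}

	fmt.Println(decide(signed, "anthropic")) // same provider
	fmt.Println(decide(signed, "openai"))    // readable text, other provider
	fmt.Println(decide(hidden, "openai"))    // withheld text, other provider
}
```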