Reasoning and thinking¶

StreamOptions.Reasoning is a provider-neutral effort level. Each adapter maps it to the target provider's native form—Anthropic adaptive or budget thinking, or OpenAI-compatible reasoning fields—and clamps it to the levels supported by the selected model. Non-reasoning models ignore it, so the same option is safe to set on any model.

options := llm.StreamOptions{Reasoning: llm.ModelThinkingHigh}
response, err := llm.Complete(ctx, model, llm.Prompt("..."), options)

At a glance¶

Task	API
Set the effort level	`StreamOptions.Reasoning` (`ModelThinkingLevel`)
Available levels	`ModelThinkingOff` / `Minimal` / `Low` / `Medium` / `High` / `XHigh`
Check what a model supports	`SupportedThinkingLevels(model)`, `ClampThinkingLevel(model, level)`
Whether a model can reason	`Model.Reasoning` (bool)
Read thinking while streaming	`EventThinkingStart` / `Delta` / `End`
Read thinking from the final message	`ThinkingContent` (`Thinking`, `ThinkingSignature`, `Redacted`)
Control how thinking is returned (Anthropic)	`AnthropicStreamOptions.ThinkingDisplay`

Effort only decides how much the model thinks. Whether the thinking text is returned with the response is a separate, orthogonal knob — on Anthropic it is controlled by ThinkingDisplay (see Anthropic thinking display).

Effort levels¶

A higher level lets the model spend more tokens thinking before it answers, trading latency and cost for quality on hard problems. Leaving Reasoning empty uses the model's own default.

Level	Effect	When to use
`ModelThinkingOff`	Disable thinking entirely	Simple tasks; latency- or cost-sensitive paths
`ModelThinkingMinimal`	Smallest thinking budget	A light nudge to reason
`ModelThinkingLow`	Light reasoning	Everyday tasks
`ModelThinkingMedium`	Balanced reasoning	A safe default
`ModelThinkingHigh`	Extended reasoning for hard tasks	Math, planning, multi-step problems
`ModelThinkingXHigh`	Maximum thinking budget	The hardest problems, cost aside

Under the hood the level maps to each provider's own controls: on Anthropic a thinking-token budget (or adaptive thinking), on OpenAI-compatible providers a reasoning_effort field. The neutral level keeps your code the same across both.

Thinking tokens count toward Usage.Output and bill at the same output rate as generated text, so a higher level makes each request cost more. See Reading responses for usage and cost.

Check what a model supports¶

Not every model accepts every level. SupportedThinkingLevels reports the levels a model accepts, and ClampThinkingLevel snaps a requested level to the nearest supported one. Stream and Complete clamp automatically, but calling it yourself is useful to drive a UI or to skip the option when a model cannot reason.

levels := llm.SupportedThinkingLevels(model)
if len(levels) == 0 {
    // Model has no reasoning support; do not offer the control.
}

// Snap a user's choice to something the model accepts.
requested := llm.ModelThinkingXHigh
effective := llm.ClampThinkingLevel(model, requested)
if effective != requested {
    log.Printf("model caps thinking at %s", effective)
}

response, err := llm.Complete(ctx, model, input, llm.StreamOptions{
    Reasoning: effective,
})

Model.Reasoning is a quick boolean check for whether a model reasons at all.

Read the thinking back¶

While streaming, reasoning arrives in its own block—EventThinkingStart, EventThinkingDelta, EventThinkingEnd—before the answer text, so you can render it separately from the final reply.

for event := range events {
    switch event.Type {
    case llm.EventThinkingDelta:
        fmt.Fprint(thinkingPane, event.Delta)
    case llm.EventTextDelta:
        fmt.Fprint(answerPane, event.Delta)
    }
}

From a completed message, the reasoning is a ThinkingContent block in response.Content. Thinking holds the text; ThinkingSignature carries the provider signature replayed on later turns; Redacted marks thinking the provider withheld.

for _, block := range response.Content {
    if t, ok := block.(*llm.ThinkingContent); ok && !t.Redacted {
        fmt.Println("reasoning:", t.Thinking)
    }
}

Anthropic thinking display¶

On the Anthropic protocol, ThinkingDisplay controls how reasoning is returned without changing whether the model reasons. An empty value defaults to summarized thinking.

options := llm.StreamOptions{
    Reasoning: llm.ModelThinkingHigh,
    ProtocolOptions: &llm.AnthropicStreamOptions{
        ThinkingDisplay: llm.ThinkingDisplaySummarized,
    },
}

ThinkingDisplayOmitted withholds the thinking text while retaining the signature needed for multi-turn tool use. Use it when the application must not display reasoning content but still needs valid history for follow-up requests.

options := llm.StreamOptions{
    Reasoning: llm.ModelThinkingHigh,
    ProtocolOptions: &llm.AnthropicStreamOptions{
        ThinkingDisplay: llm.ThinkingDisplayOmitted,
    },
}

With ThinkingDisplayOmitted, no EventThinkingDelta events arrive and the ThinkingContent block is marked Redacted.

Conversation continuity¶

Reasoning metadata needed by a provider—such as Anthropic signatures and OpenRouter encrypted reasoning—is retained in assistant messages and replayed when required by later tool calls. This matters most for tool use with thinking: some providers require the signed thinking block to be sent back verbatim before they will accept the next tool call, so dropping it can make the turn fail. The library keeps the block (even when ThinkingDisplayOmitted hides its text) so the history stays valid. When the target model changes, it preserves, downgrades, or omits reasoning content according to compatibility. See Conversations for model switching and persistence.