Streaming¶
Use Stream to process text and reasoning as they are generated.
1. Start the stream. Stream returns a channel immediately; the request
runs in the background and delivers events as they arrive.
events, err := llm.Stream(
context.Background(),
model,
llm.Prompt("Explain Go channels briefly."),
llm.StreamOptions{Reasoning: llm.ModelThinkingHigh},
)
if err != nil {
log.Fatal(err)
}
2. Consume events with a type switch. Print text and reasoning deltas as
they come, capture the final message on EventDone, and stop on EventError.
var finalMessage *llm.AssistantMessage
for event := range events {
switch event.Type {
case llm.EventThinkingDelta, llm.EventTextDelta:
fmt.Print(event.Delta)
case llm.EventDone:
finalMessage = event.Message
case llm.EventError:
log.Fatal(event.Err)
}
}
3. Read the final message for the stop reason, token usage, and cost once the channel closes.
fmt.Printf("\nstop=%s tokens=%d cost=$%.6f\n",
finalMessage.StopReason,
finalMessage.Usage.TotalTokens,
finalMessage.Usage.Cost.Total,
)
Thinking events are emitted only when the selected model and provider expose reasoning content.
Full program
package main
import (
"context"
"fmt"
"log"
"github.com/ktsoator/or/llm"
_ "github.com/ktsoator/or/llm/openai" // registers the OpenAI-compatible protocol
)
func main() {
model := llm.GetModel("deepseek", "deepseek-v4-flash")
events, err := llm.Stream(
context.Background(),
model,
llm.Prompt("Explain Go channels briefly."),
llm.StreamOptions{Reasoning: llm.ModelThinkingHigh},
)
if err != nil {
log.Fatal(err)
}
var finalMessage *llm.AssistantMessage
for event := range events {
switch event.Type {
case llm.EventThinkingDelta:
fmt.Print(event.Delta)
case llm.EventTextDelta:
fmt.Print(event.Delta)
case llm.EventDone:
finalMessage = event.Message
case llm.EventError:
log.Fatal(event.Err)
}
}
if finalMessage == nil {
log.Fatal("stream closed without a final message")
}
fmt.Printf("\nstop=%s tokens=%d cost=$%.6f\n",
finalMessage.StopReason,
finalMessage.Usage.TotalTokens,
finalMessage.Usage.Cost.Total,
)
}
Event reference¶
A stream opens with EventStart, emits one start → delta… → end group per
content block (text, thinking, or tool call, possibly interleaved), and closes
with exactly one terminal event:
flowchart LR
start(["EventStart"]) --> blocks
subgraph blocks["one group per content block"]
direction LR
bs["…Start"] --> bd["…Delta<br/><small>× many</small>"] --> be["…End"]
end
blocks --> outcome{"outcome"}
outcome -->|success| done(["EventDone<br/><small>Message = final AssistantMessage</small>"])
outcome -->|failure / cancel| err(["EventError<br/><small>Err + partial Message</small>"])
classDef ok stroke:#16a34a,stroke-width:2px;
classDef bad stroke:#dc2626,stroke-width:2px;
class done ok;
class err bad;
Every non-terminal event carries a Partial snapshot; the … prefix stands for
Text, Thinking, or ToolCall.
| Event | Meaning | Main fields |
|---|---|---|
EventStart |
The provider stream started | Partial |
EventTextStart |
A text block started | ContentIndex, Partial |
EventTextDelta |
A text fragment arrived | ContentIndex, Delta, Partial |
EventTextEnd |
A text block completed | ContentIndex, Content, Partial |
EventThinkingStart |
A reasoning block started | ContentIndex, Partial |
EventThinkingDelta |
A reasoning fragment arrived | ContentIndex, Delta, Partial |
EventThinkingEnd |
A reasoning block completed | ContentIndex, Content, Partial |
EventToolCallStart |
A tool call block started | ContentIndex, ToolCall, Partial |
EventToolCallDelta |
A raw tool-argument JSON fragment arrived | ContentIndex, Delta, ToolCall, Partial |
EventToolCallEnd |
A tool call finished streaming, arguments parsed best-effort | ContentIndex, ToolCall, Partial |
EventDone |
The request completed successfully | Message |
EventError |
The request failed or was cancelled | Err, Message |
EventDone.Message is the final assistant message and contains content, usage,
cost, and stop reason. EventError.Message may contain partial content and
usage. The channel emits exactly one terminal event and then closes. See
Reading responses for how to interpret the final message: stop
reasons, token usage and cost, diagnostics, and context-overflow detection.
Events from different content blocks may be interleaved. Use ContentIndex to
associate deltas with their block. Every non-terminal event carries a Partial
snapshot of the assistant message built so far.
Tool-call deltas and diagnostics¶
EventToolCallDelta.Delta contains raw partial JSON. EventToolCallEnd carries
the call with arguments parsed best-effort: malformed or truncated JSON degrades
to the fields received so far, or to an empty object. Validate arguments before
use, collect tool calls while streaming, and execute them only after
EventDone. Never execute calls from a response that ends with EventError.
When arguments could not be parsed strictly, the response records a
tool_arguments_recovered entry in Message.Diagnostics. Its recovery mode
is repaired, partial, or invalid. Inspect diagnostics before executing a
tool with side effects. A safe application declines partial and invalid
arguments and returns a tool error so the model can retry.
Cancellation¶
Cancelling the request context stops an in-flight request. The stream emits one
EventError whose message reports StopReasonAborted, then closes.
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
events, err := llm.Stream(ctx, model, input, llm.StreamOptions{})
if err != nil {
log.Fatal(err)
}
// Call cancel() from elsewhere, for example when the user presses Stop.
for event := range events {
switch event.Type {
case llm.EventTextDelta:
fmt.Print(event.Delta)
case llm.EventError:
fmt.Printf("\nstopped: %s\n", event.Message.StopReason)
}
}
Use the independent per-attempt Timeout option for transport deadlines; see
Request configuration.