Skip to content

Streaming

Use Stream to process text and reasoning as they are generated.

1. Start the stream. Stream returns a channel immediately; the request runs in the background and delivers events as they arrive.

events, err := llm.Stream(
    context.Background(),
    model,
    llm.Prompt("Explain Go channels briefly."),
    llm.StreamOptions{Reasoning: llm.ModelThinkingHigh},
)
if err != nil {
    log.Fatal(err)
}

2. Consume events with a type switch. Print text and reasoning deltas as they come, capture the final message on EventDone, and stop on EventError.

var finalMessage *llm.AssistantMessage
for event := range events {
    switch event.Type {
    case llm.EventThinkingDelta, llm.EventTextDelta:
        fmt.Print(event.Delta)
    case llm.EventDone:
        finalMessage = event.Message
    case llm.EventError:
        log.Fatal(event.Err)
    }
}

3. Read the final message for the stop reason, token usage, and cost once the channel closes.

fmt.Printf("\nstop=%s tokens=%d cost=$%.6f\n",
    finalMessage.StopReason,
    finalMessage.Usage.TotalTokens,
    finalMessage.Usage.Cost.Total,
)

Thinking events are emitted only when the selected model and provider expose reasoning content.

Full program
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/ktsoator/or/llm"
    _ "github.com/ktsoator/or/llm/openai" // registers the OpenAI-compatible protocol
)

func main() {
    model := llm.GetModel("deepseek", "deepseek-v4-flash")
    events, err := llm.Stream(
        context.Background(),
        model,
        llm.Prompt("Explain Go channels briefly."),
        llm.StreamOptions{Reasoning: llm.ModelThinkingHigh},
    )
    if err != nil {
        log.Fatal(err)
    }

    var finalMessage *llm.AssistantMessage
    for event := range events {
        switch event.Type {
        case llm.EventThinkingDelta:
            fmt.Print(event.Delta)
        case llm.EventTextDelta:
            fmt.Print(event.Delta)
        case llm.EventDone:
            finalMessage = event.Message
        case llm.EventError:
            log.Fatal(event.Err)
        }
    }
    if finalMessage == nil {
        log.Fatal("stream closed without a final message")
    }
    fmt.Printf("\nstop=%s tokens=%d cost=$%.6f\n",
        finalMessage.StopReason,
        finalMessage.Usage.TotalTokens,
        finalMessage.Usage.Cost.Total,
    )
}

Event reference

A stream opens with EventStart, emits one start → delta… → end group per content block (text, thinking, or tool call, possibly interleaved), and closes with exactly one terminal event:

flowchart LR
    start(["EventStart"]) --> blocks

    subgraph blocks["one group per content block"]
        direction LR
        bs["…Start"] --> bd["…Delta<br/><small>× many</small>"] --> be["…End"]
    end

    blocks --> outcome{"outcome"}
    outcome -->|success| done(["EventDone<br/><small>Message = final AssistantMessage</small>"])
    outcome -->|failure / cancel| err(["EventError<br/><small>Err + partial Message</small>"])

    classDef ok stroke:#16a34a,stroke-width:2px;
    classDef bad stroke:#dc2626,stroke-width:2px;
    class done ok;
    class err bad;

Every non-terminal event carries a Partial snapshot; the prefix stands for Text, Thinking, or ToolCall.

Event Meaning Main fields
EventStart The provider stream started Partial
EventTextStart A text block started ContentIndex, Partial
EventTextDelta A text fragment arrived ContentIndex, Delta, Partial
EventTextEnd A text block completed ContentIndex, Content, Partial
EventThinkingStart A reasoning block started ContentIndex, Partial
EventThinkingDelta A reasoning fragment arrived ContentIndex, Delta, Partial
EventThinkingEnd A reasoning block completed ContentIndex, Content, Partial
EventToolCallStart A tool call block started ContentIndex, ToolCall, Partial
EventToolCallDelta A raw tool-argument JSON fragment arrived ContentIndex, Delta, ToolCall, Partial
EventToolCallEnd A tool call finished streaming, arguments parsed best-effort ContentIndex, ToolCall, Partial
EventDone The request completed successfully Message
EventError The request failed or was cancelled Err, Message

EventDone.Message is the final assistant message and contains content, usage, cost, and stop reason. EventError.Message may contain partial content and usage. The channel emits exactly one terminal event and then closes. See Reading responses for how to interpret the final message: stop reasons, token usage and cost, diagnostics, and context-overflow detection.

Events from different content blocks may be interleaved. Use ContentIndex to associate deltas with their block. Every non-terminal event carries a Partial snapshot of the assistant message built so far.

Tool-call deltas and diagnostics

EventToolCallDelta.Delta contains raw partial JSON. EventToolCallEnd carries the call with arguments parsed best-effort: malformed or truncated JSON degrades to the fields received so far, or to an empty object. Validate arguments before use, collect tool calls while streaming, and execute them only after EventDone. Never execute calls from a response that ends with EventError.

When arguments could not be parsed strictly, the response records a tool_arguments_recovered entry in Message.Diagnostics. Its recovery mode is repaired, partial, or invalid. Inspect diagnostics before executing a tool with side effects. A safe application declines partial and invalid arguments and returns a tool error so the model can retry.

Cancellation

Cancelling the request context stops an in-flight request. The stream emits one EventError whose message reports StopReasonAborted, then closes.

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

events, err := llm.Stream(ctx, model, input, llm.StreamOptions{})
if err != nil {
    log.Fatal(err)
}

// Call cancel() from elsewhere, for example when the user presses Stop.
for event := range events {
    switch event.Type {
    case llm.EventTextDelta:
        fmt.Print(event.Delta)
    case llm.EventError:
        fmt.Printf("\nstopped: %s\n", event.Message.StopReason)
    }
}

Use the independent per-attempt Timeout option for transport deadlines; see Request configuration.