Providers and models

The package includes two protocol adapters:

openai-completions
anthropic-messages

The catalog and compatibility layer explicitly configure these providers:

Provider	Provider ID	Protocol	Environment variable
DeepSeek	`deepseek`	`openai-completions`	`DEEPSEEK_API_KEY`
MiniMax Global	`minimax`	`anthropic-messages`	`MINIMAX_API_KEY`
MiniMax China	`minimax-cn`	`anthropic-messages`	`MINIMAX_CN_API_KEY`
Xiaomi MiMo	`xiaomi`	`openai-completions`	`XIAOMI_API_KEY` or `MIMO_API_KEY`
Z.AI Global	`zai`	`openai-completions`	`ZAI_API_KEY`
Zhipu Coding Plan China	`zai-coding-cn`	`openai-completions`	`ZAI_CODING_CN_API_KEY`
Moonshot AI Global	`moonshotai`	`openai-completions`	`MOONSHOT_API_KEY`
Moonshot AI China	`moonshotai-cn`	`openai-completions`	`MOONSHOT_API_KEY`
Kimi Coding	`kimi-coding`	`anthropic-messages`	`KIMI_API_KEY`

The catalog also contains metadata for additional compatible providers and models. Those entries can be queried and may work through one of the two protocol adapters, but not all of them have been verified against live provider APIs, so their presence in the catalog is not a support guarantee. Automated tests exercise both adapters against local mock servers; they are not live integration tests for every provider.

Only the key for the provider selected by llm.GetModel is read. Request-scoped credentials can also be supplied with StreamOptions.APIKey or StreamOptions.Env.
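The key lookup described above can be pictured with a small standalone sketch. It is not the package's implementation: `resolveAPIKey`, its parameters, the assumption that a request-scoped environment is a `map[string]string`, and the precedence shown (explicit key, then request-scoped map, then process environment) are all hypothetical. Only the variable names come from the provider table.

```go
package main

import (
	"fmt"
	"os"
)

// resolveAPIKey is a hypothetical helper, not part of the package.
// It returns the first non-empty value among an explicit key, a
// request-scoped environment map, and the process environment,
// trying the candidate variable names in the order given.
func resolveAPIKey(explicit string, env map[string]string, names ...string) (string, bool) {
	if explicit != "" {
		return explicit, true
	}
	// Request-scoped values win over the process environment (assumed order).
	for _, name := range names {
		if v := env[name]; v != "" {
			return v, true
		}
	}
	for _, name := range names {
		if v, ok := os.LookupEnv(name); ok && v != "" {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Xiaomi MiMo accepts either variable name, per the provider table.
	scoped := map[string]string{"MIMO_API_KEY": "sk-example"}
	key, ok := resolveAPIKey("", scoped, "XIAOMI_API_KEY", "MIMO_API_KEY")
	fmt.Println(key, ok)
}
```

Passing the candidate names in order keeps providers with an alternate variable, such as `XIAOMI_API_KEY` or `MIMO_API_KEY`, on the same code path as providers with a single variable.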

Discover models

Query the catalog instead of hard-coding model IDs, for example when the provider or model ID is supplied dynamically:

for _, provider := range llm.GetProviders() {
    fmt.Println(provider)
    for _, model := range llm.GetModels(provider) {
        fmt.Printf("  %s: %s\n", model.ID, model.Name)
    }
}

model, ok := llm.LookupModel("xiaomi", "mimo-v2-flash")
if !ok {
    log.Fatal("model not found")
}

LookupModel returns a model and a found flag. GetModel is convenient for a known catalog entry and panics when the provider or model ID does not exist.
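The difference between the two lookup styles can be illustrated with a self-contained sketch. The `Model` struct, the `catalog` map, and the helpers `lookupModel` and `mustModel` below are local stand-ins written for this example, and the display name is a placeholder; they only mirror the behavior described above and are not the package's code.

```go
package main

import "fmt"

// Model is a local stand-in with only the fields this sketch needs.
type Model struct {
	ID   string
	Name string
}

// catalog is a hypothetical in-memory catalog keyed by provider, then model ID.
var catalog = map[string]map[string]Model{
	"xiaomi": {"mimo-v2-flash": {ID: "mimo-v2-flash", Name: "placeholder display name"}},
}

// lookupModel mirrors the comma-ok style: it reports a miss instead of panicking.
func lookupModel(provider, id string) (Model, bool) {
	m, ok := catalog[provider][id] // indexing a missing provider yields a nil map, which is safe to read
	return m, ok
}

// mustModel mirrors the panicking style, suited to IDs fixed at build time.
func mustModel(provider, id string) Model {
	m, ok := lookupModel(provider, id)
	if !ok {
		panic(fmt.Sprintf("unknown model %s/%s", provider, id))
	}
	return m
}

func main() {
	// User-supplied IDs go through the comma-ok form, so a typo is an
	// ordinary error path rather than a crash.
	if _, ok := lookupModel("xiaomi", "no-such-model"); !ok {
		fmt.Println("model not found; falling back to a default")
	}
	// IDs known at build time can use the panicking form.
	fmt.Println(mustModel("xiaomi", "mimo-v2-flash").ID)
}
```

As a rule of thumb under these assumptions, reserve the panicking form for IDs that appear as literals in the program and use the comma-ok form for anything read from configuration or user input.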

Model metadata

A Model is also a read-only metadata record. Inspect it to drive UI, enforce limits, or estimate cost before a request:

Field	Type	Meaning
`ID`	`string`	Identifier sent to the provider
`Name`	`string`	Human-readable display name
`Provider`	`string`	Vendor key, e.g. `anthropic`
`Protocol`	`Protocol`	Which adapter handles the model
`BaseURL`	`string`	Endpoint base URL
`Headers`	`map[string]string`	Default headers merged into each request
`Reasoning`	`bool`	Whether the model can produce thinking
`Input`	`[]ModelInput`	Accepted modalities: `Text`, `Image`
`ContextWindow`	`int64`	Maximum total tokens (input + output)
`MaxTokens`	`int64`	Maximum tokens the model may generate
`Cost`	`ModelCost`	Per-million-token pricing
`Compatibility`	`ModelCompatibility`	Protocol-specific overrides (see below)

Reasoning reports only whether thinking is possible. To read the exact levels a model accepts, use SupportedThinkingLevels rather than reading the raw ThinkingLevelMap directly.
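As an illustration of enforcing limits with the `ContextWindow` and `MaxTokens` fields from the table above, the following sketch clamps a requested output size. `Limits` and `outputBudget` are hypothetical names defined here, not package API, and the prompt token count is assumed to come from the caller's own tokenizer or estimate.

```go
package main

import "fmt"

// Limits is a local stand-in holding the two metadata fields used here.
type Limits struct {
	ContextWindow int64 // maximum total tokens (input + output)
	MaxTokens     int64 // maximum tokens the model may generate
}

// outputBudget returns how many output tokens can be requested for a
// prompt of promptTokens. It reports false when the prompt alone
// already fills or exceeds the context window.
func outputBudget(l Limits, promptTokens, requested int64) (int64, bool) {
	remaining := l.ContextWindow - promptTokens
	if remaining <= 0 {
		return 0, false
	}
	budget := requested
	if budget > l.MaxTokens {
		budget = l.MaxTokens
	}
	if budget > remaining {
		budget = remaining
	}
	return budget, true
}

func main() {
	l := Limits{ContextWindow: 32768, MaxTokens: 4096}
	budget, ok := outputBudget(l, 30000, 8000)
	fmt.Println(budget, ok) // prints "2768 true"
}
```

Because the context window covers input and output together, a long prompt can reduce the usable output budget below `MaxTokens`, which is the case the example exercises.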

Cost holds prices per million tokens, matching how CalculateCost computes a charge:

Field	Meaning
`Input`	Price per million input tokens
`Output`	Price per million output tokens
`CacheRead`	Price per million cache-read tokens
`CacheWrite`	Price per million cache-write tokens

model, ok := llm.LookupModel("deepseek", "deepseek-v4-flash")
if !ok {
    log.Fatal("model not found")
}
fmt.Printf("%s: %d-token window, $%.2f/M in, $%.2f/M out\n",
    model.Name, model.ContextWindow, model.Cost.Input, model.Cost.Output)

See Reading responses for the matching Usage and UsageCost records on a completed request.
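The per-million pricing arithmetic can be worked through in a short standalone sketch. This is not `CalculateCost` itself, whose signature is not shown here: `ModelCost` is redeclared locally with assumed `float64` fields, `TokenCounts` and `estimateCost` are hypothetical names, and the prices are made up for the example.

```go
package main

import "fmt"

// ModelCost is a local stand-in: prices per million tokens.
// The float64 field types are an assumption.
type ModelCost struct {
	Input, Output, CacheRead, CacheWrite float64
}

// TokenCounts is a hypothetical record of tokens used by one request.
type TokenCounts struct {
	Input, Output, CacheRead, CacheWrite int64
}

// estimateCost multiplies each token count by its per-million price
// and divides the total by one million.
func estimateCost(c ModelCost, t TokenCounts) float64 {
	const perMillion = 1_000_000.0
	total := float64(t.Input)*c.Input +
		float64(t.Output)*c.Output +
		float64(t.CacheRead)*c.CacheRead +
		float64(t.CacheWrite)*c.CacheWrite
	return total / perMillion
}

func main() {
	cost := ModelCost{Input: 0.5, Output: 2.0} // made-up prices
	usage := TokenCounts{Input: 2_000_000, Output: 500_000}
	// 2M input tokens at 0.50/M plus 0.5M output tokens at 2.00/M.
	fmt.Printf("$%.2f\n", estimateCost(cost, usage)) // prints "$2.00"
}
```

The same arithmetic can be run before a request with estimated token counts, or afterward with the counts the provider reports.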

Custom and compatible endpoints

Any endpoint implementing one of the built-in protocols can be used by constructing a Model directly and setting BaseURL. This covers local servers such as Ollama, vLLM, and LM Studio, as well as private model gateways:

model := llm.Model{
    ID:            "qwen2.5-coder:7b",
    Name:          "Qwen2.5 Coder 7B",
    Provider:      "ollama",
    Protocol:      llm.ProtocolOpenAICompletions,
    BaseURL:       "http://localhost:11434/v1",
    Input:         []llm.ModelInput{llm.Text},
    ContextWindow: 32768,
    MaxTokens:     4096,
}

events, err := llm.Stream(ctx, model, input, llm.StreamOptions{APIKey: "ollama"})

Endpoint-specific behavior, such as reasoning field names and cache-control support, is configured through Model.Compatibility with OpenAICompletionsCompatibility or AnthropicMessagesCompatibility. Set only the fields that differ from the default. Optional fields such as the boolean flags are pointers, so leaving one unset keeps the adapter's default behavior.

supports := func(b bool) *bool { return &b }

// OpenAI-compatible endpoint that expects the output-token limit in
// "max_completion_tokens" and accepts a reasoning effort field.
model.Compatibility = &llm.OpenAICompletionsCompatibility{
    MaxTokensField:          "max_completion_tokens",
    SupportsReasoningEffort: supports(true),
}

// Anthropic-compatible endpoint that does not support cache control.
model.Compatibility = &llm.AnthropicMessagesCompatibility{
    SupportsCacheControl: supports(false),
}
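The reason for pointer-typed flags is that a plain `bool` cannot distinguish "not set" from "explicitly false". The sketch below shows one way an adapter might resolve such overrides. The `overrides` struct, `boolOr`, `ptr`, and the default values are hypothetical and exist only for this illustration; the package's actual defaults are not documented here.

```go
package main

import "fmt"

// overrides is a local stand-in for a compatibility struct.
type overrides struct {
	SupportsCacheControl    *bool
	SupportsReasoningEffort *bool
}

// boolOr returns *p when the override is set, otherwise the default def.
func boolOr(p *bool, def bool) bool {
	if p == nil {
		return def
	}
	return *p
}

// ptr returns a pointer to a copy of b, like the supports helper above.
func ptr(b bool) *bool { return &b }

func main() {
	// Only cache control is overridden; the other flag stays nil.
	o := overrides{SupportsCacheControl: ptr(false)}

	// The defaults passed here are made up for the example.
	fmt.Println("cache control:", boolOr(o.SupportsCacheControl, true))
	fmt.Println("reasoning effort:", boolOr(o.SupportsReasoningEffort, false))
}
```

With this shape, an explicit `false` overrides a default of `true`, while a nil field falls through to whatever the adapter would have done anyway.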

For a wire protocol that is neither OpenAI-compatible nor Anthropic-compatible, implement a custom protocol adapter.