Skip to content

Providers and models

The package includes two protocol adapters:

  • openai-completions
  • anthropic-messages

The catalog and compatibility layer explicitly configure these providers:

Provider Provider ID Protocol Environment variable
DeepSeek deepseek openai-completions DEEPSEEK_API_KEY
MiniMax Global minimax anthropic-messages MINIMAX_API_KEY
MiniMax China minimax-cn anthropic-messages MINIMAX_CN_API_KEY
Xiaomi MiMo xiaomi openai-completions XIAOMI_API_KEY or MIMO_API_KEY
Z.AI Global zai openai-completions ZAI_API_KEY
Zhipu Coding Plan China zai-coding-cn openai-completions ZAI_CODING_CN_API_KEY
Moonshot AI Global moonshotai openai-completions MOONSHOT_API_KEY
Moonshot AI China moonshotai-cn openai-completions MOONSHOT_API_KEY
Kimi Coding kimi-coding anthropic-messages KIMI_API_KEY

The catalog also contains metadata for additional compatible providers and models. Those entries can be queried and may work through one of the two protocol adapters, but they have not all been verified against live provider APIs and are not a support guarantee. Automated tests exercise both adapters with local mock servers rather than live integration tests for every provider.

Only the key for the provider selected by llm.GetModel is read. Request-scoped credentials can also be supplied with StreamOptions.APIKey or StreamOptions.Env.

Discover models

Query the catalog instead of hard-coding model IDs supplied dynamically:

for _, provider := range llm.GetProviders() {
    fmt.Println(provider)
    for _, model := range llm.GetModels(provider) {
        fmt.Printf("  %s: %s\n", model.ID, model.Name)
    }
}

model, ok := llm.LookupModel("xiaomi", "mimo-v2-flash")
if !ok {
    log.Fatal("model not found")
}

LookupModel returns a model and a found flag. GetModel is convenient for a known catalog entry and panics when the provider or model ID does not exist.

Model metadata

A Model is also a read-only metadata record. Inspect it to drive UI, enforce limits, or estimate cost before a request:

Field Type Meaning
ID string Identifier sent to the provider
Name string Human-readable display name
Provider string Vendor key, e.g. anthropic
Protocol Protocol Which adapter handles the model
BaseURL string Endpoint base URL
Headers map[string]string Default headers merged into each request
Reasoning bool Whether the model can produce thinking
Input []ModelInput Accepted modalities: Text, Image
ContextWindow int64 Maximum total tokens (input + output)
MaxTokens int64 Maximum tokens the model may generate
Cost ModelCost Per-million-token pricing
Compatibility ModelCompatibility Protocol-specific overrides (see below)

Reasoning reports only whether thinking is possible; use SupportedThinkingLevels to read the exact levels a model accepts rather than the raw ThinkingLevelMap.

Cost holds prices per million tokens, matching how CalculateCost computes a charge:

Field Meaning
Input Price per million input tokens
Output Price per million output tokens
CacheRead Price per million cache-read tokens
CacheWrite Price per million cache-write tokens
model, _ := llm.LookupModel("deepseek", "deepseek-v4-flash")
fmt.Printf("%s: %d-token window, $%.2f/M in, $%.2f/M out\n",
    model.Name, model.ContextWindow, model.Cost.Input, model.Cost.Output)

See Reading responses for the matching Usage and UsageCost records on a completed request.

Custom and compatible endpoints

Any endpoint implementing one of the built-in protocols can be used by constructing a Model directly and setting BaseURL. This covers local servers such as Ollama, vLLM, and LM Studio, as well as private model gateways:

model := llm.Model{
    ID:            "qwen2.5-coder:7b",
    Name:          "Qwen2.5 Coder 7B",
    Provider:      "ollama",
    Protocol:      llm.ProtocolOpenAICompletions,
    BaseURL:       "http://localhost:11434/v1",
    Input:         []llm.ModelInput{llm.Text},
    ContextWindow: 32768,
    MaxTokens:     4096,
}

events, err := llm.Stream(ctx, model, input, llm.StreamOptions{APIKey: "ollama"})

Endpoint-specific behavior—reasoning field names, cache-control support, and similar differences—is configured through Model.Compatibility with OpenAICompletionsCompatibility or AnthropicMessagesCompatibility. Set only the fields that differ from the default; each is a pointer so an unset field leaves the adapter's behavior unchanged.

supports := func(b bool) *bool { return &b }

// OpenAI-compatible endpoint that names its cap "max_completion_tokens"
// and accepts a reasoning effort field.
model.Compatibility = &llm.OpenAICompletionsCompatibility{
    MaxTokensField:          "max_completion_tokens",
    SupportsReasoningEffort: supports(true),
}

// Anthropic-compatible endpoint that does not support cache control.
model.Compatibility = &llm.AnthropicMessagesCompatibility{
    SupportsCacheControl: supports(false),
}

For a wire protocol that is neither OpenAI-compatible nor Anthropic-compatible, implement a custom protocol adapter.