Provider Layer

The provider layer is the boundary between Umbra and any LLM API. It is built in three tiers: a registry of known provider types, a profiles store of your configured connections, and a gateway that routes requests and handles failure.

Built-in types (always available):

Type               Label             Default URL                      Key required
-----------------  ----------------  -------------------------------  ----------------
openai             OpenAI            https://api.openai.com/v1        Yes
anthropic          Anthropic         https://api.anthropic.com/v1     Yes
openrouter         OpenRouter        https://openrouter.ai/api/v1     Yes
mistral            Mistral           https://api.mistral.ai/v1        Yes
ollama             Ollama            http://127.0.0.1:11434/v1        No
lmstudio           LM Studio         http://127.0.0.1:1234/v1         No
openai-codex       ChatGPT Plus/Pro  https://chatgpt.com/backend-api  Optional (OAuth)
openai_compatible  Custom endpoint   (set per profile)                Optional
opencode-zen       OpenCode Zen      https://opencode.ai/zen/v1       Optional

Default for new users. On first launch, if no provider profile is configured, Umbra offers to connect OpenCode Zen automatically. It provides a set of free models with no API key required — enough to try the agent right away. You can switch to any other provider at any time via umbra providers connect.

Profiles are persisted in ~/.umbra/providers.json. Each profile stores:

  • type — one of the provider type values above
  • label — human-readable name
  • baseUrl — the API base URL (overridable per profile)
  • apiKey — stored locally, never transmitted to Umbra
  • model — optional default model for this profile
  • extraHeaders — arbitrary HTTP headers injected on every request
  • options — provider-specific options map

One profile is marked as defaultProfileId. The active profile for each task is resolved at runtime.
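As a rough illustration of the profile fields listed above, the sketch below models a profile and shows how a default profile and an effective base URL might be resolved. The `id` field, the `profiles` array, and the helper names `resolveProfile` and `effectiveBaseUrl` are assumptions made for this example; the actual layout of ~/.umbra/providers.json may differ.

```typescript
// Hypothetical shapes inferred from the field list; not Umbra's actual types.
type ProviderType =
  | "openai" | "anthropic" | "openrouter" | "mistral" | "ollama"
  | "lmstudio" | "openai-codex" | "openai_compatible" | "opencode-zen";

interface ProviderProfile {
  id: string;                            // assumed: implied by defaultProfileId
  type: ProviderType;
  label: string;
  baseUrl?: string;                      // overrides the type's default URL when set
  apiKey?: string;
  model?: string;
  extraHeaders?: Record<string, string>;
  options?: Record<string, unknown>;
}

interface ProvidersFile {
  defaultProfileId?: string;
  profiles: ProviderProfile[];           // assumed container name
}

// Default URLs copied from the built-in types table.
// openai_compatible has no default; its URL is set per profile.
const DEFAULT_URLS: Partial<Record<ProviderType, string>> = {
  openai: "https://api.openai.com/v1",
  anthropic: "https://api.anthropic.com/v1",
  openrouter: "https://openrouter.ai/api/v1",
  mistral: "https://api.mistral.ai/v1",
  ollama: "http://127.0.0.1:11434/v1",
  lmstudio: "http://127.0.0.1:1234/v1",
  "openai-codex": "https://chatgpt.com/backend-api",
  "opencode-zen": "https://opencode.ai/zen/v1",
};

// Pick the requested profile, or fall back to the one marked as default.
function resolveProfile(file: ProvidersFile, profileId?: string): ProviderProfile {
  const id = profileId ?? file.defaultProfileId;
  const profile = file.profiles.find((p) => p.id === id);
  if (!profile) throw new Error(`Unknown profile: ${String(id)}`);
  return profile;
}

// A profile-level baseUrl wins over the default for its type.
function effectiveBaseUrl(profile: ProviderProfile): string {
  const url = profile.baseUrl ?? DEFAULT_URLS[profile.type];
  if (!url) throw new Error(`No base URL configured for profile ${profile.id}`);
  return url;
}

const exampleFile: ProvidersFile = {
  defaultProfileId: "local",
  profiles: [
    { id: "local", type: "ollama", label: "Local Ollama" },
    { id: "custom", type: "openai_compatible", label: "Custom", baseUrl: "http://localhost:8080/v1" },
  ],
};
```

With this sample data, `resolveProfile(exampleFile)` selects the `local` profile and `effectiveBaseUrl` falls back to the Ollama default, while the `custom` profile uses its own `baseUrl`.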

DefaultProviderGateway is the single routing point for all LLM calls. It supports two routing modes:

By profile (profileId) — calls a single configured profile directly.

By chain (chainId) — iterates through an ordered list of profile entries; uses the first successful response. This enables automatic fallback across providers without any changes to the task code.
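The chain mode can be pictured as a loop that tries each entry in order and returns the first success. The following is a minimal sketch under stated assumptions: `ChainEntry`, `CompleteFn`, and `completeWithChain` are hypothetical names for illustration, not the gateway's actual internals.

```typescript
// Hypothetical: a chain is an ordered list of profile references.
interface ChainEntry {
  profileId: string;
  model?: string;
}

// Stand-in for whatever performs the actual LLM call for one entry.
type CompleteFn<T> = (entry: ChainEntry) => Promise<T>;

// Try each entry in order; the first successful response wins.
async function completeWithChain<T>(
  chain: ChainEntry[],
  complete: CompleteFn<T>,
): Promise<T> {
  const errors: unknown[] = [];
  for (const entry of chain) {
    try {
      return await complete(entry);
    } catch (err) {
      errors.push(err); // remember the failure and fall through to the next entry
    }
  }
  throw new Error(
    `All ${chain.length} chain entries failed: ${errors.map(String).join("; ")}`,
  );
}
```

Because the fallback lives in one place, the calling task only passes a chain and never needs to know which provider ultimately answered.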

GatewayRequest
├─ #prepareRequest()            # optional compression (off / standard / aggressive)
│  ├─ compressToolOutput()      # for tool result messages
│  └─ condenseProse()           # for user/assistant messages
├─ #withRetries()               # up to 2 attempts
│  └─ catalog.completeProfile() / completeProfileStream()
└─ #logResponse()               # usage accounting + debug events

The gateway retries automatically on transient failures:

  • HTTP 429 (rate limited)
  • HTTP 5xx (server error)
  • AbortError (network timeout)
  • fetch failed (connection refused / DNS)

Non-retryable errors (4xx other than 429, schema validation, unknown profile) are thrown immediately. The backoff between retries is 1 second × attempt number.
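The retry rules above could be sketched roughly as follows. This is an illustration, not the gateway's code: the `status` property on errors, the helper names, and the reading of "up to 2 attempts" as two calls in total are all assumptions.

```typescript
// Hypothetical error shape: an Error that may carry an HTTP status code.
interface HttpError extends Error {
  status?: number;
}

// Classify an error according to the transient-failure list above.
function isRetryable(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  const status = (err as HttpError).status;
  if (status === 429) return true;                                  // rate limited
  if (status !== undefined && status >= 500 && status <= 599) return true; // server error
  if (err.name === "AbortError") return true;                       // timeout
  return err.message.includes("fetch failed");                      // connection refused / DNS
}

// Backoff of 1 second multiplied by the attempt number.
function backoffMs(attempt: number): number {
  return 1000 * attempt;
}

// Run fn, retrying transient failures. maxAttempts counts total calls (assumed).
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 2,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-retryable errors and exhausted attempts are rethrown immediately.
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      await sleep(backoffMs(attempt));
    }
  }
}
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.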

Requests are typed and validated with Zod. Key fields:

Field             Type                                Notes
----------------  ----------------------------------  -------------------------------------------
model             string                              Optional; profile default is used if omitted
messages          ProviderChatMessage[]               Roles: system, user, assistant, tool
tools             ProviderToolDefinition[]            Function-calling tools
toolChoice        auto | required | none
responseFormat    text | json_object | json_schema    Structured output
thinkBudget       number | low | medium | high | max  Extended reasoning token budget
compressionLevel  off | standard | aggressive         Pre-request message compression
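To make the field table concrete, here is a dependency-free sketch of what a request might look like. Umbra validates requests with Zod; the plain TypeScript types below only mirror the table, and the inner shapes of `ProviderChatMessage` and `ProviderToolDefinition`, the optionality of most fields, and the `resolveModel` helper are assumptions made for illustration.

```typescript
// Hypothetical inner shapes; only the role names come from the table.
interface ProviderChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface ProviderToolDefinition {
  name: string;
  description?: string;
  parameters: Record<string, unknown>; // assumed: JSON Schema for the arguments
}

// Mirrors the field table; which fields are optional is assumed.
interface GatewayRequest {
  profileId?: string;
  chainId?: string;
  model?: string;
  messages: ProviderChatMessage[];
  tools?: ProviderToolDefinition[];
  toolChoice?: "auto" | "required" | "none";
  responseFormat?: "text" | "json_object" | "json_schema";
  thinkBudget?: number | "low" | "medium" | "high" | "max";
  compressionLevel?: "off" | "standard" | "aggressive";
}

// The request's model wins; otherwise the profile default is used.
function resolveModel(req: GatewayRequest, profileDefault?: string): string {
  const model = req.model ?? profileDefault;
  if (!model) throw new Error("No model given and the profile has no default");
  return model;
}

const exampleRequest: GatewayRequest = {
  profileId: "local",
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Summarize the provider layer in one sentence." },
  ],
  toolChoice: "auto",
  responseFormat: "text",
  thinkBudget: "low",
  compressionLevel: "standard",
};
```

Since `exampleRequest` omits `model`, `resolveModel(exampleRequest, profileDefault)` returns whatever default the profile supplies.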

ModelsRegistry resolves model capabilities (context window, tool support, pricing, vision, reasoning, structured output) by fetching from models.dev/api.json with a 5-minute in-memory cache. If the model is not found there, it falls back to the HuggingFace model API, then to heuristic rules based on model name patterns.
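The lookup order described above (cached catalog, then a secondary source, then name-based heuristics) might be structured like the sketch below. Everything here is hypothetical: the `ModelCapabilities` fields, the `TtlCache` class, and the single heuristic rule are placeholders, and the real registry's rules are not shown in this document.

```typescript
// Hypothetical subset of the capabilities the registry resolves.
interface ModelCapabilities {
  contextWindow?: number;
  tools?: boolean;
  vision?: boolean;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Single-value in-memory cache with a time-to-live and an injectable clock.
class TtlCache<V> {
  private entry?: { value: V; expiresAt: number };
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): V | undefined {
    if (this.entry && this.now() < this.entry.expiresAt) return this.entry.value;
    return undefined; // missing or expired
  }

  set(value: V): void {
    this.entry = { value, expiresAt: this.now() + this.ttlMs };
  }
}

// One remote source, e.g. the catalog or a secondary model API.
type Lookup = (modelId: string) => Promise<ModelCapabilities | undefined>;

// Placeholder heuristic: one invented rule, for illustration only.
function heuristicCapabilities(modelId: string): ModelCapabilities {
  return { vision: modelId.toLowerCase().includes("vision") };
}

// Ask each source in order; fall back to heuristics if none knows the model.
async function resolveCapabilities(
  modelId: string,
  lookups: Lookup[],
): Promise<ModelCapabilities> {
  for (const lookup of lookups) {
    const found = await lookup(modelId).catch(() => undefined);
    if (found) return found;
  }
  return heuristicCapabilities(modelId);
}
```

A cache created with `new TtlCache<Record<string, ModelCapabilities>>(FIVE_MINUTES_MS)` would hold the fetched catalog and cause a refetch once five minutes have passed.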

Cost estimates (USD) are computed from actual token usage and the per-million pricing in the registry, and written to the usage log alongside each response.
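The cost arithmetic can be illustrated with a short sketch. The field names below are assumptions, and real pricing may include further components (for example cached or reasoning tokens) that this example leaves out.

```typescript
// Hypothetical usage and pricing shapes.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface ModelPricing {
  inputPerMillion: number;  // USD per 1,000,000 input tokens
  outputPerMillion: number; // USD per 1,000,000 output tokens
}

// cost = (input tokens x input price + output tokens x output price) / 1,000,000
function estimateCostUsd(usage: TokenUsage, pricing: ModelPricing): number {
  const weighted =
    usage.inputTokens * pricing.inputPerMillion +
    usage.outputTokens * pricing.outputPerMillion;
  return weighted / 1e6;
}

// Made-up prices for illustration: 12,000 input tokens at $3 per million
// plus 800 output tokens at $15 per million.
const exampleCost = estimateCostUsd(
  { inputTokens: 12000, outputTokens: 800 },
  { inputPerMillion: 3, outputPerMillion: 15 },
);
```

With these illustrative numbers, the input side contributes $0.036 and the output side $0.012, for an estimate of $0.048.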