Provider Layer

The provider layer is the boundary between Umbra and any LLM API. It is built in three tiers: a registry of known provider types, a profile store holding your configured connections, and a gateway that routes requests and handles failures.

Provider types

Built-in types (always available):

| Type | Label | Default URL | Key required |
|---|---|---|---|
| `openai` | OpenAI | `https://api.openai.com/v1` | Yes |
| `anthropic` | Anthropic | `https://api.anthropic.com/v1` | Yes |
| `openrouter` | OpenRouter | `https://openrouter.ai/api/v1` | Yes |
| `mistral` | Mistral | `https://api.mistral.ai/v1` | Yes |
| `ollama` | Ollama | `http://127.0.0.1:11434/v1` | No |
| `lmstudio` | LM Studio | `http://127.0.0.1:1234/v1` | No |
| `openai-codex` | ChatGPT Plus/Pro | `https://chatgpt.com/backend-api` | Optional (OAuth) |
| `openai_compatible` | Custom endpoint | (set per profile) | Optional |
| `opencode-zen` | OpenCode Zen | `https://opencode.ai/zen/v1` | Optional |
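The table above is the first tier, a static registry keyed by type. A minimal sketch of how such a registry could look and how a profile's URL might be resolved against it follows. The names `ProviderTypeInfo`, `defaultBaseUrl`, `keyRequirement`, and `resolveBaseUrl` are hypothetical. The only rule taken from this page is that a profile's `baseUrl` overrides the type's default.

```typescript
// Hypothetical model of the built-in type registry. Field names such as
// defaultBaseUrl and keyRequirement are illustrative, not Umbra's own.
type KeyRequirement = "required" | "none" | "optional" | "optional-oauth";

interface ProviderTypeInfo {
  label: string;
  defaultBaseUrl?: string; // absent: every profile of this type must set baseUrl
  keyRequirement: KeyRequirement;
}

const BUILT_IN_TYPES: Record<string, ProviderTypeInfo> = {
  openai: { label: "OpenAI", defaultBaseUrl: "https://api.openai.com/v1", keyRequirement: "required" },
  anthropic: { label: "Anthropic", defaultBaseUrl: "https://api.anthropic.com/v1", keyRequirement: "required" },
  openrouter: { label: "OpenRouter", defaultBaseUrl: "https://openrouter.ai/api/v1", keyRequirement: "required" },
  mistral: { label: "Mistral", defaultBaseUrl: "https://api.mistral.ai/v1", keyRequirement: "required" },
  ollama: { label: "Ollama", defaultBaseUrl: "http://127.0.0.1:11434/v1", keyRequirement: "none" },
  lmstudio: { label: "LM Studio", defaultBaseUrl: "http://127.0.0.1:1234/v1", keyRequirement: "none" },
  "openai-codex": { label: "ChatGPT Plus/Pro", defaultBaseUrl: "https://chatgpt.com/backend-api", keyRequirement: "optional-oauth" },
  openai_compatible: { label: "Custom endpoint", keyRequirement: "optional" },
  "opencode-zen": { label: "OpenCode Zen", defaultBaseUrl: "https://opencode.ai/zen/v1", keyRequirement: "optional" },
};

// A profile's own baseUrl takes precedence over the type's default URL.
function resolveBaseUrl(type: string, profileBaseUrl?: string): string {
  const info = BUILT_IN_TYPES[type];
  if (!info) throw new Error(`Unknown provider type: ${type}`);
  const url = profileBaseUrl ?? info.defaultBaseUrl;
  if (!url) throw new Error(`Provider type "${type}" needs a baseUrl in the profile`);
  return url;
}
```

For example, `resolveBaseUrl("ollama")` returns `"http://127.0.0.1:11434/v1"`. `resolveBaseUrl("openai_compatible")` throws because a custom endpoint has no default URL to fall back on.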

Default for new users: on first launch, if no provider profile is configured, Umbra offers to connect OpenCode Zen automatically. OpenCode Zen provides a set of free models that need no API key, which is enough to try the agent right away. You can switch to any other provider at any time with `umbra providers connect`.

Profile store

Profiles are persisted in `~/.umbra/providers.json`. Each profile stores:

- `type`: one of the provider type values above
- `label`: human-readable name
- `baseUrl`: the API base URL; overrides the provider type's default URL
- `apiKey`: stored locally, never transmitted to Umbra
- `model`: optional default model for this profile
- `extraHeaders`: arbitrary HTTP headers injected into every request
- `options`: provider-specific options map

One profile is designated the default through `defaultProfileId`. The active profile for each task is resolved at runtime.
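Below is a minimal sketch of the store's shape and of one plausible resolution rule. The rule uses an explicitly requested profile if there is one and otherwise falls back to `defaultProfileId`. The per-profile fields follow the list above. The `id` field, the `profiles` array, `resolveProfile`, and `buildHeaders` are hypothetical, and so is the bearer-token header scheme (Anthropic, for one, expects `x-api-key` instead).

```typescript
// Hypothetical in-memory shape of ~/.umbra/providers.json.
// `id` and the `profiles` container are assumptions; other fields follow the docs.
interface ProviderProfile {
  id: string;
  type: string;
  label: string;
  baseUrl?: string;
  apiKey?: string;
  model?: string;
  extraHeaders?: Record<string, string>;
  options?: Record<string, unknown>;
}

interface ProviderStore {
  defaultProfileId?: string;
  profiles: ProviderProfile[];
}

// Use the requested profile if given, otherwise the store's default.
function resolveProfile(store: ProviderStore, requestedId?: string): ProviderProfile {
  const id = requestedId ?? store.defaultProfileId;
  const profile = store.profiles.find((p) => p.id === id);
  if (!profile) throw new Error(`Unknown profile: ${id ?? "(none)"}`);
  return profile;
}

// Build request headers; extraHeaders are merged in on every request.
// The Bearer scheme is an assumption and varies by provider.
function buildHeaders(profile: ProviderProfile): Record<string, string> {
  const headers: Record<string, string> = { "content-type": "application/json" };
  if (profile.apiKey) headers["authorization"] = `Bearer ${profile.apiKey}`;
  return { ...headers, ...profile.extraHeaders };
}
```

A keyless profile such as a local Ollama connection simply gets no `authorization` header.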

Provider gateway

`DefaultProviderGateway` is the single routing point for all LLM calls. It supports two routing modes:

By profile (`profileId`): calls a single configured profile directly.

By chain (`chainId`): tries an ordered list of profile entries in turn and returns the first successful response. This gives automatic fallback across providers without any changes to task code.
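Chain routing, as described above, amounts to a sequential fallback loop. The sketch below assumes a hypothetical `callProfile` function that stands in for the per-profile completion call, and an optional per-entry `model` field. Here any error moves on to the next entry. This page does not say whether the real gateway retries each entry first or treats non-retryable errors differently.

```typescript
// Minimal sketch of chain routing: first success wins.
// ChainEntry.model and callProfile are hypothetical stand-ins.
interface ChainEntry {
  profileId: string;
  model?: string;
}

type CallProfile<T> = (entry: ChainEntry) => Promise<T>;

async function completeViaChain<T>(chain: ChainEntry[], callProfile: CallProfile<T>): Promise<T> {
  const failures: string[] = [];
  for (const entry of chain) {
    try {
      return await callProfile(entry); // stop at the first successful response
    } catch (err) {
      // Remember why this entry failed, then fall through to the next one.
      failures.push(`${entry.profileId}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All chain entries failed:\n${failures.join("\n")}`);
}
```

Collecting every failure before throwing means the final error explains why each provider in the chain was skipped.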

Request pipeline

```text
GatewayRequest
  ├─ #prepareRequest()         # optional compression (off / standard / aggressive)
  │    ├─ compressToolOutput() # for tool result messages
  │    └─ condenseProse()      # for user/assistant messages
  ├─ #withRetries()            # up to 2 attempts
  │    └─ catalog.completeProfile() / completeProfileStream()
  └─ #logResponse()            # usage accounting + debug events
```
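The `#prepareRequest()` step above dispatches each message by role. Tool results go to `compressToolOutput()` and user/assistant messages go to `condenseProse()`. A minimal sketch of that dispatch follows. Both compressors are placeholders that do simple truncation and whitespace collapsing. They are NOT Umbra's algorithms, and the character limits are invented. Leaving system messages untouched is also an assumption, since the pipeline does not mention them.

```typescript
// Sketch of role-based dispatch in the compression step.
type Role = "system" | "user" | "assistant" | "tool";
type CompressionLevel = "off" | "standard" | "aggressive";
interface ChatMessage {
  role: Role;
  content: string;
}

// Placeholder: truncate long tool output (limits are hypothetical).
function compressToolOutput(text: string, level: CompressionLevel): string {
  const limit = level === "aggressive" ? 2000 : 8000;
  return text.length <= limit ? text : text.slice(0, limit) + "\n[...truncated]";
}

// Placeholder: collapse runs of spaces/tabs and excess blank lines.
function condenseProse(text: string, level: CompressionLevel): string {
  const collapsed = text.replace(/[ \t]+/g, " ").replace(/\n{3,}/g, "\n\n");
  return level === "aggressive" ? collapsed.trim() : collapsed;
}

function prepareMessages(messages: ChatMessage[], level: CompressionLevel): ChatMessage[] {
  if (level === "off") return messages; // no compression requested
  return messages.map((m) => {
    if (m.role === "tool") return { ...m, content: compressToolOutput(m.content, level) };
    if (m.role === "user" || m.role === "assistant") return { ...m, content: condenseProse(m.content, level) };
    return m; // system messages pass through unchanged in this sketch
  });
}
```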

Retry logic

The gateway retries automatically on transient failures:

- HTTP 429 (rate limited)
- HTTP 5xx (server error)
- `AbortError` (network timeout)
- `fetch failed` (connection refused or DNS failure)

Non-retryable errors (any 4xx other than 429, schema validation failures, an unknown profile) are thrown immediately. The backoff is linear: before each retry, the gateway waits 1 second × the attempt number.
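The retry rules above split into two parts: an error classifier and a loop with linear backoff. The sketch below shows both. The `HttpError` class and its numeric `status` field are assumptions about how HTTP failures are represented. `maxAttempts` defaults to 2 to match the "up to 2 attempts" note in the pipeline, and `sleep` is injectable so the loop can be tested without real delays. The `"fetch failed"` message is what Node's built-in `fetch` throws on connection or DNS errors.

```typescript
// Hypothetical error type for non-2xx HTTP responses.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
  }
}

// Read a property from an unknown thrown value without relying on its class.
function prop(err: unknown, key: string): unknown {
  return typeof err === "object" && err !== null ? (err as Record<string, unknown>)[key] : undefined;
}

function isRetryable(err: unknown): boolean {
  const status = prop(err, "status");
  if (typeof status === "number") return status === 429 || (status >= 500 && status <= 599);
  if (prop(err, "name") === "AbortError") return true; // request timed out / aborted
  if (prop(err, "message") === "fetch failed") return true; // Node fetch: refused or DNS
  return false; // other 4xx, schema validation, unknown profile: fail fast
}

const realSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 2,
  sleep: (ms: number) => Promise<void> = realSleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      await sleep(1000 * attempt); // linear backoff: 1 s, then 2 s, ...
    }
  }
}
```

With the default of 2 attempts, a transient failure costs a single 1-second wait before the second and final attempt.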

Request schema

Requests are typed and validated with Zod. Key fields:

| Field | Type | Notes |
|---|---|---|
| `model` | `string` | Optional; profile default is used if omitted |
| `messages` | `ProviderChatMessage[]` | Roles: `system`, `user`, `assistant`, `tool` |
| `tools` | `ProviderToolDefinition[]` | Function-calling tools |
| `toolChoice` | `auto \| required \| none` | Tool-calling mode |
| `responseFormat` | `text \| json_object \| json_schema` | Structured output |
| `thinkBudget` | `number \| low \| medium \| high \| max` | Extended reasoning token budget |
| `compressionLevel` | `off \| standard \| aggressive` | Pre-request message compression |
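The table maps naturally onto TypeScript union types. Umbra validates requests with Zod. To stay dependency-free, the sketch below hand-checks the enum-valued fields from the table instead, reporting every problem at once, and leaves `tools` unchecked. The message `content` field, the `ProviderToolDefinition` members, and the `parseRequest` name are assumptions, not taken from the source.

```typescript
const ROLES = ["system", "user", "assistant", "tool"] as const;
const TOOL_CHOICES = ["auto", "required", "none"] as const;
const RESPONSE_FORMATS = ["text", "json_object", "json_schema"] as const;
const THINK_LEVELS = ["low", "medium", "high", "max"] as const;
const COMPRESSION_LEVELS = ["off", "standard", "aggressive"] as const;

// Assumed shapes: `content` and the tool members are illustrative.
interface ProviderChatMessage { role: (typeof ROLES)[number]; content: string }
interface ProviderToolDefinition { name: string; parameters: Record<string, unknown> }

interface ProviderRequest {
  model?: string; // falls back to the profile default when omitted
  messages: ProviderChatMessage[];
  tools?: ProviderToolDefinition[];
  toolChoice?: (typeof TOOL_CHOICES)[number];
  responseFormat?: (typeof RESPONSE_FORMATS)[number];
  thinkBudget?: number | (typeof THINK_LEVELS)[number];
  compressionLevel?: (typeof COMPRESSION_LEVELS)[number];
}

function isOneOf(allowed: readonly string[], value: unknown): boolean {
  return typeof value === "string" && allowed.includes(value);
}

// Validate untrusted input and collect every problem before failing.
function parseRequest(input: unknown): ProviderRequest {
  const r = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;
  const errors: string[] = [];
  if (r.model !== undefined && typeof r.model !== "string") errors.push("model must be a string");
  if (!Array.isArray(r.messages) || !r.messages.every((m) => isOneOf(ROLES, m?.role)))
    errors.push("messages must be an array of messages with a valid role");
  if (r.toolChoice !== undefined && !isOneOf(TOOL_CHOICES, r.toolChoice)) errors.push("invalid toolChoice");
  if (r.responseFormat !== undefined && !isOneOf(RESPONSE_FORMATS, r.responseFormat))
    errors.push("invalid responseFormat");
  if (r.thinkBudget !== undefined && typeof r.thinkBudget !== "number" && !isOneOf(THINK_LEVELS, r.thinkBudget))
    errors.push("invalid thinkBudget");
  if (r.compressionLevel !== undefined && !isOneOf(COMPRESSION_LEVELS, r.compressionLevel))
    errors.push("invalid compressionLevel");
  if (errors.length > 0) throw new Error(`Invalid request: ${errors.join("; ")}`);
  return r as unknown as ProviderRequest;
}
```

A Zod schema would express the same constraints declaratively and also infer the TypeScript type. This version only shows which values the table allows.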

Model capabilities registry

`ModelsRegistry` resolves a model's capabilities (context window, tool support, pricing, vision, reasoning, structured output) from `models.dev/api.json`, keeping the fetched data in an in-memory cache for 5 minutes. If the model is not listed there, it falls back to the Hugging Face model API, and then to heuristic rules based on model-name patterns.
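The lookup order above can be sketched as a three-step fallback with a time-limited catalog cache. In this hypothetical version, the fetchers are injected so the example runs offline. In practice they would call `models.dev/api.json` and the Hugging Face model API, and the catalog is assumed to be flattened into a model-id map here. The class name, the capability fields, and the heuristic patterns are all illustrative.

```typescript
interface Caps {
  contextWindow?: number;
  toolCalls?: boolean;
  vision?: boolean;
  reasoning?: boolean;
  inputPerMTok?: number; // USD per million input tokens
  outputPerMTok?: number; // USD per million output tokens
}
interface ModelCapabilities extends Caps {
  source: "models.dev" | "huggingface" | "heuristic";
}

type CatalogFetcher = () => Promise<Record<string, Caps>>;
type ModelFetcher = (modelId: string) => Promise<Caps | undefined>;

const CATALOG_TTL_MS = 5 * 60 * 1000; // 5-minute in-memory cache

class ModelsRegistrySketch {
  private catalog: { data: Record<string, Caps>; fetchedAt: number } | undefined = undefined;
  private readonly fetchCatalog: CatalogFetcher;
  private readonly fetchHuggingFace: ModelFetcher;
  private readonly now: () => number;

  constructor(fetchCatalog: CatalogFetcher, fetchHuggingFace: ModelFetcher, now: () => number = Date.now) {
    this.fetchCatalog = fetchCatalog;
    this.fetchHuggingFace = fetchHuggingFace;
    this.now = now;
  }

  // Serve the catalog from memory until it is 5 minutes old, then refetch.
  private async getCatalog(): Promise<Record<string, Caps>> {
    if (this.catalog && this.now() - this.catalog.fetchedAt < CATALOG_TTL_MS) return this.catalog.data;
    const data = await this.fetchCatalog();
    this.catalog = { data, fetchedAt: this.now() };
    return data;
  }

  async resolve(modelId: string): Promise<ModelCapabilities> {
    // 1. models.dev catalog (a failed fetch counts as "not found")
    const fromCatalog = await this.getCatalog().then((c) => c[modelId], () => undefined);
    if (fromCatalog) return { ...fromCatalog, source: "models.dev" };
    // 2. Hugging Face model API
    const fromHf = await this.fetchHuggingFace(modelId).catch(() => undefined);
    if (fromHf) return { ...fromHf, source: "huggingface" };
    // 3. Name-pattern heuristics as the last resort
    return heuristics(modelId);
  }
}

// Illustrative patterns only; real rules would be tuned to known model families.
function heuristics(modelId: string): ModelCapabilities {
  const id = modelId.toLowerCase();
  return {
    vision: /vision|-vl\b/.test(id),
    reasoning: /reason|think|-r1\b/.test(id),
    source: "heuristic",
  };
}
```

Caching the whole catalog, rather than individual lookups, means one fetch every five minutes serves every model that models.dev knows about.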

Cost estimates (USD) are computed from actual token usage and the registry's per-million-token pricing, then written to the usage log alongside each response.
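With per-million-token pricing, the estimate is (input tokens × input price + output tokens × output price) / 1,000,000. A minimal sketch follows. It assumes the pricing has only separate input and output rates, and the field names and example prices are hypothetical.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}
interface ModelPricing {
  inputPerMTok: number; // USD per 1,000,000 input tokens
  outputPerMTok: number; // USD per 1,000,000 output tokens
}

// Scale each token count by its per-million price and sum the two parts.
function estimateCostUsd(usage: TokenUsage, pricing: ModelPricing): number {
  return (usage.inputTokens * pricing.inputPerMTok + usage.outputTokens * pricing.outputPerMTok) / 1_000_000;
}
```

For example, 12,000 input tokens at $3/M plus 800 output tokens at $15/M gives (36,000 + 12,000) / 1,000,000 = $0.048.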