Gateway, Routing & Token Economy

src/providers/provider-client.ts implements a dedicated client class per provider wire format, all normalising to the same ProviderCompleteResponse. No external SDK is used — every API call is a raw fetch.

| Client class | Wire format | Key differences |
| --- | --- | --- |
| OpenAICompatibleProviderClient | POST /chat/completions (OpenAI Chat) | Base for Mistral, Ollama, OpenRouter, LM Studio, custom |
| OpenAIProviderClient | POST /responses (OpenAI Responses API) | Different message serialisation; no streaming (falls back to chat) |
| AnthropicProviderClient | POST /messages (anthropic-version: 2023-06-01) | x-api-key header; tool_use blocks; cache_creation_input_tokens |
| OllamaProviderClient | /api/tags for listing; delegates completions | Normalises Ollama model list format |
| OpencodeZenProviderClient | OpenAI-compatible + session/project headers | Free tier; tool-name sanitiser (._); lazy-loaded optional module |

SSE streaming is implemented in OpenAICompatibleProviderClient.completeStream(). The parser handles two SSE wire formats simultaneously:

  • OpenAI-compatible: choices[].delta chunks + final usage with include_usage: true
  • Anthropic-format: content_block_start / content_block_delta / message_delta event types (emitted by some proxy-routed models)

The parser detects the format per-event via isAnthropicStreamFormat() and merges usage blocks from both sources, preferring Anthropic values when present.
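A minimal sketch of per-event detection and usage merging, under stated assumptions: the event and usage shapes are hypothetical, and the detection rule (checking the `type` discriminator) is a guess at what isAnthropicStreamFormat() does, not its actual body. The usage field names are the ones each wire format uses.

```typescript
// Hypothetical shapes; the real types in provider-client.ts may differ.
type SseEvent = { type?: string; choices?: unknown[]; usage?: Record<string, number> };
type Usage = { inputTokens?: number | undefined; outputTokens?: number | undefined };

// Assumed rule: Anthropic events carry one of these `type` discriminators.
const ANTHROPIC_EVENT_TYPES = new Set([
  "content_block_start",
  "content_block_delta",
  "message_delta",
]);

function isAnthropicStreamFormat(event: SseEvent): boolean {
  return typeof event.type === "string" && ANTHROPIC_EVENT_TYPES.has(event.type);
}

// Read the usage block of one event in whichever format it arrived.
function usageFromEvent(event: SseEvent): Usage {
  const u = event.usage;
  if (!u) return {};
  return isAnthropicStreamFormat(event)
    ? { inputTokens: u["input_tokens"], outputTokens: u["output_tokens"] }
    : { inputTokens: u["prompt_tokens"], outputTokens: u["completion_tokens"] };
}

// Merge both sources, preferring Anthropic values when present.
function mergeUsage(openai: Usage, anthropic: Usage): Usage {
  return {
    inputTokens: anthropic.inputTokens ?? openai.inputTokens,
    outputTokens: anthropic.outputTokens ?? openai.outputTokens,
  };
}
```

Because detection runs per event, a single stream can mix both formats without the caller choosing a parser up front.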

Thinking / reasoning normalisation per provider:

| Provider | Mechanism | Param sent |
| --- | --- | --- |
| Anthropic | thinking: { type: "enabled", budget_tokens: N } | Named tiers → fixed budgets (low=4k, medium=10k, high=16k, max=32k) |
| OpenAI o-series | reasoning_effort: low, medium, or high | Detected by ^o\d model pattern + blocklist |
| Mistral Magistral | Built-in (no param) | Nothing sent; response parsed from thinking array chunks |
| Mistral small/medium | Optional reasoning_effort | Sent only when thinkBudget is explicitly set |
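The Anthropic and OpenAI rows could be expressed roughly as follows. This is a sketch, not the project's code: the function name is hypothetical, "4k" is read as 4000 tokens, the blocklist contents are not given in the text, and mapping the max tier to "high" for OpenAI is an assumption.

```typescript
type ThinkTier = "low" | "medium" | "high" | "max";

// Budgets from the table; "k" is assumed to mean 1000 tokens.
const ANTHROPIC_BUDGETS: Record<ThinkTier, number> = {
  low: 4000,
  medium: 10000,
  high: 16000,
  max: 32000,
};

const O_SERIES = /^o\d/;
// The text mentions a blocklist but not its contents; left empty here.
const O_SERIES_BLOCKLIST = new Set<string>();

function thinkingParams(
  provider: "anthropic" | "openai",
  model: string,
  tier: ThinkTier,
): Record<string, unknown> {
  if (provider === "anthropic") {
    return { thinking: { type: "enabled", budget_tokens: ANTHROPIC_BUDGETS[tier] } };
  }
  if (O_SERIES.test(model) && !O_SERIES_BLOCKLIST.has(model)) {
    // Assumption: reasoning_effort has no "max" value, so it is clamped to "high".
    return { reasoning_effort: tier === "max" ? "high" : tier };
  }
  return {}; // non-reasoning model: send nothing
}
```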

DefaultProviderGateway supports two routing modes, selectable on every call:

Profile routing (profileId): a direct call to one configured profile.

Chain routing (chainId): walks an ordered list of { profileId, model? } entries and returns the first successful response. When an entry fails, the gateway silently moves on to the next one. Each entry can independently override the model, so a single chain can span multiple providers and model tiers.

Chains are stored in ~/.umbra/providers.json alongside profiles and are managed via the same CRUD API as profiles.

Retry policy (applies to both modes): up to 2 attempts on 429, HTTP 5xx, AbortError, or fetch failed. Backoff: 1 s × attempt index.
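Chain fallback combined with the retry policy might look like the sketch below. All names are hypothetical. It reads "up to 2 attempts" as two calls in total per entry and "attempt index" as 1-based, both of which are assumptions about the wording above.

```typescript
type ChainEntry = { profileId: string; model?: string };
type CallFn = (entry: ChainEntry) => Promise<string>;

// Retryable conditions from the policy: 429, HTTP 5xx, AbortError, "fetch failed".
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: number; name?: string; message?: string };
  return (
    e.status === 429 ||
    (typeof e.status === "number" && e.status >= 500 && e.status <= 599) ||
    e.name === "AbortError" ||
    (e.message ?? "").includes("fetch failed")
  );
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 2, baseMs = 1000): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (!isRetryable(err) || attempt === maxAttempts) break;
      await sleep(baseMs * attempt); // linear backoff: 1 s x attempt index
    }
  }
  throw lastError;
}

// Try each chain entry in order; the first success wins.
async function completeViaChain(chain: ChainEntry[], call: CallFn, baseMs = 1000): Promise<string> {
  let lastError: unknown = new Error("empty chain");
  for (const entry of chain) {
    try {
      return await withRetry(() => call(entry), 2, baseMs);
    } catch (err) {
      lastError = err; // fall through to the next entry
    }
  }
  throw lastError;
}
```

Profile routing is then the degenerate case of a single withRetry call with no fallback.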

src/utils/compression.ts provides a five-level compression system applied to message history before each provider call.

Levels:

| Level | Machine output (max lines) | Search results | Prose |
| --- | --- | --- | --- |
| off | unlimited | unlimited | unchanged |
| lite | 500 | 30 files / 10 snippets | unchanged |
| standard | 200 | 20 files / 5 snippets | unchanged |
| aggressive | 50 | 10 files / 3 snippets, no context | filler words stripped |
| ultra | 20 | 5 files / 1 snippet, no context | filler words stripped |

condenseMachineOutput — head + tail preservation with automatic critical-line extraction from truncated sections. Lines matching Error:, SyntaxError:, TypeError:, exit code [1-9], stack frames are always rescued even when the surrounding section is dropped.
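Head + tail preservation with critical-line rescue can be sketched as below. The even head/tail split, the omission marker text, and the stack-frame pattern are assumptions; only the rescued patterns are taken from the description above.

```typescript
// Patterns named in the text, plus an assumed "    at fn (file:1:2)" stack-frame shape.
const CRITICAL = /(?:Error:|SyntaxError:|TypeError:|exit code [1-9]|^\s*at\s+\S)/;

function condenseMachineOutput(text: string, maxLines: number): string {
  const lines = text.split("\n");
  if (lines.length <= maxLines) return text;

  // Assumed split: half of the budget for the head, half for the tail.
  const head = Math.ceil(maxLines / 2);
  const tail = Math.floor(maxLines / 2);

  const dropped = lines.slice(head, lines.length - tail);
  const rescued = dropped.filter((line) => CRITICAL.test(line));

  return [
    ...lines.slice(0, head),
    `[... ${dropped.length} lines omitted, ${rescued.length} critical kept ...]`,
    ...rescued,
    ...lines.slice(lines.length - tail),
  ].join("\n");
}
```

In this sketch the marker and rescued lines are added on top of the line budget, so the result can slightly exceed maxLines when the dropped section contains errors.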

compressToolOutput — unified entry point: if the tool result is JSON with fileBuckets, applies compressSearchResults (sorts by match count, caps files and snippets, optionally strips context lines). All other tool outputs go through condenseMachineOutput.
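A rough sketch of the search-result path follows. Only the fileBuckets key is named in the text, so the bucket and snippet shapes, the caps object, and the field names are all hypothetical.

```typescript
// Hypothetical shapes for the entries of a `fileBuckets` array.
type Snippet = { line: number; text: string; context?: string[] };
type FileBucket = { path: string; matchCount: number; snippets: Snippet[] };
type Caps = { maxFiles: number; maxSnippets: number; stripContext: boolean };

function compressSearchResults(buckets: FileBucket[], caps: Caps): FileBucket[] {
  return [...buckets]
    .sort((a, b) => b.matchCount - a.matchCount) // most matches first
    .slice(0, caps.maxFiles) // cap the number of files
    .map((bucket) => ({
      ...bucket,
      snippets: bucket.snippets
        .slice(0, caps.maxSnippets) // cap snippets per file
        .map((s) => (caps.stripContext ? { line: s.line, text: s.text } : s)),
    }));
}
```

For the aggressive level in the table above, the caps would be { maxFiles: 10, maxSnippets: 3, stripContext: true }.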

condenseProse — strips common English filler words (actually, basically, literally, very, just, etc.) in aggressive/ultra modes. Applied to user/assistant messages in history.
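Filler stripping might be as simple as the sketch below. The word list contains only the examples named above (the full list is not given), and the whitespace cleanup is an assumption.

```typescript
// Only the fillers named in the text; the real list is longer ("etc.").
const FILLERS = ["actually", "basically", "literally", "very", "just"];
const FILLER_RE = new RegExp(`\\b(?:${FILLERS.join("|")})\\b\\s*`, "gi");

function condenseProse(text: string, level: string): string {
  // Prose is left unchanged below the aggressive level.
  if (level !== "aggressive" && level !== "ultra") return text;
  return text
    .replace(FILLER_RE, "")
    .replace(/[ \t]{2,}/g, " ")
    .trim();
}
```

For example, "This is actually just a very simple test." becomes "This is a simple test." at the ultra level and is returned unchanged at standard.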

Level selection per run mode:

| Mode | Compression level |
| --- | --- |
| full | always off (128 k context budget) |
| exec | always aggressive (maximise harness iterations) |
| agent / plan | from settings.json → compression.level (default: standard) |
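The table above reduces to a small selection function. The function name and the settings shape are hypothetical; only the compression.level key comes from the text.

```typescript
type RunMode = "full" | "exec" | "agent" | "plan";
type CompressionLevel = "off" | "lite" | "standard" | "aggressive" | "ultra";
type Settings = { compression?: { level?: CompressionLevel } };

function levelForMode(mode: RunMode, settings?: Settings): CompressionLevel {
  if (mode === "full") return "off"; // full mode never compresses
  if (mode === "exec") return "aggressive"; // exec mode always compresses hard
  return settings?.compression?.level ?? "standard"; // agent / plan: configurable
}
```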

Every provider call appends one line to ~/.umbra/usage.jsonl via UsageLogger. The file is an append-only JSONL — one JSON object per line — and survives daemon restarts.

UsageRecord fields:

| Field | Description |
| --- | --- |
| requestId | Random per-request ID matching debug events |
| profileId / chainId | Which profile or chain was used |
| sessionId / threadId | Conversation identifiers |
| model / route | Model ID and profileId/model composite |
| inputTokens / outputTokens / totalTokens | From provider usage block |
| reasoningTokens | Thinking tokens (Anthropic / OpenAI o-series) |
| cacheReadTokens / cacheWriteTokens | Anthropic prompt cache hit/write |
| costEstimate | USD, computed from ModelsRegistry pricing |
| contextLimit / contextPercent | Context window size and fill % |
| source | actual (from provider) / estimated (local) / mixed |
| status | success or failed |

Aggregation methods on UsageLogger:

  • getStats() — totals across all records
  • getStatsByModel() — per-model breakdown with avgCostPerRequest
  • getStatsByProvider() — per-profile breakdown
  • getStatsBySession() — per-session breakdown
  • generateReport() — formatted text report sorted by total cost
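To illustrate how a per-model breakdown can be derived from the JSONL file, here is a sketch over a reduced record shape. The helper names and the ModelStats shape are hypothetical; only the field names model, totalTokens, costEstimate, status, and avgCostPerRequest come from the text.

```typescript
// Reduced record: a subset of the UsageRecord fields listed above.
type UsageRecord = {
  model: string;
  totalTokens: number;
  costEstimate: number;
  status: "success" | "failed";
};
type ModelStats = {
  requests: number;
  totalTokens: number;
  totalCost: number;
  avgCostPerRequest: number;
};

// One JSON object per line; blank lines (such as a trailing newline) are skipped.
function parseJsonl(content: string): UsageRecord[] {
  return content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as UsageRecord);
}

function statsByModel(records: UsageRecord[]): Map<string, ModelStats> {
  const out = new Map<string, ModelStats>();
  for (const r of records) {
    const s = out.get(r.model) ?? { requests: 0, totalTokens: 0, totalCost: 0, avgCostPerRequest: 0 };
    s.requests += 1;
    s.totalTokens += r.totalTokens;
    s.totalCost += r.costEstimate;
    s.avgCostPerRequest = s.totalCost / s.requests;
    out.set(r.model, s);
  }
  return out;
}
```

Because the log is append-only, aggregation is a single pass over the file and needs no index or state between daemon restarts.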

Global CLI Launch & Working Directory Contract

bin/umbra.js is the global entry-point shim. It selects the execution strategy at runtime without recompilation:

1. src/cli/main.ts exists?
→ node --import tsx/esm src/cli/main.ts (dev mode, no build needed)
2. dist/cli/main.js exists?
→ node dist/cli/main.js (installed package)
3. Neither → exit 1 with instructions
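The selection logic above could be sketched as a pure function with the filesystem check injected. The function name and return shape are hypothetical, and the paths are assumed to be relative to the package root.

```typescript
type Launch = { command: string; args: string[] } | null;

// `exists` is injected so the logic can be exercised without touching the disk;
// a real shim would pass something like fs.existsSync.
function selectLaunch(exists: (path: string) => boolean): Launch {
  if (exists("src/cli/main.ts")) {
    // Dev mode: run TypeScript sources directly through the tsx loader.
    return { command: "node", args: ["--import", "tsx/esm", "src/cli/main.ts"] };
  }
  if (exists("dist/cli/main.js")) {
    // Installed package: run the compiled output.
    return { command: "node", args: ["dist/cli/main.js"] };
  }
  return null; // caller prints instructions and exits with code 1
}
```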

CLI flags handled by the shim (stripped before passing to cac):

| Flag | Effect |
| --- | --- |
| --debug | Sets UMBRA_DEBUG_SIDECAR=1; opens a sidecar debug console on Windows |
| --project <path> | Sets UMBRA_PROJECT_PATH env var; overrides CWD for all project-path resolution |

Signals SIGINT, SIGTERM, SIGHUP are forwarded to the child process so Ctrl-C works correctly even when the shim is the process group leader.

On Windows, .cmd scripts are spawned via cmd.exe /d /s /c to avoid Node.js DEP0190 and handle shim quoting correctly (src/cli/process-runner.ts).

src/core/permissions.ts implements a three-tier access control system.

Evaluation order (the first matching step decides):

1. exec-full mode → allow all (sandbox assumption)
2. chat-readonly mode + destructive tool → deny
3. Stored rules (most recently added wins) → allow / deny / allow_always
4. Non-destructive tool → allow without prompt
5. Interactive prompt → y / n / a (allow_always)
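The five steps above read as a first-match cascade, sketched below. The function signature is hypothetical, "agent" stands in for any mode other than the two named ones, and the stored rule is assumed to have been looked up already.

```typescript
type Mode = "exec-full" | "chat-readonly" | "agent";
type StoredAction = "allow" | "deny" | "allow_always";
type Decision = "allow" | "deny" | "prompt";

function evaluatePermission(
  mode: Mode,
  isDestructive: boolean,
  storedRule: StoredAction | undefined,
): Decision {
  if (mode === "exec-full") return "allow"; // 1. sandbox assumption
  if (mode === "chat-readonly" && isDestructive) return "deny"; // 2. read-only chat
  if (storedRule !== undefined) {
    return storedRule === "deny" ? "deny" : "allow"; // 3. stored rules
  }
  if (!isDestructive) return "allow"; // 4. safe tools skip the prompt
  return "prompt"; // 5. ask the user (y / n / a)
}
```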

Destructive tools (always require evaluation outside exec-full): shell.exec, fs.write, fs.edit, fs.cd, git.status, git.diff, git.apply, git.commit, git.push, git.pull, web.search

Rule storage:

  • Global rules: ~/.umbra/permissions.json — applied to all projects
  • Project rules: <project>/umbra.permissions.json — applied when projectPath matches

Rule matching supports exact tool name (shell.exec), namespace wildcard (shell.*), or global wildcard (*). Rules have an optional expiresAt ISO timestamp.
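Rule matching with wildcards and expiry might look like this sketch. The Rule field names other than expiresAt are hypothetical, and "most recently added wins" is modelled by assuming rules are stored in insertion order and scanned from the end.

```typescript
type Rule = { tool: string; action: "allow" | "deny" | "allow_always"; expiresAt?: string };

function matchesTool(pattern: string, tool: string): boolean {
  if (pattern === "*") return true; // global wildcard
  if (pattern.endsWith(".*")) {
    // Namespace wildcard: "shell.*" matches anything starting with "shell."
    return tool.startsWith(pattern.slice(0, -1));
  }
  return pattern === tool; // exact tool name
}

function findRule(rules: Rule[], tool: string, now: Date = new Date()): Rule | undefined {
  // Scan newest-first so the most recently added matching rule wins.
  for (let i = rules.length - 1; i >= 0; i--) {
    const rule = rules[i];
    if (rule === undefined) continue;
    const expired =
      rule.expiresAt !== undefined && Date.parse(rule.expiresAt) <= now.getTime();
    if (!expired && matchesTool(rule.tool, tool)) return rule;
  }
  return undefined;
}
```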

WorkspaceTrustManager tracks trusted directories in ~/.umbra/trusted-paths.json. fs.cd to a path not under any trusted ancestor always triggers an interactive prompt regardless of mode. Choosing “always” registers the path as trusted.

Interactive prompt (TTY only) shows the tool name and a human-readable action summary, then reads y / n / a. In non-TTY environments (piped stdin, CI, tests), the answer is automatically deny.

All decisions are appended to ~/.umbra/debug/permissions.jsonl for audit.

ModelsRegistry resolves capabilities for any model ID at runtime:

  1. models.dev/api.json — authoritative dataset; flattened to provider/model tuples; cached for 5 minutes in-process
  2. HuggingFace model API — fallback for models not in models.dev
  3. Heuristics — name-pattern rules (-vision, -instruct, -r1, o\d, context window from common model families)
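The three-stage resolution can be sketched as a fallback chain. Everything here is illustrative: the lookups are injected as synchronous functions (the real ones would be network calls), only two capabilities are modelled, and the heuristic rules are a guess at how the listed name patterns map to flags. The text does not say what -instruct implies, so it is left out.

```typescript
type Caps = { supportsVision: boolean; supportsReasoning: boolean };
type Lookup = (modelId: string) => Caps | undefined;

// Stage 3: name-pattern heuristics (assumed mapping).
function heuristicCaps(modelId: string): Caps {
  const id = modelId.toLowerCase();
  return {
    supportsVision: id.includes("-vision"),
    supportsReasoning: id.includes("-r1") || /^o\d/.test(id),
  };
}

// Stages 1 and 2 are tried in order; heuristics are the last resort.
function resolveCaps(modelId: string, modelsDev: Lookup, huggingFace: Lookup): Caps {
  return modelsDev(modelId) ?? huggingFace(modelId) ?? heuristicCaps(modelId);
}
```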

Capabilities resolved: contextWindow, supportsTools, supportsVision, supportsReasoning, supportsStructuredOutput, supportsAttachments, supportsTemperature, longContext (>100 k), interleaved (reasoning_content vs reasoning_details), inputModalities, outputModalities, pricingPerMillion (input, output, cacheRead, cacheWrite).

Mistral model dedup in listModels() is multi-pass:

  1. Filter non-chat and archived models
  2. Exact-ID dedupe
  3. Family grouping via iterative suffix stripping (-latest, -2604, -3.5, -3-5, -v2, -3) — handles compound suffixes like mistral-medium-3-5-2604 in two passes
  4. Prefer -latest alias over specific version within same family
  5. Score and sort by tier (Magistral → Large → Medium → Small → Devstral → Codestral → Ministral → Pixtral → Voxtral → Open → penalty for vibe-cli / labs- / deprecated)
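Steps 3 and 4 can be sketched as follows. The suffix patterns are generalised from the examples listed (for instance, -2604 is treated as any four-digit suffix), and the function names are hypothetical. One suffix is stripped per pass, so mistral-medium-3-5-2604 loses -2604 first and -3-5 second.

```typescript
// Ordered patterns: the four-digit date must be tried before the "N-M" version.
const SUFFIXES = [/-latest$/, /-\d{4}$/, /-\d+[.-]\d+$/, /-v\d+$/, /-\d+$/];

function familyOf(id: string): string {
  let current = id;
  let changed = true;
  while (changed) {
    changed = false;
    for (const re of SUFFIXES) {
      const next = current.replace(re, "");
      if (next !== current && next.length > 0) {
        current = next;
        changed = true;
        break; // strip one suffix per pass, then start over
      }
    }
  }
  return current;
}

// Keep one ID per family, preferring the -latest alias over a pinned version.
function pickPerFamily(ids: string[]): string[] {
  const best = new Map<string, string>();
  for (const id of ids) {
    const family = familyOf(id);
    const current = best.get(family);
    if (current === undefined || (id.endsWith("-latest") && !current.endsWith("-latest"))) {
      best.set(family, id);
    }
  }
  return Array.from(best.values());
}
```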

The TUI is built with Ink (React for terminal). Key surfaces:

  • Thread list — paginated, searchable, archivable; forking creates a new thread from the current session state
  • Session view — streaming assistant output with reasoning_delta and assistant_delta events shown separately; tool call / result panels; permission approval dialog
  • Provider selector — profile list with status badges (connected / available / unavailable); inline connect/test/delete
  • Memory panel — per-project memory text; toggle useMemories / generateMemories

Simple-chat detection in resolveRunModeContract(): prompts of ≤8 words matching a hardcoded greeting set (hi, hello, thanks, привет, спасибо, etc.) get toolPreset: null and no compression. This avoids sending tool schemas and compression overhead for conversational turns.
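A sketch of the detection step follows. The text does not say how "matching" is defined, so this version assumes the conservative reading that every word of the prompt must be in the greeting set. The set below holds only the examples named above.

```typescript
// Only the greetings named in the text; the real set is longer ("etc.").
const GREETINGS = new Set(["hi", "hello", "thanks", "привет", "спасибо"]);

function isSimpleChat(prompt: string): boolean {
  const words = prompt
    .toLowerCase()
    .split(/\s+/)
    .map((w) => w.replace(/[.,!?;:]/g, "")) // drop common punctuation
    .filter((w) => w.length > 0);
  if (words.length === 0 || words.length > 8) return false;
  // Assumption: all words must be greetings, so "hi, delete my repo" is not simple chat.
  return words.every((w) => GREETINGS.has(w));
}
```

When this returns true, the run contract would carry toolPreset: null and skip compression, as described above.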

Mode selection at the CLI / API level:

  • umbra (no args) → opens TUI in agent mode
  • umbra exec "<task>" → headless exec mode (harness loop, 30-min timeout)
  • umbra task add "<task>" → queues a background agent run
  • HTTP API POST /run with mode: "plan" → structured JSON plan output

src/cli/tui/markdown.ts renders markdown to ANSI escape sequences for terminal display. The renderer is line-by-line — no AST parsing — trading full CommonMark compliance for zero dependencies and instant output.

Block-level transforms (applied first):

| Input | Output |
| --- | --- |
| # Heading | Bold + cyan text |
| ## Heading | Bold + yellow text |
| --- / === | Dim ──────────────────────────────────────── |
| - item | Cyan - bullet |
| 1. item | Yellow numbered item |
| > quote | Dim \| quote |

Inline transforms (applied after block transforms):

| Syntax | Output |
| --- | --- |
| `code` | Yellow text |
| **bold** | Bold text |
| *italic* | Dim text |
| [text](url) | Cyan text + dim (url) |

All transforms reset ANSI state (\x1b[0m) at the end of each match so adjacent styles don’t bleed across tokens.
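The inline transforms can be sketched with plain regex replacements. This is an illustration, not the code in markdown.ts: the function name and the exact regexes are assumptions, while the SGR codes (1 bold, 2 dim, 33 yellow, 36 cyan, 0 reset) are the standard ANSI values.

```typescript
const ESC = "\x1b[";
const RESET = `${ESC}0m`;
const BOLD = `${ESC}1m`;
const DIM = `${ESC}2m`;
const YELLOW = `${ESC}33m`;
const CYAN = `${ESC}36m`;

function renderInline(line: string): string {
  return (
    line
      .replace(/`([^`]+)`/g, `${YELLOW}$1${RESET}`)
      // Bold runs before italic so "**" is not consumed as two "*" markers.
      .replace(/\*\*([^*]+)\*\*/g, `${BOLD}$1${RESET}`)
      .replace(/\*([^*]+)\*/g, `${DIM}$1${RESET}`)
      // Link text excludes "[" so the "[" inside earlier escape codes cannot start a match.
      .replace(/\[([^\[\]]+)\]\(([^)]+)\)/g, `${CYAN}$1${RESET} ${DIM}($2)${RESET}`)
  );
}
```

Each replacement ends with the reset sequence, which is what keeps a yellow code span from colouring the text that follows it.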