Gateway, Routing & Token Economy

Local Gateway & API Normalization

src/providers/provider-client.ts implements a dedicated client class per provider wire format, all normalising to the same ProviderCompleteResponse. No external SDK is used — every API call is a raw fetch.

| Client class | Wire format | Key differences |
| --- | --- | --- |
| `OpenAICompatibleProviderClient` | `POST /chat/completions` (OpenAI Chat) | Base for Mistral, Ollama, OpenRouter, LM Studio, custom |
| `OpenAIProviderClient` | `POST /responses` (OpenAI Responses API) | Different message serialisation; no streaming (falls back to chat) |
| `AnthropicProviderClient` | `POST /messages` (`anthropic-version: 2023-06-01`) | `x-api-key` header; `tool_use` blocks; `cache_creation_input_tokens` |
| `OllamaProviderClient` | `/api/tags` for listing; delegates completions | Normalises Ollama model list format |
| `OpencodeZenProviderClient` | OpenAI-compatible + session/project headers | Free tier; tool-name sanitiser (`.` → `_`); lazy-loaded optional module |

SSE streaming is implemented in OpenAICompatibleProviderClient.completeStream(). The parser handles two SSE wire formats simultaneously:

OpenAI-compatible — choices[].delta chunks + final usage with include_usage: true
Anthropic-format — content_block_start / content_block_delta / message_delta event types (emitted by some proxy-routed models)

The parser detects the format per-event via isAnthropicStreamFormat() and merges usage blocks from both sources, preferring Anthropic values when present.
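
The per-event detection and usage merge described above can be sketched roughly as follows. This is a minimal illustration, not the code in provider-client.ts: the `SseEvent` and `UsageTotals` shapes, `mergeUsage`, and `extractTextDelta` are hypothetical names, and the real `isAnthropicStreamFormat()` may use different criteria.

```typescript
// Hypothetical shapes; the real types in provider-client.ts may differ.
type SseEvent = Record<string, unknown>;

interface UsageTotals {
  inputTokens?: number;
  outputTokens?: number;
}

// Event names used by Anthropic's streaming Messages API.
const ANTHROPIC_EVENT_TYPES = new Set([
  "message_start",
  "content_block_start",
  "content_block_delta",
  "content_block_stop",
  "message_delta",
  "message_stop",
]);

// Assumption: an event counts as Anthropic-format when its `type` is one of the names above.
function isAnthropicStreamFormat(event: SseEvent): boolean {
  const eventType = event["type"];
  return typeof eventType === "string" && ANTHROPIC_EVENT_TYPES.has(eventType);
}

// Merge usage from both sources, preferring Anthropic values when present.
function mergeUsage(openai: UsageTotals, anthropic: UsageTotals): UsageTotals {
  return {
    inputTokens: anthropic.inputTokens ?? openai.inputTokens,
    outputTokens: anthropic.outputTokens ?? openai.outputTokens,
  };
}

// Pull the text delta out of one parsed event, whichever format it uses.
function extractTextDelta(event: SseEvent): string {
  if (isAnthropicStreamFormat(event)) {
    const delta = event["delta"] as { type?: string; text?: string } | undefined;
    return delta?.type === "text_delta" ? delta?.text ?? "" : "";
  }
  const choices = event["choices"] as Array<{ delta?: { content?: string } }> | undefined;
  return choices?.[0]?.delta?.content ?? "";
}
```

Because the check runs per event, a single stream can mix both formats without the parser needing a mode switch.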

Thinking / reasoning normalisation per provider:

| Provider | Mechanism | Param sent |
| --- | --- | --- |
| Anthropic | `thinking: { type: "enabled", budget_tokens: N }` | Named tiers → fixed budgets (low=4k, medium=10k, high=16k, max=32k) |
| OpenAI o-series | `reasoning_effort: low\|medium\|high` | Detected by `^o\d` model pattern + blocklist |
| Mistral Magistral | Built-in (no param) | Nothing sent; response parsed from `thinking` array chunks |
| Mistral small/medium | Optional `reasoning_effort` | Sent only when `thinkBudget` is explicitly set |
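
As an illustration of the table above, the mapping from a named tier to provider-specific request parameters might look like this. The function names, the `ProviderKind` union, the exact token counts (4k is taken as 4000 here), the `magistral` prefix check, and folding `max` into `high` for `reasoning_effort` are all assumptions made for the sketch.

```typescript
type ThinkTier = "low" | "medium" | "high" | "max";
type ProviderKind = "anthropic" | "openai" | "mistral"; // hypothetical discriminator

// Fixed budgets per tier. Assumption: "4k" means 4000 tokens (it could equally be 4096).
const ANTHROPIC_BUDGETS: Record<ThinkTier, number> = {
  low: 4_000,
  medium: 10_000,
  high: 16_000,
  max: 32_000,
};

// o-series detection: `^o\d` pattern minus a blocklist (contents not given in the source).
function isOSeriesModel(model: string, blocklist: readonly string[] = []): boolean {
  return /^o\d/.test(model) && !blocklist.includes(model);
}

// Returns the extra request-body fields for the given provider, or {} when nothing is sent.
function buildThinkingParams(
  provider: ProviderKind,
  model: string,
  tier?: ThinkTier,
  blocklist: readonly string[] = [],
): Record<string, unknown> {
  if (tier === undefined) return {};
  // Assumption: reasoning_effort has no "max" value, so it is mapped to "high".
  const effort = tier === "max" ? "high" : tier;
  if (provider === "anthropic") {
    return { thinking: { type: "enabled", budget_tokens: ANTHROPIC_BUDGETS[tier] } };
  }
  if (provider === "openai") {
    return isOSeriesModel(model, blocklist) ? { reasoning_effort: effort } : {};
  }
  // Mistral: Magistral reasons natively, so nothing is sent (prefix check is an assumption).
  return model.startsWith("magistral") ? {} : { reasoning_effort: effort };
}
```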

Smart Routing & Provider Chains

Every DefaultProviderGateway call uses one of two routing modes:

Profile routing (profileId): a direct call to one configured profile.

Chain routing (chainId): iterates an ordered list of { profileId, model? } entries and returns the first successful response. When an entry fails, the gateway moves on to the next one without surfacing the error. Each entry can independently override the model, so a single chain can span multiple providers and model tiers.

Chains are stored in ~/.umbra/providers.json alongside profiles and are managed via the same CRUD API as profiles.
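
A minimal sketch of chain routing, assuming a hypothetical `CompleteFn` callback that stands in for the real per-profile provider call. `completeViaChain` is an illustrative name, and rethrowing the last error when every entry fails is an assumption about the gateway's behaviour.

```typescript
interface ChainEntry {
  profileId: string;
  model?: string; // optional per-entry model override
}

// Hypothetical stand-in for the real provider call; resolves with the response text.
type CompleteFn = (profileId: string, model?: string) => Promise<string>;

// Try each entry in order and return the first success.
// Failures are swallowed until the chain is exhausted, then the last error is rethrown.
async function completeViaChain(entries: readonly ChainEntry[], complete: CompleteFn): Promise<string> {
  let lastError: unknown = new Error("chain has no entries");
  for (const entry of entries) {
    try {
      return await complete(entry.profileId, entry.model);
    } catch (err) {
      lastError = err; // fall through to the next entry
    }
  }
  throw lastError;
}
```

A chain such as `[{ profileId: "local" }, { profileId: "cloud", model: "large" }]` (placeholder IDs) would therefore only reach the second entry when the first one throws.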

Retry policy (applies to both modes): up to 2 attempts on HTTP 429, HTTP 5xx, AbortError, or a `fetch failed` network error. Backoff: 1 s × attempt index.
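
The retry policy could be expressed along these lines. This sketch reads "up to 2 attempts" as two calls in total and treats the attempt index as 1-based; both are interpretations rather than confirmed details. `HttpError`, `withRetry`, and the injectable `sleep` parameter are hypothetical.

```typescript
const MAX_ATTEMPTS = 2; // assumption: total attempts, not extra retries

// Hypothetical error type carrying the HTTP status of a failed provider call.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Retryable: 429, any 5xx, AbortError, or a "fetch failed" network error.
function isRetryable(err: unknown): boolean {
  if (err instanceof HttpError) {
    return err.status === 429 || (err.status >= 500 && err.status <= 599);
  }
  if (err instanceof Error) {
    return err.name === "AbortError" || err.message.includes("fetch failed");
  }
  return false;
}

// Linear backoff: 1 s multiplied by the (1-based) attempt index.
function backoffMs(attempt: number): number {
  return 1000 * attempt;
}

async function withRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= MAX_ATTEMPTS || !isRetryable(err)) throw err;
      await sleep(backoffMs(attempt));
    }
  }
}
```

Passing `sleep` as a parameter keeps the helper testable without real delays.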

Prompt & Tool Output Compression

src/utils/compression.ts provides a five-level compression system applied to message history before each provider call.

Levels:

| Level | Machine output (max lines) | Search results | Prose |
| --- | --- | --- | --- |
| `off` | unlimited | unlimited | unchanged |
| `lite` | 500 | 30 files / 10 snippets | unchanged |
| `standard` | 200 | 20 files / 5 snippets | unchanged |
| `aggressive` | 50 | 10 files / 3 snippets, no context | filler words stripped |
| `ultra` | 20 | 5 files / 1 snippet, no context | filler words stripped |

condenseMachineOutput — head + tail preservation with automatic critical-line extraction from truncated sections. Lines matching Error:, SyntaxError:, TypeError:, exit code [1-9], stack frames are always rescued even when the surrounding section is dropped.
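
A rough sketch of head + tail preservation with critical-line rescue is shown below. The even head/tail split, the omission marker text, and the stack-frame pattern (`at ...` lines) are assumptions; the real function in compression.ts may differ in all three.

```typescript
// Lines that are always rescued from the dropped middle section.
// The stack-frame alternative (indented "at ...") is an assumed pattern.
const CRITICAL_LINE = /Error:|SyntaxError:|TypeError:|exit code [1-9]|^\s+at\s/;

function condenseMachineOutput(text: string, maxLines: number): string {
  const lines = text.split("\n");
  if (lines.length <= maxLines) return text;

  // Assumption: the line budget is split evenly between head and tail.
  const headCount = Math.ceil(maxLines / 2);
  const tailCount = Math.floor(maxLines / 2);

  const head = lines.slice(0, headCount);
  // slice(-0) would return the whole array, so guard the zero case.
  const tail = tailCount > 0 ? lines.slice(-tailCount) : [];
  const dropped = lines.slice(headCount, lines.length - tailCount);

  // Keep critical lines even though their surrounding section is dropped.
  const rescued = dropped.filter((line) => CRITICAL_LINE.test(line));
  const marker = `[... ${dropped.length - rescued.length} lines omitted ...]`;

  return [...head, marker, ...rescued, ...tail].join("\n");
}
```

Note that in this sketch the result can exceed `maxLines` by the marker plus however many critical lines were rescued.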

compressToolOutput — unified entry point: if the tool result is JSON with fileBuckets, applies compressSearchResults (sorts by match count, caps files and snippets, optionally strips context lines). All other tool outputs go through condenseMachineOutput.
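
The dispatch between the two paths might be sketched as follows. The `FileBucket` and `Snippet` shapes are guesses at what a `fileBuckets` payload contains, the snippet cap is assumed to apply per file, and the line-based fallback is passed in as a parameter to keep the example self-contained.

```typescript
// Hypothetical shapes for a search tool result.
interface Snippet {
  line: number;
  text: string;
  context?: string[];
}
interface FileBucket {
  path: string;
  matchCount: number;
  snippets: Snippet[];
}
interface SearchLimits {
  maxFiles: number;
  maxSnippets: number; // assumption: cap per file
  keepContext: boolean;
}

// Sort by match count (descending), cap files and snippets, optionally drop context lines.
function compressSearchResults(buckets: readonly FileBucket[], limits: SearchLimits): FileBucket[] {
  return [...buckets]
    .sort((a, b) => b.matchCount - a.matchCount)
    .slice(0, limits.maxFiles)
    .map((bucket) => ({
      ...bucket,
      snippets: bucket.snippets
        .slice(0, limits.maxSnippets)
        .map((s) => (limits.keepContext ? s : { line: s.line, text: s.text })),
    }));
}

function hasFileBuckets(value: unknown): value is { fileBuckets: FileBucket[] } {
  return (
    typeof value === "object" &&
    value !== null &&
    Array.isArray((value as { fileBuckets?: unknown }).fileBuckets)
  );
}

// Unified entry point: JSON with fileBuckets goes to the search path, everything else to `fallback`.
function compressToolOutput(raw: string, limits: SearchLimits, fallback: (text: string) => string): string {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (hasFileBuckets(parsed)) {
      return JSON.stringify({ ...parsed, fileBuckets: compressSearchResults(parsed.fileBuckets, limits) });
    }
  } catch {
    // Not JSON: treated as plain machine output below.
  }
  return fallback(raw);
}
```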

condenseProse — strips common English filler words (actually, basically, literally, very, just, etc.) in aggressive/ultra modes. Applied to user/assistant messages in history.
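
Filler-word stripping can be approximated with a single word-boundary regex. The word list below contains only the examples named above (the full list is not given in the source), and the whitespace clean-up is an assumption.

```typescript
type CompressionLevel = "off" | "lite" | "standard" | "aggressive" | "ultra";

// Only the words named in the text above; the real list is longer.
const FILLER_WORDS = ["actually", "basically", "literally", "very", "just"];
const FILLER_PATTERN = new RegExp(`\\b(?:${FILLER_WORDS.join("|")})\\b[ \\t]*`, "gi");

function condenseProse(text: string, level: CompressionLevel): string {
  // Prose is left unchanged below the aggressive level.
  if (level !== "aggressive" && level !== "ultra") return text;
  return text
    .replace(FILLER_PATTERN, "")
    .replace(/[ \t]{2,}/g, " ") // collapse any doubled spaces left behind
    .trim();
}
```

For example, `condenseProse("This is actually just a very simple test", "ultra")` yields `"This is a simple test"`, while the same call with `"standard"` returns the input untouched.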

Level selection per run mode:

| Mode | Compression level |
| --- | --- |
| `full` | always `off` (128 k context budget) |
| `exec` | always `aggressive` (maximise harness iterations) |
| `agent` / `plan` | from `settings.json → compression.level` (default: `standard`) |
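
Expressed as code, the selection rule is a small pure function. `resolveCompressionLevel` and `MAX_MACHINE_LINES` are illustrative names; the line limits are copied from the levels table.

```typescript
type CompressionLevel = "off" | "lite" | "standard" | "aggressive" | "ultra";
type RunMode = "full" | "exec" | "agent" | "plan";

// Machine-output line caps per level, taken from the levels table.
const MAX_MACHINE_LINES: Record<CompressionLevel, number> = {
  off: Infinity,
  lite: 500,
  standard: 200,
  aggressive: 50,
  ultra: 20,
};

// `configured` stands for the value read from settings.json → compression.level.
function resolveCompressionLevel(mode: RunMode, configured?: CompressionLevel): CompressionLevel {
  if (mode === "full") return "off";
  if (mode === "exec") return "aggressive";
  return configured ?? "standard";
}
```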

Usage Accounting, Cost & Route Health

Every provider call appends one line to ~/.umbra/usage.jsonl via UsageLogger. The file is append-only JSONL (one JSON object per line) and survives daemon restarts.

UsageRecord fields:

| Field | Description |
| --- | --- |
| `requestId` | Random per-request ID matching debug events |
| `profileId` / `chainId` | Which profile or chain was used |
| `sessionId` / `threadId` | Conversation identifiers |
| `model` / `route` | Model ID and `profileId/model` composite |
| `inputTokens` / `outputTokens` / `totalTokens` | From provider usage block |
| `reasoningTokens` | Thinking tokens (Anthropic / OpenAI o-series) |
| `cacheReadTokens` / `cacheWriteTokens` | Anthropic prompt cache hit/write |
| `costEstimate` | USD, computed from `ModelsRegistry` pricing |
| `contextLimit` / `contextPercent` | Context window size and fill % |
| `source` | `actual` (from provider) / `estimated` (local) / `mixed` |
| `status` | `success` or `failed` |

Aggregation methods on UsageLogger:

getStats() — totals across all records
getStatsByModel() — per-model breakdown with avgCostPerRequest
getStatsByProvider() — per-profile breakdown
getStatsBySession() — per-session breakdown
generateReport() — formatted text report sorted by total cost
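
To make the JSONL layout and per-model aggregation concrete, here is a minimal in-memory sketch. It uses only a subset of the UsageRecord fields; `parseUsageLog`, `statsByModel`, and the `ModelStats` shape are hypothetical and are not the signatures of the real UsageLogger methods.

```typescript
// Subset of the UsageRecord fields listed above.
interface UsageRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costEstimate: number; // USD
  status: "success" | "failed";
}

interface ModelStats {
  requests: number;
  inputTokens: number;
  outputTokens: number;
  totalCost: number;
  avgCostPerRequest: number;
}

// One JSON object per line; blank lines are skipped.
function parseUsageLog(jsonl: string): UsageRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as UsageRecord);
}

// Per-model breakdown, similar in spirit to getStatsByModel().
function statsByModel(records: readonly UsageRecord[]): Map<string, ModelStats> {
  const stats = new Map<string, ModelStats>();
  for (const record of records) {
    const entry = stats.get(record.model) ?? {
      requests: 0,
      inputTokens: 0,
      outputTokens: 0,
      totalCost: 0,
      avgCostPerRequest: 0,
    };
    entry.requests += 1;
    entry.inputTokens += record.inputTokens;
    entry.outputTokens += record.outputTokens;
    entry.totalCost += record.costEstimate;
    entry.avgCostPerRequest = entry.totalCost / entry.requests;
    stats.set(record.model, entry);
  }
  return stats;
}
```

Since the log is append-only, a reader like this can re-scan the file at any time without coordinating with the writer.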

Global CLI Launch & Working Directory Contract

bin/umbra.js is the global entry-point shim. It selects the execution strategy at runtime without recompilation:

1. src/cli/main.ts exists?
   → node --import tsx/esm src/cli/main.ts   (dev mode, no build needed)
2. dist/cli/main.js exists?
   → node dist/cli/main.js                   (installed package)
3. Neither → exit 1 with instructions
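
The strategy selection above amounts to two existence checks. The sketch below is written in TypeScript for consistency with the rest of this document (the shim itself is plain JavaScript); `selectLaunchPlan`, the `LaunchPlan` type, the injected `exists` callback, and the error text are hypothetical.

```typescript
type LaunchPlan =
  | { kind: "run"; command: string; args: string[] }
  | { kind: "error"; message: string };

// `exists` is injected so the sketch stays free of filesystem access;
// a real shim would presumably use fs.existsSync and path.join.
function selectLaunchPlan(root: string, exists: (path: string) => boolean): LaunchPlan {
  const devEntry = `${root}/src/cli/main.ts`;
  const distEntry = `${root}/dist/cli/main.js`;

  if (exists(devEntry)) {
    // Dev mode: run the TypeScript source through tsx, no build needed.
    return { kind: "run", command: "node", args: ["--import", "tsx/esm", devEntry] };
  }
  if (exists(distEntry)) {
    // Installed package: run the compiled output.
    return { kind: "run", command: "node", args: [distEntry] };
  }
  return { kind: "error", message: "No CLI entry point found; build the project or reinstall the package." };
}
```

Checking the source entry first means a checked-out repository always runs the live TypeScript, even when a stale dist/ directory is present.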

CLI flags handled by the shim (stripped before passing to cac):

| Flag | Effect |
| --- | --- |
| `--debug` | Sets `UMBRA_DEBUG_SIDECAR=1`; opens a sidecar debug console on Windows |
| `--project <path>` | Sets `UMBRA_PROJECT_PATH` env var; overrides CWD for all project-path resolution |
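
Stripping the shim-level flags before the remaining arguments reach cac could look like this. `extractShimFlags` is a hypothetical helper, and only the space-separated `--project <path>` form is handled since that is the only form documented above.

```typescript
interface ShimFlags {
  rest: string[]; // arguments forwarded to the CLI parser
  env: Record<string, string>; // environment variables to set for the child process
}

function extractShimFlags(argv: readonly string[]): ShimFlags {
  const rest: string[] = [];
  const env: Record<string, string> = {};
  const queue = [...argv];

  while (queue.length > 0) {
    const arg = queue.shift() as string;
    if (arg === "--debug") {
      env["UMBRA_DEBUG_SIDECAR"] = "1";
    } else if (arg === "--project") {
      // Consume the following argument as the project path, if there is one.
      const value = queue.shift();
      if (value !== undefined) env["UMBRA_PROJECT_PATH"] = value;
    } else {
      rest.push(arg);
    }
  }
  return { rest, env };
}
```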

The signals SIGINT, SIGTERM, and SIGHUP are forwarded to the child process so Ctrl-C works correctly even when the shim is the process group leader.

On Windows, .cmd scripts are spawned via cmd.exe /d /s /c to avoid Node.js DEP0190 and handle shim quoting correctly (src/cli/process-runner.ts).

Permissions & Access Policies

src/core/permissions.ts implements a three-tier access control system.

Evaluation order (first matching step wins):

1. exec-full mode  →  allow all (sandbox assumption)
2. chat-readonly mode + destructive tool  →  deny
3. Stored rules (most recently added wins)  →  allow / deny / allow_always
4. Non-destructive tool  →  allow without prompt
5. Interactive prompt  →  y / n / a (allow_always)

Destructive tools (always require evaluation outside exec-full): shell.exec, fs.write, fs.edit, fs.cd, git.status, git.diff, git.apply, git.commit, git.push, git.pull, web.search
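
The evaluation order can be read as a short decision cascade. In this sketch the stored-rule lookup is reduced to an optional argument, `"prompt"` stands for falling through to the interactive step, and `evaluatePermission` is an illustrative name rather than the function in permissions.ts.

```typescript
type StoredDecision = "allow" | "deny" | "allow_always";
type Outcome = "allow" | "deny" | "prompt";

// Tool list copied from the text above.
const DESTRUCTIVE_TOOLS = new Set([
  "shell.exec", "fs.write", "fs.edit", "fs.cd",
  "git.status", "git.diff", "git.apply", "git.commit", "git.push", "git.pull",
  "web.search",
]);

// `stored` is the decision of the winning stored rule, if any rule matched.
function evaluatePermission(mode: string, tool: string, stored?: StoredDecision): Outcome {
  // 1. exec-full: allow everything (sandbox assumption).
  if (mode === "exec-full") return "allow";

  const destructive = DESTRUCTIVE_TOOLS.has(tool);

  // 2. chat-readonly never runs destructive tools.
  if (mode === "chat-readonly" && destructive) return "deny";

  // 3. Stored rules decide next.
  if (stored !== undefined) return stored === "deny" ? "deny" : "allow";

  // 4. Non-destructive tools run without a prompt.
  if (!destructive) return "allow";

  // 5. Otherwise ask the user (y / n / a).
  return "prompt";
}
```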

Rule storage:

Global rules: ~/.umbra/permissions.json — applied to all projects
Project rules: <project>/umbra.permissions.json — applied when projectPath matches

Rule matching supports exact tool name (shell.exec), namespace wildcard (shell.*), or global wildcard (*). Rules have an optional expiresAt ISO timestamp.
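
Pattern matching and expiry might be implemented roughly as below. The `PermissionRule` shape and `findRule` are assumptions, as is the idea that rules are stored in insertion order so that scanning from the end yields "most recently added wins".

```typescript
interface PermissionRule {
  tool: string; // "shell.exec", "shell.*", or "*"
  decision: "allow" | "deny" | "allow_always";
  expiresAt?: string; // optional ISO timestamp
}

function ruleMatches(pattern: string, tool: string): boolean {
  if (pattern === "*") return true; // global wildcard
  if (pattern.endsWith(".*")) {
    // Namespace wildcard: "shell.*" matches anything starting with "shell."
    return tool.startsWith(pattern.slice(0, -1));
  }
  return pattern === tool; // exact tool name
}

// Assumption: `rules` is in insertion order, so the newest matching rule is found first
// when scanning in reverse. Expired rules are skipped.
function findRule(rules: readonly PermissionRule[], tool: string, now: Date): PermissionRule | undefined {
  for (const rule of [...rules].reverse()) {
    const expired = rule.expiresAt !== undefined && Date.parse(rule.expiresAt) <= now.getTime();
    if (!expired && ruleMatches(rule.tool, tool)) return rule;
  }
  return undefined;
}
```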

WorkspaceTrustManager tracks trusted directories in ~/.umbra/trusted-paths.json. fs.cd to a path not under any trusted ancestor always triggers an interactive prompt regardless of mode. Choosing “always” registers the path as trusted.

Interactive prompt (TTY only) shows the tool name and a human-readable action summary, then reads y / n / a. In non-TTY environments (piped stdin, CI, tests), the request is denied automatically.

All decisions are appended to ~/.umbra/debug/permissions.jsonl for audit.

Model Catalogue & UI Normalization

ModelsRegistry resolves capabilities for any model ID at runtime:

models.dev/api.json — authoritative dataset; flattened to provider/model tuples; cached for 5 minutes in-process
HuggingFace model API — fallback for models not in models.dev
Heuristics — name-pattern rules (-vision, -instruct, -r1, o\d, context window from common model families)

Capabilities resolved: contextWindow, supportsTools, supportsVision, supportsReasoning, supportsStructuredOutput, supportsAttachments, supportsTemperature, longContext (>100 k), interleaved (reasoning_content vs reasoning_details), inputModalities, outputModalities, pricingPerMillion (input, output, cacheRead, cacheWrite).

Mistral model dedup in listModels() is multi-pass:

Filter non-chat and archived models
Exact-ID dedupe
Family grouping via iterative suffix stripping (-latest, -2604, -3.5, -3-5, -v2, -3) — handles compound suffixes like mistral-medium-3-5-2604 in two passes
Prefer -latest alias over specific version within same family
Score and sort by tier (Magistral → Large → Medium → Small → Devstral → Codestral → Ministral → Pixtral → Voxtral → Open → penalty for vibe-cli / labs- / deprecated)
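
The family-grouping and `-latest` preference steps can be sketched as follows. The suffix patterns are generalised from the examples listed above, and `familyOf` / `dedupeByFamily` are hypothetical names; filtering and tier scoring are left out.

```typescript
// Suffix patterns generalised from the examples (-latest, -2604, -3.5, -3-5, -v2, -3).
const VERSION_SUFFIXES = [/-latest$/, /-\d{4}$/, /-\d+\.\d+$/, /-\d+-\d+$/, /-v\d+$/, /-\d+$/];

// Strip at most one trailing version suffix.
function stripOneSuffix(id: string): string {
  for (const pattern of VERSION_SUFFIXES) {
    if (pattern.test(id)) return id.replace(pattern, "");
  }
  return id;
}

// Strip repeatedly until nothing changes, so compound suffixes collapse pass by pass.
function familyOf(modelId: string): string {
  let current = modelId;
  let next = stripOneSuffix(current);
  while (next !== current) {
    current = next;
    next = stripOneSuffix(current);
  }
  return current;
}

// Exact-ID dedupe, then keep one ID per family, preferring the -latest alias.
function dedupeByFamily(ids: readonly string[]): string[] {
  const best = new Map<string, string>();
  for (const id of Array.from(new Set(ids))) {
    const family = familyOf(id);
    const current = best.get(family);
    if (current === undefined || (id.endsWith("-latest") && !current.endsWith("-latest"))) {
      best.set(family, id);
    }
  }
  return Array.from(best.values());
}
```

With these patterns, `mistral-medium-3-5-2604` loses `-2604` in the first pass and `-3-5` in the second, landing in the same `mistral-medium` family as `mistral-medium-latest`.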

TUI / Agent & Plan Modes

The TUI is built with Ink (React for terminal). Key surfaces:

Thread list — paginated, searchable, archivable; forking creates a new thread from the current session state
Session view — streaming assistant output with reasoning_delta and assistant_delta events shown separately; tool call / result panels; permission approval dialog
Provider selector — profile list with status badges (connected / available / unavailable); inline connect/test/delete
Memory panel — per-project memory text; toggle useMemories / generateMemories

Simple-chat detection in resolveRunModeContract(): prompts of ≤8 words matching a hardcoded greeting set (hi, hello, thanks, привет, спасибо, etc.) get toolPreset: null and no compression. This avoids sending tool schemas and compression overhead for conversational turns.
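
A hedged sketch of the detection: the source does not say how a prompt is matched against the greeting set, so this version assumes the first word (lowercased, common punctuation removed) must be in the set. The set below holds only the examples named above, and `isSimpleChat` is a hypothetical name.

```typescript
// Only the examples named in the text; the real set is larger.
const GREETINGS = new Set(["hi", "hello", "thanks", "привет", "спасибо"]);
const MAX_SIMPLE_CHAT_WORDS = 8;

// Assumption: a prompt is "simple chat" when it has at most 8 words
// and its first word is a known greeting.
function isSimpleChat(prompt: string): boolean {
  const words = prompt
    .toLowerCase()
    .replace(/[.,!?;:'"()]/g, "")
    .split(/\s+/)
    .filter((word) => word.length > 0);
  if (words.length === 0 || words.length > MAX_SIMPLE_CHAT_WORDS) return false;
  return GREETINGS.has(words[0] as string);
}
```

When this returns true, the run-mode contract would carry `toolPreset: null` and skip compression, as described above.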

Mode selection at the CLI / API level:

umbra (no args) → opens TUI in agent mode
umbra exec "<task>" → headless exec mode (harness loop, 30-min timeout)
umbra task add "<task>" → queues a background agent run
HTTP API POST /run with mode: "plan" → structured JSON plan output

Markdown Visual Processing

src/cli/tui/markdown.ts renders markdown to ANSI escape sequences for terminal display. The renderer is line-by-line — no AST parsing — trading full CommonMark compliance for zero dependencies and instant output.

Block-level transforms (applied first):

| Input | Output |
| --- | --- |
| `# Heading` | Bold + cyan text |
| `## Heading` | Bold + yellow text |
| `---` / `===` | Dim `────────────────────────────────────────` |
| `- item` | Cyan `-` bullet |
| `1. item` | Yellow numbered item |
| `> quote` | Dim `\| quote` |

Inline transforms (applied after block transforms):

| Syntax | Output |
| --- | --- |
| `` `code` `` | Yellow text |
| `**bold**` | Bold text |
| `*italic*` | Dim text |
| `[text](url)` | Cyan `text` + dim `(url)` |

All transforms reset ANSI state (`\x1b[0m`) at the end of each match so adjacent styles don’t bleed across tokens.
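
A compact sketch of such a line-by-line renderer is given below. It picks the block style first and then runs the inline transforms on the line's text. It assumes standard SGR codes (1 bold, 2 dim, 33 yellow, 36 cyan), `*`-based bold/italic markers, a space between link text and URL, and that only the number of a list item is coloured; `renderInline` and `renderLine` are illustrative names, not the exports of markdown.ts.

```typescript
const RESET = "\x1b[0m";
const BOLD = "\x1b[1m";
const DIM = "\x1b[2m";
const YELLOW = "\x1b[33m";
const CYAN = "\x1b[36m";

// Links are handled first so the "[" inside escape sequences added later is never matched.
function renderInline(text: string): string {
  return text
    .replace(/\[([^\]]+)\]\(([^)]+)\)/g, `${CYAN}$1${RESET} ${DIM}($2)${RESET}`)
    .replace(/`([^`]+)`/g, `${YELLOW}$1${RESET}`)
    .replace(/\*\*([^*]+)\*\*/g, `${BOLD}$1${RESET}`)
    .replace(/\*([^*]+)\*/g, `${DIM}$1${RESET}`);
}

function renderLine(line: string): string {
  if (line.startsWith("## ")) return `${BOLD}${YELLOW}${renderInline(line.slice(3))}${RESET}`;
  if (line.startsWith("# ")) return `${BOLD}${CYAN}${renderInline(line.slice(2))}${RESET}`;
  if (/^(-{3,}|={3,})$/.test(line.trim())) return `${DIM}${"─".repeat(40)}${RESET}`;
  if (line.startsWith("- ")) return `${CYAN}-${RESET} ${renderInline(line.slice(2))}`;
  const numbered = /^(\d+)\. (.*)$/.exec(line);
  if (numbered) return `${YELLOW}${numbered[1]}.${RESET} ${renderInline(numbered[2] ?? "")}`;
  if (line.startsWith("> ")) return `${DIM}| ${line.slice(2)}${RESET}`;
  return renderInline(line);
}

// Render a whole document one line at a time; no AST is built.
function renderMarkdown(source: string): string {
  return source.split("\n").map(renderLine).join("\n");
}
```

One known limitation of this sketch: an inline reset inside a heading also ends the heading's own style, which is the kind of trade-off a regex-only renderer accepts.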