How Umbra manages context

Every request passes through a stack of automatic mechanisms before a single token is sent to the model:

Repo map — instead of dumping raw files, Umbra builds a symbol-level AST outline of the entire project (functions, classes, types, imports) across 40+ languages. A 500-file codebase becomes a compact markdown index of ~5–20 KB. Cached for 15 seconds — no re-parsing on rapid consecutive tasks.

Retrieval packets — when the agent searches code with search.rg or search.files, results are compressed into a ranked, token-bounded packet before they reach the model. Raw ripgrep output of 15 KB becomes a 1–2 KB packet. Triggered automatically when the tool output exceeds 1,500 tokens.
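One way to picture a retrieval packet is the sketch below: search hits pass through unchanged when they are small, and are otherwise ranked and cut to a token budget. The `Hit` type, the term-count scoring, the 400-token packet budget, and the 4-characters-per-token estimate are all assumptions made for illustration. Only the 1,500-token trigger comes from the text.

```python
from dataclasses import dataclass

TRIGGER_TOKENS = 1500  # trigger threshold from the text


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    return max(1, len(text) // 4)


@dataclass
class Hit:  # hypothetical shape of one search match
    path: str
    line: int
    text: str


def build_packet(hits: list[Hit], query: str, budget_tokens: int = 400) -> str:
    """Return raw results if small, else a ranked packet within budget_tokens."""
    raw = "\n".join(f"{h.path}:{h.line}:{h.text}" for h in hits)
    if estimate_tokens(raw) <= TRIGGER_TOKENS:
        return raw

    terms = query.lower().split()

    def score(hit: Hit) -> int:
        # Naive relevance: how often the query terms occur in the matched line.
        return sum(hit.text.lower().count(term) for term in terms)

    packet, used = [], 0
    for hit in sorted(hits, key=score, reverse=True):
        entry = f"{hit.path}:{hit.line}:{hit.text.strip()}"
        cost = estimate_tokens(entry)
        if used + cost > budget_tokens:
            break
        packet.append(entry)
        used += cost
    return "\n".join(packet)
```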

5-level compression — tool outputs (shell stdout, diffs, logs) are truncated so that the head and tail of the output are preserved. Critical lines (Error:, TypeError:, stack traces, exit codes) are always rescued, even from truncated sections. Levels range from lite (500 max lines) to ultra (20 max lines).
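Head + tail truncation with rescued critical lines could look roughly like the following. The regular expression, the omission marker, and the even head/tail split are assumptions. The text names only the lite and ultra limits, so the three intermediate levels are left out rather than guessed.

```python
import re

# Only the two limits stated in the text; intermediate levels are not specified.
MAX_LINES = {"lite": 500, "ultra": 20}

# Hypothetical pattern for lines that must survive truncation.
CRITICAL = re.compile(r"Error:|Traceback \(most recent call last\)|exit code", re.IGNORECASE)


def compress_output(text: str, max_lines: int) -> str:
    """Keep the first and last lines, plus any critical lines from the cut middle."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    head_n = max_lines // 2
    tail_n = max_lines - head_n
    head = lines[:head_n]
    middle = lines[head_n:len(lines) - tail_n]
    tail = lines[len(lines) - tail_n:]
    rescued = [line for line in middle if CRITICAL.search(line)]
    marker = f"[... {len(middle) - len(rescued)} lines omitted ...]"
    return "\n".join(head + [marker] + rescued + tail)
```

In this sketch the result can exceed `max_lines` by the marker plus the rescued lines, a deliberate trade-off so that an error is never dropped to meet the limit.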

Split-turn — if the agent makes many tool calls in one turn and the message window approaches the limit, earlier tool pairs are compressed into a summary in-place while the last 3 pairs stay raw. The model never loses recent context.
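A minimal sketch of the split-turn step, assuming tool calls are held as `(call, result)` string pairs: the 90% headroom threshold and the "first line of each result" summary are stand-ins chosen for this example, since the text does not say how the summary is produced. Only the rule of keeping the last 3 pairs raw comes from the text.

```python
KEEP_RAW_PAIRS = 3  # from the text: the last 3 tool pairs stay raw


def first_line(text: str, width: int = 80) -> str:
    lines = text.splitlines()
    return lines[0][:width] if lines else "(no output)"


def compact_tool_pairs(pairs, used_tokens, limit_tokens, headroom=0.9):
    """Collapse older (call, result) pairs into one summary entry near the limit.

    `headroom` is a hypothetical threshold for "approaching the limit".
    """
    if used_tokens < limit_tokens * headroom or len(pairs) <= KEEP_RAW_PAIRS:
        return list(pairs)
    older, recent = pairs[:-KEEP_RAW_PAIRS], pairs[-KEEP_RAW_PAIRS:]
    # Stand-in summary: one bullet per older call with the first line of its output.
    bullets = [f"- {call}: {first_line(result)}" for call, result in older]
    return [("summary", "\n".join(bullets))] + list(recent)
```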

Sliding window — the message history is trimmed from the oldest end when the payload budget (80,000 tokens) is approached. The system message and current turn are always preserved.
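The sliding window can be sketched as a loop that drops the oldest history messages until the payload fits. Messages are plain strings here and tokens are estimated at about 4 characters each; both are simplifying assumptions. This sketch trims once the budget is exceeded, whereas the text says trimming starts as the budget is approached.

```python
PAYLOAD_BUDGET = 80_000  # payload budget from the text


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def trim_history(system, history, current_turn, budget=PAYLOAD_BUDGET):
    """Drop oldest history messages first; never drop system or current turn."""
    fixed = estimate_tokens(system) + sum(estimate_tokens(m) for m in current_turn)
    kept = list(history)
    total = fixed + sum(estimate_tokens(m) for m in kept)
    while kept and total > budget:
        total -= estimate_tokens(kept.pop(0))  # oldest message goes first
    return [system] + kept + list(current_turn)
```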

Session compaction — long sessions are summarized on demand (/compact) or automatically. A 50+ event session becomes one structured summary (goals, progress, files touched, failures) plus the last 6 raw events. Savings: 70–90% on accumulated session history.
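The shape of a compacted session might resemble the sketch below: older events are folded into the four summary sections named above, and the last 6 events are returned untouched. The `Event` type, its `kind` values, and the use of 50 events as an automatic threshold are assumptions for illustration, and a real summary would likely be written by a model rather than grouped by kind.

```python
from dataclasses import dataclass

KEEP_RAW_EVENTS = 6  # from the text: the last 6 events stay raw


@dataclass
class Event:  # hypothetical event record
    kind: str  # assumed values: "goal", "progress", "file", "failure"
    text: str


SECTION_FOR_KIND = {
    "goal": "goals",
    "progress": "progress",
    "file": "files_touched",
    "failure": "failures",
}


def compact_session(events, min_events=50):
    """Return (summary, raw_events); summary is None for short sessions."""
    if len(events) < min_events:
        return None, list(events)
    older, recent = events[:-KEEP_RAW_EVENTS], events[-KEEP_RAW_EVENTS:]
    summary = {section: [] for section in SECTION_FOR_KIND.values()}
    for event in older:
        section = SECTION_FOR_KIND.get(event.kind)
        # Skip unknown kinds and repeated entries (e.g. the same file twice).
        if section and event.text not in summary[section]:
            summary[section].append(event.text)
    return summary, list(recent)
```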

Vector memory — past sessions are stored as embeddings. On each task, only the top 5 semantically similar past memories are injected, bounded to ~2,500 tokens.
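Top-5 retrieval under a token cap can be illustrated with plain cosine similarity. The `(embedding, text)` tuple layout, the skip-if-too-large policy, and the 4-characters-per-token estimate are assumptions. How Umbra actually embeds and stores sessions is not described in the text.

```python
import math

TOP_K = 5                     # from the text
MEMORY_TOKEN_BUDGET = 2500    # from the text (approximate bound)


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_memories(query_vec, memories, k=TOP_K, budget=MEMORY_TOKEN_BUDGET):
    """memories: list of (embedding, text). Return the texts to inject."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[0]), reverse=True)[:k]
    chosen, used = [], 0
    for _, text in ranked:
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # skip a memory that would overflow the budget
        chosen.append(text)
        used += cost
    return chosen
```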

Mode-based budgets — the context budget and compression level adapt to the task:

| Mode | Context budget | Compression |
| --- | --- | --- |
| `agent` / `plan` | 32,000 tokens | standard |
| `exec` (harness loop) | 32,000 tokens | aggressive |
| `full` | 128,000 tokens | off |
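Expressed as data, the table above amounts to a lookup from mode to budget and compression level. The `ModeBudget` type and `budget_for` function below are hypothetical names; only the mode names and values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBudget:  # hypothetical container for one table row
    context_tokens: int
    compression: str


MODE_BUDGETS = {
    "agent": ModeBudget(32_000, "standard"),
    "plan": ModeBudget(32_000, "standard"),
    "exec": ModeBudget(32_000, "aggressive"),
    "full": ModeBudget(128_000, "off"),
}


def budget_for(mode: str) -> ModeBudget:
    """Look up the budget for a mode, failing loudly on unknown names."""
    try:
        return MODE_BUDGETS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
```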

The result: Umbra can work on large codebases, run multi-hour sessions, and iterate through dozens of tool calls — all within the token window your provider gives you, without you managing any of it manually.