Context Engine & AST Compression
Before every task, Umbra assembles a structured context window and sends it to the model. The engine has a fixed token budget (default 32,000 tokens, estimated at 4 chars/token) and fills it with ranked sections.
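The budget arithmetic is easier to see with numbers. The sketch below applies the 4-characters-per-token heuristic; it assumes partial tokens round up, and `estimateTokens` and the constants are illustrative names, not Umbra's actual API.

```typescript
// Character-based token estimate, as described above.
// Assumption: partial tokens round up. Names are illustrative, not Umbra's API.
const CHARS_PER_TOKEN = 4;
const DEFAULT_TOKEN_BUDGET = 32000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// The default budget corresponds to roughly 128,000 characters of context.
const budgetInChars = DEFAULT_TOKEN_BUDGET * CHARS_PER_TOKEN;

console.log(budgetInChars);                   // 128000
console.log(estimateTokens("hello, world"));  // 12 chars -> 3
console.log(estimateTokens("hello, world!")); // 13 chars -> 4 (rounded up)
```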
Context assembly pipeline
```text
buildTaskContext()
├─ buildRepoMap()                   # AST scan of the project tree → markdown outline
├─ buildSessionWindow()             # compact old events, keep last 6
├─ vector search                    # top-N semantically similar past memories
├─ loadHierarchicalInstructions()   # rules files walked from global → local
└─ summarizeTokenSections()         # budget report per section
```

Repo map & AST parsing

The repo map is a compact markdown outline of every significant file in the project. It lists imports and symbols (functions, classes, types, etc.) per file.
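To illustrate the shape of such an outline, the sketch below renders a repo-map-style summary from already-extracted file data. The `FileOutline` type, the `## path` heading per file and the `imports:` line are assumptions made for illustration, not Umbra's documented output format.

```typescript
// Hypothetical per-file data produced by a parser; Umbra's real model may differ.
interface FileOutline {
  path: string;
  imports: string[];
  symbols: { kind: string; name: string }[];
}

// Render one heading per file (sorted by path), then its imports and symbols.
function renderRepoMap(files: FileOutline[]): string {
  const sorted = [...files].sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  const lines: string[] = [];
  for (const file of sorted) {
    lines.push(`## ${file.path}`);
    if (file.imports.length > 0) {
      lines.push(`imports: ${file.imports.join(", ")}`);
    }
    for (const symbol of file.symbols) {
      lines.push(`- ${symbol.kind} ${symbol.name}`);
    }
    lines.push(""); // blank line between files
  }
  return lines.join("\n").trimEnd();
}

const repoMap = renderRepoMap([
  { path: "src/server.ts", imports: ["express", "./db"], symbols: [{ kind: "function", name: "startServer" }] },
  { path: "src/db.ts", imports: [], symbols: [{ kind: "class", name: "Database" }] },
]);
console.log(repoMap);
```

Keeping one short line per symbol is what makes the outline cheap in tokens compared with sending file contents.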
Three parsers run in priority order:
| Parser | Languages |
|---|---|
| `web-tree-sitter` (full AST) | JavaScript, TypeScript, TSX, Python, Go, Bash, Rust, Java, CSS, Ruby, C#, PHP, PowerShell, C/C++, INI |
| `@bscotch/gml-parser` (CST) | GameMaker Language (`.gml`) |
| Regex fallback | 40+ additional formats: JSON, YAML/GitHub Actions, Markdown, SQL, HTML, TOML, GraphQL, Protobuf, Terraform/HCL, Prisma, Solidity, Zig, Dart, Kotlin, Swift, Lua, Scala, Elixir, Erlang, Haskell, Perl, R, Clojure, Vue, Svelte, Astro, XML, Gradle, GDScript, MATLAB, Nix, Jupyter, WAT/Wasm, Assembly, Dockerfile, Makefile, CMake, .env, lockfiles |
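One way to read the priority order is as a dispatch on file extension: a tree-sitter grammar if one exists, then the GML parser, then the regex tier. The sketch below assumes that reading (the source does not say whether a failed parse also falls back) and includes only a subset of the grammar table; `selectParser` and `extensionOf` are hypothetical names.

```typescript
// Hypothetical three-tier dispatch: tree-sitter grammar, then GML CST parser, then regex.
type ParserKind = "tree-sitter" | "gml-parser" | "regex";

interface ParserChoice {
  kind: ParserKind;
  grammar?: string; // WASM grammar file, set only for the tree-sitter tier
}

// Subset of the grammar table from "Supported languages" (illustrative only).
const TREE_SITTER_GRAMMARS = new Map<string, string>([
  [".js", "tree-sitter-javascript.wasm"],
  [".ts", "tree-sitter-typescript.wasm"],
  [".tsx", "tree-sitter-tsx.wasm"],
  [".py", "tree-sitter-python.wasm"],
  [".rs", "tree-sitter-rust.wasm"],
  [".cs", "tree-sitter-c-sharp.wasm"],
]);

// Lower-cased extension of the last path segment; dotfiles such as ".env"
// and extensionless names such as "Dockerfile" return "".
function extensionOf(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

function selectParser(path: string): ParserChoice {
  const ext = extensionOf(path);
  const grammar = TREE_SITTER_GRAMMARS.get(ext);
  if (grammar !== undefined) {
    return { kind: "tree-sitter", grammar };
  }
  if (ext === ".gml") {
    return { kind: "gml-parser" };
  }
  return { kind: "regex" };
}

console.log(selectParser("src/app.ts").kind);                  // tree-sitter
console.log(selectParser("objects/obj_player/Step_0.gml").kind); // gml-parser
console.log(selectParser("db/schema.sql").kind);               // regex
```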
Supported languages
Full AST (via web-tree-sitter WASM grammars):

- JavaScript (`.js`, `.jsx`, `.cjs`, `.mjs`): `tree-sitter-javascript.wasm`
- TypeScript (`.ts`): `tree-sitter-typescript.wasm`
- TSX (`.tsx`): `tree-sitter-tsx.wasm`
- Python (`.py`): `tree-sitter-python.wasm`
- Go (`.go`): `tree-sitter-go.wasm`
- Shell / Bash (`.sh`, `.bash`, `.zsh`): `tree-sitter-bash.wasm`
- Rust (`.rs`): `tree-sitter-rust.wasm`
- Java (`.java`): `tree-sitter-java.wasm`
- C / C++ (`.c`, `.h`, `.cpp`, `.cc`, `.cxx`, `.hpp`): `tree-sitter-cpp.wasm`
- C# (`.cs`): `tree-sitter-c-sharp.wasm`; symbols: class, interface, struct, record, delegate, enum, namespace, method, constructor, destructor, property, field, event, operator
- PHP (`.php`): `tree-sitter-php.wasm`
- Ruby (`.rb`): `tree-sitter-ruby.wasm`
- CSS (`.css`): `tree-sitter-css.wasm`
- PowerShell (`.ps1`, `.psm1`): `tree-sitter-powershell.wasm`
- INI / Config (`.ini`, `.cfg`): `tree-sitter-ini.wasm`
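Turning a full AST into repo-map symbols is essentially a depth-first walk that keeps declaration nodes. The sketch below uses a simplified, hypothetical `AstNode` shape instead of the real `web-tree-sitter` node API (which exposes `type`, `children` and field accessors such as `childForFieldName`), and assumes the C# declaration node types follow tree-sitter-c-sharp's `*_declaration` naming.

```typescript
// Simplified stand-in for a tree-sitter syntax node (hypothetical shape).
interface AstNode {
  type: string;
  name?: string;
  children: AstNode[];
}

// Declaration node types assumed to match tree-sitter-c-sharp's naming.
const CSHARP_SYMBOL_TYPES = new Set([
  "namespace_declaration", "class_declaration", "interface_declaration",
  "struct_declaration", "record_declaration", "enum_declaration",
  "method_declaration", "constructor_declaration", "property_declaration",
]);

// Pre-order walk: record "kind name" for every wanted node, then visit children.
function collectSymbols(node: AstNode, wanted: Set<string>, out: string[] = []): string[] {
  if (wanted.has(node.type) && node.name !== undefined) {
    out.push(`${node.type.replace(/_declaration$/, "")} ${node.name}`);
  }
  for (const child of node.children) {
    collectSymbols(child, wanted, out);
  }
  return out;
}

const cartTree: AstNode = {
  type: "compilation_unit",
  children: [
    {
      type: "namespace_declaration",
      name: "Shop",
      children: [
        {
          type: "class_declaration",
          name: "Cart",
          children: [
            { type: "method_declaration", name: "AddItem", children: [] },
            { type: "property_declaration", name: "Total", children: [] },
          ],
        },
      ],
    },
  ],
};

console.log(collectSymbols(cartTree, CSHARP_SYMBOL_TYPES).join(", "));
// namespace Shop, class Cart, method AddItem, property Total
```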
Full AST (via dedicated parsers):
- GML / GameMaker 2.3+ (`.gml`): `@bscotch/gml-parser` CST; symbols: function, constructor, macro, enum, globalvar
Partial structured parsers (regex / DOM):
- JSON (`.json`): `JSON.parse`; top-level keys as symbols
- YAML (`.yml`, `.yaml`): `js-yaml`; top-level keys as symbols
- GitHub Actions / CI YAML: domain-aware layer on top of YAML; symbols: workflow name, triggers, jobs; `uses:` → imports
- Markdown (`.md`, `.mdx`): heading extractor, H1–H4
- SQL (`.sql`): `CREATE TABLE/VIEW/FUNCTION/PROCEDURE/INDEX/TRIGGER`
- HTML (`.html`, `.htm`): ids, landmarks, role attributes, script/style blocks
- TOML (`.toml`): `[table]` sections, top-level keys
- GraphQL (`.graphql`, `.gql`): type, interface, enum, union, input, scalar, query, mutation, subscription, fragment, directive
- Protocol Buffers (`.proto`): message, service, enum, rpc, oneof; imports
- Terraform / HCL (`.tf`, `.tfvars`, `.hcl`): resource, data, module, variable, output, provider, locals
- Prisma (`.prisma`): model, enum, type, datasource, generator
- Solidity (`.sol`): contract, interface, library, function, event, struct, enum, modifier, error; imports
- Zig (`.zig`): `fn`, `const` struct/enum/union, `var`
- Dart / Flutter (`.dart`): class, mixin, extension, enum, functions; imports
- Kotlin (`.kt`, `.kts`): class, interface, object, fun, typealias, enum
- Swift (`.swift`): class, struct, protocol, enum, extension, func, actor, typealias
- Lua (`.lua`): functions, module tables, local requires
- Scala (`.scala`, `.sc`): class, object, trait, def, val, given; imports
- Elixir (`.ex`, `.exs`): defmodule, def, defp, defmacro, defprotocol, defimpl
- Erlang (`.erl`, `.hrl`): module, functions; `-export` lists
- Haskell (`.hs`, `.lhs`): module, data, newtype, type, class, instance, functions; imports
- Perl (`.pl`, `.pm`): package, sub; `use` imports
- R (`.r`, `.R`): functions, R6/RefClass classes; `library`/`require` imports
- Clojure (`.clj`, `.cljs`, `.cljc`): ns, defn, def, defmacro, defprotocol, defrecord, defmulti
- Vue (`.vue`): single-file component; component name + script-block exports
- Svelte (`.svelte`): single-file component; component name + script-block exports
- Astro (`.astro`): single-file component; component name + frontmatter symbols
- XML (`.xml`): element tags, id attributes, name/key attributes
- Gradle (`.gradle`, `.gradle.kts`): plugins, tasks, variables, dependencies
- GDScript (`.gd`, Godot Engine): `class_name`, `func`, `signal`, `enum`, `const`, `var`, `@export`/`@onready`; `extends` → import
- MATLAB / Octave (`.m`): classdef, function (all signatures), `%%` section markers, properties/methods/events/enumeration; `import`/`addpath` → imports
- Nix (`.nix`): top-level attrs (column-0 bindings), `mkDerivation`/`mkShell`; `import <nixpkgs>` → imports
- WebAssembly Text (`.wat`, `.wast`): `$func` names, `$global`, `$type`, memory, table, data/elem segments; `(import ...)`/`(export ...)`
- Assembly (`.asm`, `.s`, `.S`, `.nasm`): NASM/GAS/ARM; global labels, section markers, macros, constants, `.type @function`; `extern` → imports
- Dockerfile (`Dockerfile`, `.dockerfile`): `FROM` stages, `EXPOSE` ports, `ARG`, `ENV`, `ENTRYPOINT`, `CMD`
- Makefile (`Makefile`, `GNUmakefile`, `.mk`): targets, uppercase variables; `include` imports
- CMake (`CMakeLists.txt`, `.cmake`): `add_executable`, `add_library`, `function`, `macro`, `project`, `option`, `set`
- Env files (`.env`, `.env.*`): keys as symbols, values redacted as `***REDACTED***`
- Log files (`.log`): ERROR/FATAL/WARN lines as symbols
- Jupyter (`.ipynb`): kernel name, markdown headings H1–H4, `def`/`class` from code cells; `import`/`from` → imports
- PDF (`.pdf`): `pdf-parse` text extractor; heading heuristic from extracted text
- DOCX (`.docx`): `fflate` unzip + `word/document.xml` parser; `<w:pStyle Heading>` detection
- `yarn.lock`: package name + resolved version
- `Cargo.lock`: crate name + version
- `Gemfile.lock`: gem name + version (SPECS section)
- `composer.lock`: PHP package name + version (JSON)
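To give a feel for the regex tier, the sketch below implements three of the extractors listed above: Markdown H1–H4 headings, SQL `CREATE` statements and `.env` redaction. The patterns and function names are assumptions for illustration, not Umbra's actual rules; for example, the Markdown pattern does not skip headings inside fenced code.

```typescript
// Illustrative regex extractors; patterns are assumptions, not Umbra's rules.

// Markdown: ATX headings H1–H4; "#####" and deeper are ignored.
function markdownHeadings(source: string): string[] {
  const re = /^(#{1,4})[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$/gm;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    out.push(`H${m[1].length} ${m[2]}`);
  }
  return out;
}

// SQL: CREATE TABLE/VIEW/FUNCTION/PROCEDURE/INDEX/TRIGGER, case-insensitive.
function sqlObjects(source: string): string[] {
  const re =
    /\bCREATE\s+(?:OR\s+REPLACE\s+)?(?:UNIQUE\s+)?(TABLE|VIEW|FUNCTION|PROCEDURE|INDEX|TRIGGER)\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w.]+)/gi;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    out.push(`${m[1].toUpperCase()} ${m[2]}`);
  }
  return out;
}

// .env: keys become symbols; every value is replaced with ***REDACTED***.
function redactEnv(source: string): { keys: string[]; text: string } {
  const keys: string[] = [];
  const text = source
    .split("\n")
    .map((line) => {
      const m = /^(\s*(?:export\s+)?)([A-Za-z_][A-Za-z0-9_]*)(\s*=\s*).*$/.exec(line);
      if (m === null) return line; // comments and blank lines pass through unchanged
      keys.push(m[2]);
      return `${m[1]}${m[2]}${m[3]}***REDACTED***`;
    })
    .join("\n");
  return { keys, text };
}

console.log(markdownHeadings("# Title\n## Install ##\n##### Too deep\n### C#").join(" | "));
// H1 Title | H2 Install | H3 C#
console.log(sqlObjects("CREATE TABLE users (id int);\ncreate or replace view active_users as select 1;").join(" | "));
// TABLE users | VIEW active_users
console.log(redactEnv("API_KEY=abc123\n# comment\nexport DB_URL = postgres://x").keys.join(", "));
// API_KEY, DB_URL
```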
The repo map is cached in-process for 15 seconds — rapid consecutive tasks within the same daemon process reuse the cached scan.
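A time-to-live cache is enough to get this behavior. The sketch below assumes the cache is keyed by project root and that the scan is synchronous (a real scan is likely async); `createTtlCache` is a hypothetical helper, and the clock is injectable so expiry can be demonstrated without waiting.

```typescript
// Minimal TTL cache sketch; key = project root (assumed), scan = synchronous (assumed).
const REPO_MAP_TTL_MS = 15000;

function createTtlCache<T>(ttlMs: number, now: () => number = () => Date.now()) {
  const entries = new Map<string, { value: T; expiresAt: number }>();
  return function getOrCompute(key: string, compute: () => T): T {
    const hit = entries.get(key);
    if (hit !== undefined && now() < hit.expiresAt) {
      return hit.value; // fresh entry: reuse the previous scan
    }
    const value = compute(); // missing or expired: rescan and restamp
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Usage with a fake clock: a call within 15 s reuses the scan, a later one rescans.
let fakeNow = 0;
let scanCount = 0;
const cachedRepoMap = createTtlCache<string>(REPO_MAP_TTL_MS, () => fakeNow);
const scanRepo = (): string => `repo map #${++scanCount}`;

cachedRepoMap("/work/app", scanRepo); // first scan
fakeNow = 10000;
cachedRepoMap("/work/app", scanRepo); // cache hit
fakeNow = 16000;
cachedRepoMap("/work/app", scanRepo); // expired, second scan
console.log(scanCount); // 2
```

Because the cache lives in process memory, it only helps consecutive tasks handled by the same daemon, which matches the scope described above.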
Session compaction
Long sessions are compressed automatically. When `buildSessionWindow()` finds a past `session_compacted` event, it:
- Uses the stored summary as the session history
- Keeps only events that occurred after the last compaction (up to 6)
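The two rules above can be sketched as a backwards scan for the last compaction marker. The event shape (including the `summary` field name) and the `sessionWindow` function are hypothetical; the cap of six events comes from the pipeline description.

```typescript
// Hypothetical event shape; the `summary` field name is an assumption.
interface SessionEvent {
  type: string;
  summary?: string;
  text?: string;
}

const MAX_RECENT_EVENTS = 6;

function sessionWindow(events: SessionEvent[]): { summary: string | null; recent: SessionEvent[] } {
  // Locate the most recent session_compacted marker, if any.
  let lastCompaction = -1;
  for (let i = events.length - 1; i >= 0; i--) {
    if (events[i].type === "session_compacted") {
      lastCompaction = i;
      break;
    }
  }
  const summary = lastCompaction >= 0 ? (events[lastCompaction].summary ?? null) : null;
  // Only events after the marker survive, capped at the last six.
  const recent = events.slice(lastCompaction + 1).slice(-MAX_RECENT_EVENTS);
  return { summary, recent };
}

const sessionLog: SessionEvent[] = [
  { type: "user_message", text: "old 1" },
  { type: "user_message", text: "old 2" },
  { type: "session_compacted", summary: "Goal: add login form" },
];
for (let n = 1; n <= 8; n++) {
  sessionLog.push({ type: "tool_call", text: `step ${n}` });
}

const sessionView = sessionWindow(sessionLog);
console.log(sessionView.summary); // Goal: add login form
console.log(sessionView.recent.map((e) => e.text).join(", ")); // step 3, step 4, step 5, step 6, step 7, step 8
```

With no compaction marker, the same code simply keeps the last six events and returns a `null` summary.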
On explicit compaction (`compactSessionEvents()`), the engine distills older events into a structured summary (goals, progress, files touched, failures, preserved tail) and emits a `session_compacted` event. Iterative compaction builds a rolling `# Session Update` on top of the previous summary rather than starting over.
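The summary's structure and the rolling update can be sketched as plain string assembly. The distillation step itself (deciding what goes in each field) is not shown; the `CompactionSummary` type, the `##` subsection layout and the choice to append each `# Session Update` block to the previous text are assumptions for illustration.

```typescript
// Hypothetical summary shape mirroring the fields listed above.
interface CompactionSummary {
  goals: string[];
  progress: string[];
  filesTouched: string[];
  failures: string[];
  preservedTail: string[];
}

function renderSummary(s: CompactionSummary): string {
  const section = (title: string, items: string[]): string =>
    `## ${title}\n` + (items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- (none)");
  return [
    section("Goals", s.goals),
    section("Progress", s.progress),
    section("Files touched", s.filesTouched),
    section("Failures", s.failures),
    section("Preserved tail", s.preservedTail),
  ].join("\n\n");
}

// First compaction starts fresh; later ones append a "# Session Update" block
// to the previous summary instead of regenerating it (assumed layout).
function compactInto(previous: string | null, next: CompactionSummary): string {
  const body = renderSummary(next);
  return previous === null ? body : `${previous}\n\n# Session Update\n${body}`;
}

const firstPass = compactInto(null, {
  goals: ["Add login form"],
  progress: ["Form markup done"],
  filesTouched: ["src/Login.tsx"],
  failures: [],
  preservedTail: ["npm test"],
});
const secondPass = compactInto(firstPass, {
  goals: [],
  progress: ["Validation wired up"],
  filesTouched: ["src/validate.ts"],
  failures: ["Snapshot test failed once"],
  preservedTail: [],
});
console.log(secondPass);
```

Appending rather than rewriting keeps earlier context stable across compactions, at the cost of the summary growing over a very long session.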
Instruction file hierarchy
Agent rules are loaded from instruction files discovered by walking the directory tree. Files are merged in the order below, so later entries take precedence over earlier ones:

```text
~/.umbra/UMBRA.md      (global)
~/.umbra/AGENTS.md     (global fallback)
        ↓
ancestor dirs          (root → project parent)
        ↓
project dir            UMBRA.md > AGENTS.md > CLAUDE.md > CODEX.md > GEMINI.md > QWEN.md > SYSTEM.md
```

Local rules override global ones because they appear last in the merged string.
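The walk can be sketched as building an ordered list of paths and concatenating their contents. This sketch assumes each directory contributes at most one file (the first name that exists, in the priority order shown), that ancestor directories use the same name list as the project directory, and POSIX-style absolute paths. File existence and reading are injected, and every function name here is hypothetical.

```typescript
// Hypothetical model: one file per directory, first existing name wins.
const GLOBAL_NAMES = ["UMBRA.md", "AGENTS.md"];
const LOCAL_NAMES = ["UMBRA.md", "AGENTS.md", "CLAUDE.md", "CODEX.md", "GEMINI.md", "QWEN.md", "SYSTEM.md"];

function joinPath(dir: string, file: string): string {
  return dir === "/" ? `/${file}` : `${dir}/${file}`;
}

// "/home/me/work/app" -> ["/", "/home", "/home/me", "/home/me/work"]
function ancestorDirs(projectDir: string): string[] {
  const parts = projectDir.split("/").filter((part) => part.length > 0);
  const dirs = ["/"];
  for (let i = 1; i < parts.length; i++) {
    dirs.push("/" + parts.slice(0, i).join("/"));
  }
  return dirs;
}

function firstExisting(dir: string, names: string[], exists: (p: string) => boolean): string | null {
  for (const fileName of names) {
    const candidate = joinPath(dir, fileName);
    if (exists(candidate)) return candidate;
  }
  return null;
}

// Load order: global first, then root → project parent, then the project dir last.
function instructionLoadOrder(homeDir: string, projectDir: string, exists: (p: string) => boolean): string[] {
  const order: string[] = [];
  const globalFile = firstExisting(joinPath(homeDir, ".umbra"), GLOBAL_NAMES, exists);
  if (globalFile !== null) order.push(globalFile);
  for (const dir of [...ancestorDirs(projectDir), projectDir]) {
    const localFile = firstExisting(dir, LOCAL_NAMES, exists);
    if (localFile !== null) order.push(localFile);
  }
  return order;
}

// Later text wins simply because it comes last in the merged string.
function mergeInstructions(paths: string[], read: (p: string) => string): string {
  return paths.map((p) => read(p)).join("\n\n");
}

const onDisk = new Map<string, string>([
  ["/home/me/.umbra/AGENTS.md", "Global: prefer tabs."],
  ["/home/me/work/CLAUDE.md", "Team: use pnpm."],
  ["/home/me/work/app/AGENTS.md", "App: prefer spaces."],
  ["/home/me/work/app/CLAUDE.md", "Skipped: AGENTS.md outranks CLAUDE.md here."],
]);
const loadOrder = instructionLoadOrder("/home/me", "/home/me/work/app", (p) => onDisk.has(p));
const merged = mergeInstructions(loadOrder, (p) => onDisk.get(p) ?? "");
console.log(loadOrder);
console.log(merged);
```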
Token budget
Each section is measured independently:
| Section | Content |
|---|---|
| `task` | The user’s task description |
| `agents` | Merged instruction files |
| `memory` | Long-term memory text |
| `repo_map` | AST-generated project outline |
| `similar_memories` | Vector-retrieved past context |
| `session_summary` | Compacted session history |
| `recent_events` | Last N raw session events |
The budget report (withinBudget, remainingTokens) is returned with every context build so the caller can decide whether to trim sections.
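A report like this takes only a few lines to build. In the sketch below, `withinBudget` and `remainingTokens` come from the text above; the `summarizeSections` name, the per-section array and the choice to let `remainingTokens` go negative when over budget are assumptions.

```typescript
// Sketch of a per-section budget report (names other than withinBudget/remainingTokens are assumed).
interface SectionTokens {
  name: string;
  tokens: number;
}

interface BudgetReport {
  sections: SectionTokens[];
  totalTokens: number;
  withinBudget: boolean;
  remainingTokens: number; // negative when the context is over budget (assumption)
}

function summarizeSections(sections: Record<string, string>, budget: number = 32000): BudgetReport {
  const measured = Object.keys(sections).map((name) => ({
    name,
    tokens: Math.ceil(sections[name].length / 4), // same 4 chars/token heuristic
  }));
  const totalTokens = measured.reduce((sum, s) => sum + s.tokens, 0);
  return {
    sections: measured,
    totalTokens,
    withinBudget: totalTokens <= budget,
    remainingTokens: budget - totalTokens,
  };
}

const budgetReport = summarizeSections(
  { task: "x".repeat(400), repo_map: "y".repeat(4000), recent_events: "z".repeat(800) },
  1200,
);
console.log(budgetReport.totalTokens, budgetReport.withinBudget, budgetReport.remainingTokens); // 1300 false -100
```

A signed remainder tells the caller directly how many tokens to trim, and the per-section counts show which section to trim first.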