Context Engine & AST Compression

Before every task, Umbra assembles a structured context window and sends it to the model. The engine has a fixed token budget (default 32,000 tokens, estimated at 4 chars/token) and fills it with ranked sections.

buildTaskContext()
├─ buildRepoMap() # AST scan of the project tree → markdown outline
├─ buildSessionWindow() # compact old events, keep last 6
├─ vector search # top-N semantically similar past memories
├─ loadHierarchicalInstructions() # rules files walked from global → local
└─ summarizeTokenSections() # budget report per section
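
The budget arithmetic described above can be sketched as follows. The names `estimateTokens`, `CHARS_PER_TOKEN`, and `DEFAULT_TOKEN_BUDGET` are hypothetical, and rounding up is an assumption; this is a minimal illustration of the 4 chars/token heuristic, not Umbra's actual code.

```typescript
// Hypothetical constants mirroring the defaults stated above.
const CHARS_PER_TOKEN = 4;
const DEFAULT_TOKEN_BUDGET = 32000;

// Estimate the token count from character length.
// Rounding up is an assumption made for this sketch.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Under this heuristic, the default budget holds about 128,000 characters.
const maxChars = DEFAULT_TOKEN_BUDGET * CHARS_PER_TOKEN;
```

Because the estimate depends only on string length, it is cheap enough to run on every section of every context build.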

The repo map is a compact markdown outline of every significant file in the project. It lists imports and symbols (functions, classes, types, etc.) per file.

Three parsers run in priority order:

| Parser | Languages |
| --- | --- |
| web-tree-sitter (full AST) | JavaScript, TypeScript, TSX, Python, Go, Bash, Rust, Java, CSS, Ruby, C#, PHP, PowerShell, C++ |
| @bscotch/gml-parser (CST) | GameMaker Language (.gml) |
| Regex fallback | 40+ additional formats: JSON, YAML/GitHub Actions, Markdown, SQL, HTML, TOML, GraphQL, Protobuf, Terraform/HCL, Prisma, Solidity, Zig, Dart, Kotlin, Swift, Lua, Scala, Elixir, Erlang, Haskell, Perl, R, Clojure, Vue, Svelte, Astro, XML, Gradle, GDScript, MATLAB, Nix, Jupyter, WAT/Wasm, Assembly, Dockerfile, Makefile, CMake, .env, lockfiles |
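
One way to picture the priority order is a lookup by file extension that tries each parser tier in turn. The sketch below is an assumption about how such a dispatch could look: `pickParser`, `extensionOf`, and the extension sets are hypothetical, and the sets are deliberately partial.

```typescript
type ParserKind = "web-tree-sitter" | "gml-parser" | "regex-fallback";

// Partial, illustrative extension sets; the real lists are much longer.
const TREE_SITTER_EXTS = new Set([".js", ".ts", ".tsx", ".py", ".go", ".rs", ".java"]);
const GML_EXTS = new Set([".gml"]);
const REGEX_EXTS = new Set([".json", ".yml", ".yaml", ".md", ".sql", ".toml"]);

// Return the lowercased extension of the file name, or "" if there is none.
// Dotfiles such as ".env" yield "" here; a fuller version would match names too.
function extensionOf(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

// Try the tiers in priority order; undefined means the file is skipped.
function pickParser(path: string): ParserKind | undefined {
  const ext = extensionOf(path);
  if (TREE_SITTER_EXTS.has(ext)) return "web-tree-sitter";
  if (GML_EXTS.has(ext)) return "gml-parser";
  if (REGEX_EXTS.has(ext)) return "regex-fallback";
  return undefined;
}
```

For example, `pickParser("src/index.ts")` selects the tree-sitter tier, while `pickParser("config/app.yaml")` falls through to the regex tier.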

Full AST (via web-tree-sitter WASM grammars):

  • JavaScript (.js, .jsx, .cjs, .mjs) — tree-sitter-javascript.wasm
  • TypeScript (.ts) — tree-sitter-typescript.wasm
  • TSX (.tsx) — tree-sitter-tsx.wasm
  • Python (.py) — tree-sitter-python.wasm
  • Go (.go) — tree-sitter-go.wasm
  • Shell / Bash (.sh, .bash, .zsh) — tree-sitter-bash.wasm
  • Rust (.rs) — tree-sitter-rust.wasm
  • Java (.java) — tree-sitter-java.wasm
  • C / C++ (.c, .h, .cpp, .cc, .cxx, .hpp) — tree-sitter-cpp.wasm
  • C# (.cs) — tree-sitter-c-sharp.wasm — class, interface, struct, record, delegate, enum, namespace, method, constructor, destructor, property, field, event, operator
  • PHP (.php) — tree-sitter-php.wasm
  • Ruby (.rb) — tree-sitter-ruby.wasm
  • CSS (.css) — tree-sitter-css.wasm
  • PowerShell (.ps1, .psm1) — tree-sitter-powershell.wasm
  • INI / Config (.ini, .cfg) — tree-sitter-ini.wasm

Full AST (via dedicated parsers):

  • GML / GameMaker 2.3+ (.gml) — @bscotch/gml-parser CST; symbols: function, constructor, macro, enum, globalvar

Partial structured parsers (regex / DOM):

  • JSON (.json) — JSON.parse; top-level keys as symbols
  • YAML (.yml, .yaml) — js-yaml; top-level keys as symbols
  • GitHub Actions / CI YAML — domain-aware layer on top of YAML; symbols: workflow name, triggers, jobs; uses: → imports
  • Markdown (.md, .mdx) — heading extractor H1–H4
  • SQL (.sql) — CREATE TABLE/VIEW/FUNCTION/PROCEDURE/INDEX/TRIGGER
  • HTML (.html, .htm) — ids, landmarks, role attributes, script/style blocks
  • TOML (.toml) — sections [table], top-level keys
  • GraphQL (.graphql, .gql) — type, interface, enum, union, input, scalar, query, mutation, subscription, fragment, directive
  • Protocol Buffers (.proto) — message, service, enum, rpc, oneof; imports
  • Terraform / HCL (.tf, .tfvars, .hcl) — resource, data, module, variable, output, provider, locals
  • Prisma (.prisma) — model, enum, type, datasource, generator
  • Solidity (.sol) — contract, interface, library, function, event, struct, enum, modifier, error; imports
  • Zig (.zig) — fn, const struct/enum/union, var
  • Dart / Flutter (.dart) — class, mixin, extension, enum, functions; imports
  • Kotlin (.kt, .kts) — class, interface, object, fun, typealias, enum
  • Swift (.swift) — class, struct, protocol, enum, extension, func, actor, typealias
  • Lua (.lua) — functions, module tables, local requires
  • Scala (.scala, .sc) — class, object, trait, def, val, given; imports
  • Elixir (.ex, .exs) — defmodule, def, defp, defmacro, defprotocol, defimpl
  • Erlang (.erl, .hrl) — module, functions; -export lists
  • Haskell (.hs, .lhs) — module, data, newtype, type, class, instance, functions; imports
  • Perl (.pl, .pm) — package, sub; use imports
  • R (.r, .R) — functions, R6/RefClass classes; library/require imports
  • Clojure (.clj, .cljs, .cljc) — ns, defn, def, defmacro, defprotocol, defrecord, defmulti
  • Vue (.vue) — single-file component; component name + script-block exports
  • Svelte (.svelte) — single-file component; component name + script-block exports
  • Astro (.astro) — single-file component; component name + frontmatter symbols
  • XML (.xml) — element tags, id attributes, name/key attributes
  • Gradle (.gradle, .gradle.kts) — plugins, tasks, variables, dependencies
  • GDScript (.gd) — Godot Engine; class_name, func, signal, enum, const, var, @export/@onready; extends → import
  • MATLAB / Octave (.m) — classdef, function (all signatures), section markers %%, properties/methods/events/enumeration; import/addpath → imports
  • Nix (.nix) — top-level attrs (col-0 bindings), mkDerivation/mkShell; import <nixpkgs> → imports
  • WebAssembly Text (.wat, .wast) — $func names, $global, $type, memory, table, data/elem segments; (import ...) / (export ...)
  • Assembly (.asm, .s, .S, .nasm) — NASM/GAS/ARM; global labels, section markers, macros, constants, .type @function; extern → imports
  • Dockerfile (Dockerfile, .dockerfile) — FROM stages, EXPOSE ports, ARG, ENV, ENTRYPOINT, CMD
  • Makefile (Makefile, GNUmakefile, .mk) — targets, uppercase variables; include imports
  • CMake (CMakeLists.txt, .cmake) — add_executable, add_library, function, macro, project, option, set
  • Env files (.env, .env.*) — keys as symbols, values redacted as ***REDACTED***
  • Log files (.log) — ERROR/FATAL/WARN lines as symbols
  • Jupyter (.ipynb) — kernel name, markdown headings H1–H4, def/class from code cells; import/from → imports
  • PDF (.pdf) — pdf-parse text extractor; heading heuristic from extracted text
  • DOCX (.docx) — fflate unzip + word/document.xml parser; <w:pStyle Heading> detection
  • yarn.lock — package name + resolved version
  • Cargo.lock — crate name + version
  • Gemfile.lock — gem name + version (SPECS section)
  • composer.lock — PHP package name + version (JSON)
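
To make the regex tier concrete, here is a minimal sketch of an extractor in the spirit of the env-file entry above: keys become symbols and values are replaced with `***REDACTED***`. The function name, the `EnvSymbol` shape, and the exact pattern are assumptions, not Umbra's actual implementation.

```typescript
const REDACTED = "***REDACTED***";

// Hypothetical symbol shape for this sketch.
interface EnvSymbol {
  key: string;
  value: string;
}

// Collect KEY=... assignments, skipping blank lines and comments.
// The original value is never copied into the result.
function extractEnvSymbols(source: string): EnvSymbol[] {
  const symbols: EnvSymbol[] = [];
  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    // Accept an optional leading "export" (an assumption for this sketch).
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (match) symbols.push({ key: match[1], value: REDACTED });
  }
  return symbols;
}
```

Redacting at extraction time means secrets never enter the repo map, so they cannot leak into the context sent to the model.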

The repo map is cached in-process for 15 seconds, so rapid consecutive tasks within the same daemon process reuse the cached scan instead of rescanning the tree.
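
A time-based cache like this can be sketched in a few lines. The names below (`getRepoMapCached`, `REPO_MAP_TTL_MS`, the injected `now` clock) are hypothetical, and keying the cache by project root is an assumption.

```typescript
const REPO_MAP_TTL_MS = 15000; // 15 seconds, as stated above

interface CacheEntry {
  value: string;
  builtAt: number;
}

// In-process cache keyed by project root (the key choice is an assumption).
const repoMapCache = new Map<string, CacheEntry>();

// Return the cached map if it is younger than the TTL; otherwise rebuild.
// The clock is injectable so the behavior can be tested without waiting.
function getRepoMapCached(
  root: string,
  build: (root: string) => string,
  now: () => number = Date.now
): string {
  const time = now();
  const hit = repoMapCache.get(root);
  if (hit !== undefined && time - hit.builtAt < REPO_MAP_TTL_MS) {
    return hit.value;
  }
  const value = build(root);
  repoMapCache.set(root, { value, builtAt: time });
  return value;
}
```

Because the cache lives in process memory, it is lost when the daemon restarts, which matches the "same daemon process" caveat above.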

Long sessions are compressed automatically. When buildSessionWindow() finds a past session_compacted event, it:

  1. Uses the stored summary as the session history
  2. Keeps only events that occurred after the last compaction (up to 6)
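
The two steps above can be sketched as a pure function over the event list. The `SessionEvent` fields, the name `sketchSessionWindow`, and the behavior when no compaction exists (keep the last 6 events with no summary) are assumptions made for illustration.

```typescript
// Hypothetical event shape; only `type` and `summary` matter here.
interface SessionEvent {
  type: string;
  summary?: string;
  text?: string;
}

interface SessionWindow {
  summary: string | undefined;
  recent: SessionEvent[];
}

const MAX_RECENT_EVENTS = 6;

function sketchSessionWindow(events: SessionEvent[]): SessionWindow {
  // Find the most recent session_compacted event, if any.
  let lastCompaction = -1;
  for (let i = events.length - 1; i >= 0; i--) {
    if (events[i].type === "session_compacted") {
      lastCompaction = i;
      break;
    }
  }
  // Step 1: the stored summary stands in for everything before it.
  const summary = lastCompaction >= 0 ? events[lastCompaction].summary : undefined;
  // Step 2: keep only events after the compaction, capped at the last 6.
  const after = events.slice(lastCompaction + 1);
  return { summary, recent: after.slice(-MAX_RECENT_EVENTS) };
}
```

With eight events after the last compaction, the window holds the stored summary plus the six most recent events; the two oldest are dropped.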

On explicit compaction (compactSessionEvents()), the engine distills older events into a structured summary (goals, progress, files touched, failures, preserved tail) and emits a session_compacted event. Iterative compaction builds a rolling "# Session Update" section on top of the previous summary rather than starting over.
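
As a rough sketch of the rolling behavior, the function below renders the summary fields named above and appends a "# Session Update" block when a previous summary exists. The `CompactionSummary` shape, the function names, the "# Session Summary" heading, and the list formatting are all hypothetical; only the field names and the "# Session Update" heading come from the description above.

```typescript
// Hypothetical shape for the structured summary fields.
interface CompactionSummary {
  goals: string[];
  progress: string[];
  filesTouched: string[];
  failures: string[];
  preservedTail: string[];
}

// Render one titled bullet list; empty lists are shown as "(none)".
function renderList(title: string, items: string[]): string {
  const body = items.length > 0 ? items.map((item) => "- " + item).join("\n") : "- (none)";
  return "## " + title + "\n" + body;
}

// The first compaction starts a fresh document; later compactions append
// an update block to the previous summary instead of replacing it.
function renderCompaction(summary: CompactionSummary, previous?: string): string {
  const sections = [
    renderList("Goals", summary.goals),
    renderList("Progress", summary.progress),
    renderList("Files touched", summary.filesTouched),
    renderList("Failures", summary.failures),
    renderList("Preserved tail", summary.preservedTail),
  ].join("\n\n");
  if (previous === undefined) {
    return "# Session Summary\n\n" + sections;
  }
  return previous + "\n\n# Session Update\n\n" + sections;
}
```

Appending rather than regenerating keeps earlier context stable across compactions, at the cost of a summary that grows over time.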

Agent rules are loaded from instruction files discovered by walking the directory tree. Files are merged in the order below, lowest priority first, so the file loaded last wins:

~/.umbra/UMBRA.md (global)
~/.umbra/AGENTS.md (global fallback)
↓ ancestor dirs (root → project parent)
↓ project dir UMBRA.md > AGENTS.md > CLAUDE.md > CODEX.md > GEMINI.md > QWEN.md > SYSTEM.md

Local rules override global ones because they appear last in the merged string.
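
The walk can be sketched as a function that returns instruction file paths in merge order. Several points here are assumptions: `>` is read as "the first existing file in that directory is used", ancestor directories are checked against the same candidate list as the project directory, only POSIX-style paths are handled, and the function names and injected `exists` check are hypothetical.

```typescript
const CANDIDATES = [
  "UMBRA.md", "AGENTS.md", "CLAUDE.md", "CODEX.md", "GEMINI.md", "QWEN.md", "SYSTEM.md",
];

// Directories from the filesystem root down to projectDir (POSIX paths only).
function dirsRootToProject(projectDir: string): string[] {
  const parts = projectDir.split("/").filter((part) => part.length > 0);
  const dirs: string[] = ["/"];
  let current = "";
  for (const part of parts) {
    current += "/" + part;
    dirs.push(current);
  }
  return dirs;
}

// Return the first candidate that exists in dir, or undefined.
function firstExisting(
  dir: string,
  names: string[],
  exists: (path: string) => boolean
): string | undefined {
  for (const name of names) {
    const candidate = dir === "/" ? "/" + name : dir + "/" + name;
    if (exists(candidate)) return candidate;
  }
  return undefined;
}

// Global file first, then ancestors root-to-parent, then the project dir.
// Concatenating the files in this order puts local rules last.
function resolveInstructionFiles(
  homeDir: string,
  projectDir: string,
  exists: (path: string) => boolean
): string[] {
  const ordered: string[] = [];
  const globalFile = firstExisting(homeDir + "/.umbra", ["UMBRA.md", "AGENTS.md"], exists);
  if (globalFile !== undefined) ordered.push(globalFile);
  for (const dir of dirsRootToProject(projectDir)) {
    const found = firstExisting(dir, CANDIDATES, exists);
    if (found !== undefined) ordered.push(found);
  }
  return ordered;
}
```

Injecting `exists` keeps the sketch free of filesystem calls; in practice it would be backed by a real file check.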

Each section is measured independently:

| Section | Content |
| --- | --- |
| task | The user’s task description |
| agents | Merged instruction files |
| memory | Long-term memory text |
| repo_map | AST-generated project outline |
| similar_memories | Vector-retrieved past context |
| session_summary | Compacted session history |
| recent_events | Last N raw session events |

The budget report (withinBudget, remainingTokens) is returned with every context build so the caller can decide whether to trim sections.
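
A per-section report of this kind might look like the sketch below. Only `withinBudget` and `remainingTokens` come from the description above; the other field names, the function name `sketchBudgetReport`, and the choice to let `remainingTokens` go negative when over budget are assumptions.

```typescript
interface SectionReport {
  name: string;
  tokens: number;
}

interface BudgetReport {
  sections: SectionReport[];
  totalTokens: number;
  withinBudget: boolean;
  remainingTokens: number;
}

// Measure each section independently at 4 chars/token, then total them.
function sketchBudgetReport(
  sections: Record<string, string>,
  budget: number = 32000
): BudgetReport {
  const reports: SectionReport[] = Object.keys(sections).map((name) => ({
    name,
    tokens: Math.ceil(sections[name].length / 4),
  }));
  const totalTokens = reports.reduce((sum, section) => sum + section.tokens, 0);
  return {
    sections: reports,
    totalTokens,
    withinBudget: totalTokens <= budget,
    // Negative when over budget (an assumption for this sketch).
    remainingTokens: budget - totalTokens,
  };
}
```

A caller could inspect `sections` to find the largest contributor and trim it, for example by shortening `repo_map`, when `withinBudget` is false.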