In short: a structured code index for LLM agents — symbol graph, text search, semantic search, all in one SQLite file, served over MCP.
A local code indexing and retrieval engine for C#, TypeScript/React, and Rust codebases, written in Rust. Parses source into a symbol graph (nodes, edges, spans), embeds symbols for semantic search, stores everything in a single SQLite file, and exposes it over MCP so LLM agents and tools can navigate large projects without loading them into context.
The engine combines tree-sitter parsing with compiler-grade analysis (Roslyn for C#, TypeScript Language Service, rust-analyzer for Rust) to build a full symbol graph (classes, methods, call chains, inheritance hierarchies, interface implementations), then makes it searchable by keyword, qualified name, or semantic similarity. It tracks file changes incrementally, detects renames across edits, and keeps the index current without full rebuilds. Everything runs locally and is stored in a single SQLite file; nothing leaves your machine.
- Parses C#, TypeScript/TSX, and Rust files using tree-sitter into a typed symbol graph (classes, interfaces, methods, properties, components, modules, traits, etc.)
- Enriches symbols with compiler-grade analysis via Roslyn (C#), TypeScript Language Service, and rust-analyzer (Rust), adding resolved call graphs, inheritance, and implementations
- Detects renames across edits using git history + token-level fingerprinting (Jaccard similarity)
- Embeds symbols using LateOn-Code-edge (ColBERT multi-vector, 48-dim per token, ONNX, in-process) for vector similarity search
- Watches the file system and incrementally re-indexes changed files
- Detects dead code — finds unused symbols (methods, classes, properties) with no incoming calls, references, or implementations
- Integrates with Claude Code via 4 lifecycle hooks: context-aware compaction, automatic re-indexing on file edits, subagent orientation, and post-task quality reports
- Serves 18 tools over MCP (stdio transport) for symbol lookup, graph traversal, full-text search, semantic similarity search, dead code detection, file browsing, and more
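
The multi-vector embedding bullet above implies late-interaction scoring, where a query and a symbol are each represented by one vector per token. As a rough illustration only, here is a minimal MaxSim sketch; the function names and the use of plain `Vec<f32>` are assumptions, and the engine's actual ONNX pipeline and vector layout are not shown in this README.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// MaxSim late-interaction score: for each query token vector, take its best
/// dot product against any document token vector, then sum those maxima.
/// Hypothetical helper; returns 0.0 when the document has no token vectors.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    if doc.is_empty() {
        return 0.0;
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

With 48-dimensional token vectors, each inner slice would hold 48 floats; the scoring logic is the same at any dimensionality.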
- codeagent-engine/
  - crates/
    - codeagent-core/: Core library (parsing, graph, storage, retrieval)
    - codeagent-cli/: Debug CLI (codeagent binary)
    - codeagent-mcp/: MCP server (codeagent-mcp binary)
  - extractors/
    - csharp/: .NET 8 / .NET 10 Roslyn extractor (JSON-RPC over stdio)
    - typescript/: Node.js TS Language Service extractor (JSON-RPC over stdio)
    - rust/: Rust extractor, an LSP adapter wrapping rust-analyzer (JSON-RPC over stdio)
Single SQLite file per project. WAL mode, single-writer (dedicated OS thread + mpsc channel), reader pool (r2d2). Schema includes:
| Table | Purpose |
|---|---|
| nodes | symbols with identity keys, metadata, and content hashes |
| edges | typed relationships (calls, inherits, implements, contains, imports, ...) |
| node_spans | source locations with line ranges and span hashes |
| fts_nodes | FTS5 full-text index over symbol names and signatures |
| vec_nodes | embedding vectors for similarity search |
| deletion_log | journal for hard deletes and rename detection |
UUIDs stored as BLOB(16), content hashes as BLOB(32).
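
The single-writer arrangement described above (one dedicated OS thread fed by an mpsc channel) can be sketched with the standard library alone. This is an illustration under assumptions: `WriteCmd` and `spawn_writer` are hypothetical names, and an in-memory `Vec` stands in for the SQLite connection.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical write commands; the engine's real message type is not shown here.
enum WriteCmd {
    UpsertNode { id: [u8; 16], name: String },
    DeleteNode { id: [u8; 16] },
}

/// Spawn the single writer thread. A Vec stands in for the database.
/// The thread returns its final state once every Sender has been dropped.
fn spawn_writer() -> (Sender<WriteCmd>, JoinHandle<Vec<([u8; 16], String)>>) {
    let (tx, rx) = mpsc::channel::<WriteCmd>();
    let handle = thread::spawn(move || {
        let mut rows: Vec<([u8; 16], String)> = Vec::new();
        // Commands are applied strictly in arrival order, so writes never race.
        for cmd in rx {
            match cmd {
                WriteCmd::UpsertNode { id, name } => {
                    if let Some(pos) = rows.iter().position(|(row_id, _)| *row_id == id) {
                        rows[pos].1 = name;
                    } else {
                        rows.push((id, name));
                    }
                }
                WriteCmd::DeleteNode { id } => rows.retain(|(row_id, _)| *row_id != id),
            }
        }
        rows
    });
    (tx, handle)
}
```

Any number of producers can clone the `Sender`, while readers would go through a separate connection pool; only this one thread ever mutates state.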
File changes flow through the pipeline:
1. Project detection (find .csproj / package.json / tsconfig.json / Cargo.toml)
2. Solution prebuild (C# only: generate synthetic .sln, dotnet restore, load Roslyn workspace)
3. Rename detection (git + fingerprint + symbol-level matching)
4. Semantic pre-analysis (all IPC before tree-sitter: Roslyn + TS Language Service provide final symbol keys so extraction avoids identity reconciliation)
5. Syntactic parsing (tree-sitter adapters for C#, TypeScript, and Rust, parallelised via Rayon, using semantic keys from step 4)
6. Apply semantic edges + attributes (DB writes only, no IPC)
7. Deletions (hard-delete removed files, journaled)
8. Semantic context changes (recompute edges when .csproj / tsconfig.json / Cargo.toml changes)
For incremental batches after the initial index, the IPC processes are shut down to reclaim memory. When a subsequent large batch needs semantic analysis, a minimal solution containing only the touched projects is generated and loaded — avoiding the cost of re-loading the full workspace.
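
The rename-detection step in the pipeline compares token-level fingerprints with Jaccard similarity. The sketch below shows the idea under assumptions: the tokenizer and the function names are hypothetical, and the engine's real fingerprint (and how it combines with git history) may differ.

```rust
use std::collections::HashSet;

/// Split source text into a set of identifier-like tokens.
/// Assumption: the real fingerprint may tokenize differently.
fn token_set(source: &str) -> HashSet<&str> {
    source
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .collect()
}

/// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 1.0 for two empty sets.
fn jaccard<'a>(a: &HashSet<&'a str>, b: &HashSet<&'a str>) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let shared = a.intersection(b).count() as f64;
    let total = a.union(b).count() as f64;
    shared / total
}

/// Treat a removed/added pair as a rename candidate when similarity meets the threshold.
fn is_rename_candidate(old_src: &str, new_src: &str, threshold: f64) -> bool {
    jaccard(&token_set(old_src), &token_set(new_src)) >= threshold
}
```

The `threshold` argument corresponds to the role `rename_similarity_threshold` plays in the config (0.80 in the example config).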
Hybrid search combines three channels — semantic similarity (ColBERT two-stage: centroid pre-filter + MaxSim re-rank), keyword matching (BM25 via FTS5), and qualified-name lookup — then merges results via Reciprocal Rank Fusion with configurable boosts for public API surface and reference counts.
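
To make the fusion step concrete, here is a minimal Reciprocal Rank Fusion sketch. It is not the engine's code: `rrf_fuse` is a hypothetical name, symbol ids are plain strings, and the configurable boosts for public API surface and reference counts are left out.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists with Reciprocal Rank Fusion:
/// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
fn rrf_fuse(channels: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranked in channels {
        for (i, id) in ranked.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; ties are broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Passing the semantic, keyword, and qualified-name result lists as the three channels with `k = 60.0` mirrors the `rrf_k` value in the example config. A symbol that appears near the top of several channels outranks one that tops only a single channel.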
The server exposes 18 tools over stdio:
| Category | Tools |
|---|---|
| File system | list_directory, read_file, get_directory_tree |
| Search | search_symbols (keyword), lookup_symbol (qualified name), find_similar (semantic) |
| Navigation | get_symbol, get_source_spans, get_file_outline, get_callers, get_callees, get_implementations, get_references, get_dependencies, get_dependents, find_dead_code |
| Management | index_files, get_status |
All file access is sandboxed to the repository root.
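
One way to picture the sandboxing rule is a lexical path check that rejects anything escaping the repository root. This is a sketch only: `resolve_in_sandbox` is a hypothetical helper, and a real implementation would also need to consider symlinks (compare the `follow_symlinks` setting), which this version ignores.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `requested` against `root` lexically; return None if it would escape `root`.
/// Assumption: no filesystem access, so symlinks are not resolved here.
fn resolve_in_sandbox(root: &Path, requested: &str) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    let mut depth = 0usize; // how many components below `root` we currently are
    for component in Path::new(requested).components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return None; // ".." would climb above the repository root
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute paths and Windows drive prefixes are rejected outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}
```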
Requires Rust 1.70+ and Cargo.
cd codeagent-engine
cargo build --release

The two binaries end up in target/release/:
- codeagent: debug CLI
- codeagent-mcp: MCP server
Optional: language extractors
For semantic enrichment beyond tree-sitter (resolved types, call graphs):
C# (Roslyn) — requires .NET 8 SDK or .NET 10 SDK (or both):
cd extractors/csharp
dotnet build -c Release

The project multi-targets net8.0 and net10.0. Building produces output under both bin/Release/net8.0/ and bin/Release/net10.0/. The engine auto-detects which .NET runtimes are installed and selects the best matching binary at launch.
TypeScript — requires Node.js 18+:
cd extractors/typescript
npm install && npm run build

Rust (requires the Rust toolchain and rust-analyzer):
rustup component add rust-analyzer
cd extractors/rust
cargo build --release

Configure extractor paths in .codeagent/config.json:
{
"indexing": {
"csharp_extractor_path": "path/to/bin/Release/net8.0/CodeAgentExtractor.dll",
"typescript_extractor_path": "path/to/dist/index.js",
"rust_extractor_path": "path/to/extractors/rust/target/release/codeagent-rust-extractor"
}
}

For the C# extractor, point csharp_extractor_path at any TFM-specific DLL (e.g., the net8.0/ copy). The engine automatically checks sibling TFM directories and picks the one matching your installed .NET runtime, so the same config works whether you have .NET 8, .NET 10, or both installed.
Without extractors, indexing falls back to syntactic-only mode (tree-sitter). You still get symbols, containment, and imports — just not resolved call graphs or interface implementations.
cd your-project
codeagent init

This creates .codeagent/ (config, database), adds the DB to .gitignore, and registers 4 Claude Code lifecycle hooks in .claude/settings.json. Then start the MCP server:

codeagent-mcp

codeagent init creates .codeagent/config.json with sensible defaults. All fields are optional.
Example config
{
"indexing": {
"safe_mode": true,
"write_debounce_ms": 2000,
"rename_similarity_threshold": 0.80,
"follow_symlinks": false
},
"embedding": {
"model_name": "lightonai/LateOn-Code-edge",
"dimensionality": 48,
"batch_size": 64,
"prefilter_k": 100
},
"retrieval": {
"max_output_tokens": 16384,
"rrf_k": 60
},
"mcp": {
"max_results": 50,
"max_file_size": 524288
}
}

Environment variable overrides follow the pattern CODEAGENT_<SECTION>_<KEY> (e.g., CODEAGENT_INDEXING_SAFE_MODE=false).
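
The naming pattern for overrides can be expressed in a few lines. The helpers below are hypothetical (the engine's own config loader is not shown), and the boolean parsing accepts only `true` and `false`, which is an assumption about the accepted spellings.

```rust
/// Build the override variable name for a config field,
/// following the CODEAGENT_<SECTION>_<KEY> pattern.
fn env_var_name(section: &str, key: &str) -> String {
    format!("CODEAGENT_{}_{}", section.to_uppercase(), key.to_uppercase())
}

/// Resolve a boolean setting: use the override if present and parseable, else the default.
/// `lookup` is injected so the logic can be exercised without touching the real environment.
fn resolve_bool(
    section: &str,
    key: &str,
    default: bool,
    lookup: impl Fn(&str) -> Option<String>,
) -> bool {
    lookup(&env_var_name(section, key))
        .and_then(|raw| raw.trim().parse::<bool>().ok())
        .unwrap_or(default)
}
```

In a real process the lookup would typically be `|name| std::env::var(name).ok()`.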
codeagent init registers four hooks that run automatically during Claude Code sessions:
| Hook | Trigger | What it does |
|---|---|---|
| PreCompact | Before context compaction | Injects a PageRank-ranked table of the 30 most central symbols so they survive compaction |
| PostToolUse | After Edit / Write / NotebookEdit | Silently re-indexes the changed file so the graph stays current |
| SubagentStart | When a subagent spawns | Provides a project overview (stats, top 15 symbols, available MCP tools) |
| TaskCompleted | When a task finishes | Reports potentially unused symbols (dead code) and unresolved references |
Hooks communicate over stdin/stdout JSON. Non-blocking errors are logged to stderr; hooks never block Claude Code execution.
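
The PreCompact and SubagentStart hooks rank symbols by PageRank. As an illustration of that ranking, here is a plain power-iteration sketch over node indices. The damping factor and iteration count are conventional choices passed in by the caller, not values taken from the engine, and `pagerank` and `top_k` are hypothetical names.

```rust
/// Power-iteration PageRank over a directed graph given as (from, to) index pairs.
fn pagerank(node_count: usize, edges: &[(usize, usize)], damping: f64, iterations: usize) -> Vec<f64> {
    if node_count == 0 {
        return Vec::new();
    }
    let n = node_count as f64;
    let mut out_degree = vec![0usize; node_count];
    for &(from, _) in edges {
        out_degree[from] += 1;
    }
    let mut rank = vec![1.0 / n; node_count];
    for _ in 0..iterations {
        // Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        let dangling: f64 = (0..node_count)
            .filter(|&i| out_degree[i] == 0)
            .map(|i| rank[i])
            .sum();
        let mut next = vec![(1.0 - damping) / n + damping * dangling / n; node_count];
        for &(from, to) in edges {
            next[to] += damping * rank[from] / out_degree[from] as f64;
        }
        rank = next;
    }
    rank
}

/// Indices of the `k` highest-ranked nodes, best first (ties broken by index).
fn top_k(rank: &[f64], k: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..rank.len()).collect();
    order.sort_by(|&a, &b| rank[b].total_cmp(&rank[a]).then(a.cmp(&b)));
    order.truncate(k);
    order
}
```

Calling `top_k(&rank, 30)` would give the kind of shortlist the PreCompact hook injects, assuming edges such as calls and references are fed in as the graph.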
The codeagent binary provides project setup, hook handling, and database inspection:
# Set up a project
codeagent init [--repo-root <path>]
# Database inspection
codeagent --db .codeagent/index.db get-node <uuid>
codeagent --db .codeagent/index.db get-outline <file-id>
codeagent --db .codeagent/index.db filter --query "authenticate" --node-type method
codeagent --db .codeagent/index.db lookup "MyApp.Auth.AuthService"
codeagent --db .codeagent/index.db health
# Hook handlers (called automatically by Claude Code, not manually)
codeagent hook pre-compact
codeagent hook post-tool-use
codeagent hook subagent-start
codeagent hook task-completed

# Unit and integration tests (417 core + 41 fixture + 31 MCP + 5 CLI)
cargo test
# Rust extractor tests (separate binary, outside workspace)
cd extractors/rust && cargo test
# OSS integration tests — indexes real repos (tRPC, Hot Chocolate GraphQL, rust-analyzer)
# First run clones repos; subsequent runs use cached clones
cargo test -p codeagent-core --features oss-tests --test oss_tests -- --nocapture
# Pipeline benchmarks only (feature-gated)
cargo test -p codeagent-core --features oss-tests test_oss_hc_pipeline_benchmark -- --exact --nocapture
cargo test -p codeagent-core --features oss-tests test_oss_trpc_pipeline_benchmark -- --exact --nocapture
cargo test -p codeagent-core --features oss-tests test_oss_rust_analyzer_pipeline_benchmark -- --exact --nocapture

494 workspace tests plus 27 Rust extractor tests plus 31 OSS integration tests (feature-gated behind --features oss-tests), covering parsing, graph operations, invalidation, rename detection, retrieval, dead code detection, PageRank, MCP tool behavior, CLI init/hooks, end-to-end pipeline benchmarks (initial index, idempotent reindex, touched-file reindex with minimal solution reload), and query benchmarks against real-world codebases (tRPC for TypeScript, Hot Chocolate for C#, rust-analyzer for Rust).
| Concept | Values |
|---|---|
| Node types | File, Module, Project, Class, Interface, Method, Property, Constructor, Type, Component |
| Edge types | Calls, Inherits, Implements, Imports, Overrides, References, Contains, Accepts, Extends |
| Languages | C# (csharp), TypeScript (typescript), Rust (rust) |
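
With typed edges like these, dead code detection reduces to finding nodes with no incoming Calls, References, or Implements edge. The sketch below assumes integer node ids and a simple `Edge` struct, both hypothetical; the engine's actual query, and any exclusions it applies for entry points or public API, are not described in this README.

```rust
use std::collections::HashSet;

/// Edge kinds as listed in the table above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeType {
    Calls, Inherits, Implements, Imports, Overrides, References, Contains, Accepts, Extends,
}

/// Hypothetical in-memory edge; `to` is the symbol being called or referenced.
#[allow(dead_code)]
struct Edge {
    from: u32,
    to: u32,
    kind: EdgeType,
}

/// Return the ids of nodes that no Calls, References, or Implements edge points at.
fn find_unused(node_ids: &[u32], edges: &[Edge]) -> Vec<u32> {
    let used: HashSet<u32> = edges
        .iter()
        .filter(|e| matches!(e.kind, EdgeType::Calls | EdgeType::References | EdgeType::Implements))
        .map(|e| e.to)
        .collect();
    node_ids.iter().copied().filter(|id| !used.contains(id)).collect()
}
```

Containment alone does not count as usage here, so a method that is only contained in a class but never called would be reported.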
Symbol identity is stable across edits. The identity key is (language, project_id, symbol_key, symbol_disambiguator) — overload-safe for C# (includes parameter types) and file-scoped for TypeScript (includes a deterministic file ID derived from the path).
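
The four-part identity key can be modelled as a hashable struct. Everything concrete below is an assumption made for illustration: the field types, the comma-joined parameter list used as the C# disambiguator, and the helper names.

```rust
use std::collections::HashMap;

/// The identity tuple: (language, project_id, symbol_key, symbol_disambiguator).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct IdentityKey {
    language: String,
    project_id: u32, // assumed type; the real id format is not specified here
    symbol_key: String,
    symbol_disambiguator: String,
}

/// Hypothetical C# key builder: parameter types keep overloads apart.
fn csharp_method_key(project_id: u32, qualified_name: &str, param_types: &[&str]) -> IdentityKey {
    IdentityKey {
        language: "csharp".to_string(),
        project_id,
        symbol_key: qualified_name.to_string(),
        symbol_disambiguator: param_types.join(","),
    }
}

/// Return the existing node id for a key, or assign `fresh_id` if the key is new.
/// Re-indexing an edited file yields the same key, so the node id stays stable.
fn node_id_for(index: &mut HashMap<IdentityKey, u64>, key: IdentityKey, fresh_id: u64) -> u64 {
    *index.entry(key).or_insert(fresh_id)
}
```

Two overloads of the same method produce different keys, while the same overload seen again after an edit maps back to its original node.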