CodeAgent Indexing Engine + MCP Server

In short: a structured code index for LLM agents — symbol graph, text search, semantic search, all in one SQLite file, served over MCP.

A local code indexing and retrieval engine for C#, TypeScript/React, and Rust codebases, written in Rust. Parses source into a symbol graph (nodes, edges, spans), embeds symbols for semantic search, stores everything in a single SQLite file, and exposes it over MCP so LLM agents and tools can navigate large projects without loading them into context.

The engine combines tree-sitter parsing with compiler-grade analysis (Roslyn for C#, TypeScript Language Service, rust-analyzer for Rust) to build a full symbol graph — classes, methods, call chains, inheritance hierarchies, interface implementations — then makes it searchable by keyword, qualified name, or semantic similarity. It tracks file changes incrementally, detects renames across edits, and keeps the index current without full rebuilds. Everything runs locally in a single SQLite file; nothing leaves your machine.

What it does

Parses C#, TypeScript/TSX, and Rust files using tree-sitter into a typed symbol graph (classes, interfaces, methods, properties, components, modules, traits, etc.)
Enriches symbols with compiler-grade analysis via Roslyn (C#), TypeScript Language Service, and rust-analyzer (Rust), adding resolved call graphs, inheritance, and implementations
Detects renames across edits using git history + token-level fingerprinting (Jaccard similarity)
Embeds symbols using LateOn-Code-edge (ColBERT multi-vector, 48-dim per token, ONNX, in-process) for vector similarity search
Watches the file system and incrementally re-indexes changed files
Detects dead code — finds unused symbols (methods, classes, properties) with no incoming calls, references, or implementations
Integrates with Claude Code via 4 lifecycle hooks: context-aware compaction, automatic re-indexing on file edits, subagent orientation, and post-task quality reports
Serves 18 tools over MCP (stdio transport) for symbol lookup, graph traversal, full-text search, semantic similarity search, dead code detection, file browsing, and more
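
The rename-detection item above mentions token-level fingerprinting with Jaccard similarity. A minimal sketch of that idea follows, assuming a fingerprint is simply the set of identifier-like tokens in a symbol's source. The tokenizer and the function names (`token_fingerprint`, `jaccard`, `looks_like_rename`) are illustrative and not taken from the engine, which also consults git history.

```rust
use std::collections::HashSet;

/// Splits source text into a set of identifier-like tokens.
/// This tokenizer is a simplification chosen for illustration.
fn token_fingerprint(source: &str) -> HashSet<String> {
    source
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .map(|t| t.to_string())
        .collect()
}

/// Jaccard similarity: |A ∩ B| / |A ∪ B|, in the range 0.0..=1.0.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0; // two empty fingerprints are treated as identical
    }
    let intersection = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    intersection / union
}

/// Treats a removed/added pair as a rename when similarity meets the threshold.
fn looks_like_rename(old_src: &str, new_src: &str, threshold: f64) -> bool {
    jaccard(&token_fingerprint(old_src), &token_fingerprint(new_src)) >= threshold
}
```

Calling `looks_like_rename(old_body, new_body, 0.80)` would mirror the `rename_similarity_threshold` default shown in the example config; a symbol whose name changed but whose body is mostly intact shares most of its tokens and scores close to 1.0.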

Architecture

codeagent-engine/
  crates/
    codeagent-core/       Core library — parsing, graph, storage, retrieval
    codeagent-cli/        Debug CLI (codeagent binary)
    codeagent-mcp/        MCP server (codeagent-mcp binary)
  extractors/
    csharp/               .NET 8 / .NET 10 Roslyn extractor (JSON-RPC over stdio)
    typescript/           Node.js TS Language Service extractor (JSON-RPC over stdio)
    rust/                 Rust extractor — LSP adapter wrapping rust-analyzer (JSON-RPC over stdio)

Storage

Single SQLite file per project. WAL mode, single-writer (dedicated OS thread + mpsc channel), reader pool (r2d2). Schema includes:

Table	Purpose
`nodes`	symbols with identity keys, metadata, and content hashes
`edges`	typed relationships (calls, inherits, implements, contains, imports, ...)
`node_spans`	source locations with line ranges and span hashes
`fts_nodes`	FTS5 full-text index over symbol names and signatures
`vec_nodes`	embedding vectors for similarity search
`deletion_log`	journal for hard deletes and rename detection

UUIDs stored as BLOB(16), content hashes as BLOB(32).
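
The single-writer arrangement described above (a dedicated OS thread fed by an mpsc channel) can be sketched with the standard library alone. This is an illustration of the pattern, not the engine's code: the `WriteCmd` variants and `spawn_writer` are hypothetical, and the SQLite connection is replaced by an in-memory log so the example stays dependency-free.

```rust
use std::sync::mpsc;
use std::thread;

/// A write request sent to the writer thread. The variants are hypothetical.
enum WriteCmd {
    UpsertNode { key: String },
    DeleteFile { path: String },
    /// Asks the writer to report how many commands it has applied so far.
    Flush(mpsc::Sender<usize>),
}

/// Spawns the single writer. In a real engine the loop would own the one
/// read-write SQLite connection; here it only records what it was asked to do.
fn spawn_writer() -> (mpsc::Sender<WriteCmd>, thread::JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel::<WriteCmd>();
    let handle = thread::spawn(move || {
        let mut log: Vec<String> = Vec::new();
        // Commands are applied strictly in arrival order; the loop ends
        // once every Sender has been dropped.
        for cmd in rx {
            match cmd {
                WriteCmd::UpsertNode { key } => log.push(format!("upsert {key}")),
                WriteCmd::DeleteFile { path } => log.push(format!("delete {path}")),
                WriteCmd::Flush(reply) => {
                    let _ = reply.send(log.len());
                }
            }
        }
        log
    });
    (tx, handle)
}
```

Any number of producers can clone the `Sender`, while all mutations are serialized through one thread. That matches SQLite's one-writer model in WAL mode and leaves readers free to use their own pooled connections.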

Ingest pipeline

File changes flow through the pipeline:

1. Project detection (find .csproj / package.json / tsconfig.json / Cargo.toml)
2. Solution prebuild (C# only: generate synthetic .sln, dotnet restore, load Roslyn workspace)
3. Rename detection (git + fingerprint + symbol-level matching)
4. Semantic pre-analysis (all IPC before tree-sitter: Roslyn + TS Language Service provide final symbol keys so extraction avoids identity reconciliation)
5. Syntactic parsing (tree-sitter adapters for C#, TypeScript, and Rust, parallelised via Rayon, using semantic keys from step 4)
6. Apply semantic edges + attributes (DB writes only, no IPC)
7. Deletions (hard-delete removed files, journaled)
8. Semantic context changes (recompute edges when .csproj / tsconfig.json / Cargo.toml changes)

Once the initial index is built, the IPC processes are shut down to reclaim memory, so incremental batches start without them. When a later large batch needs semantic analysis, a minimal solution containing only the touched projects is generated and loaded, which avoids the cost of reloading the full workspace.

Retrieval

Hybrid search combines three channels — semantic similarity (ColBERT two-stage: centroid pre-filter + MaxSim re-rank), keyword matching (BM25 via FTS5), and qualified-name lookup — then merges results via Reciprocal Rank Fusion with configurable boosts for public API surface and reference counts.
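
Reciprocal Rank Fusion itself is small enough to show in full. The sketch below uses the standard formulation, in which each result scores the sum of 1 / (k + rank) over the channels that returned it. It leaves out the boosts for public API surface and reference counts, and the function name and signature are assumptions rather than the engine's API.

```rust
use std::collections::HashMap;

/// Merges ranked result lists with Reciprocal Rank Fusion.
/// Each item scores the sum of 1 / (k + rank) over the lists it appears in,
/// where rank starts at 1. Returns items sorted by descending fused score.
fn reciprocal_rank_fusion(channels: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranked in channels {
        for (idx, id) in ranked.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending, then by id so ties are deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With `k = 60.0` (the `rrf_k` default in the example config), a symbol returned by several channels outranks one that tops a single channel, which is the property that makes rank fusion useful when the channels' raw scores are not comparable.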

MCP tools

The server exposes 18 tools over stdio:

Category	Tools
File system	`list_directory`, `read_file`, `get_directory_tree`
Search	`search_symbols` (keyword), `lookup_symbol` (qualified name), `find_similar` (semantic)
Navigation	`get_symbol`, `get_source_spans`, `get_file_outline`, `get_callers`, `get_callees`, `get_implementations`, `get_references`, `get_dependencies`, `get_dependents`, `find_dead_code`
Management	`index_files`, `get_status`

All file access is sandboxed to the repository root.
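
One way to enforce this kind of sandboxing is to resolve every requested path against the repository root and reject anything that would climb above it. The sketch below is purely lexical (it never touches the file system, so it does not account for symlinks) and is an assumption about the approach, not the server's actual check; `resolve_in_sandbox` is a hypothetical name.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolves `requested` against `root` without touching the file system and
/// returns None if the result would fall outside `root`.
fn resolve_in_sandbox(root: &Path, requested: &str) -> Option<PathBuf> {
    let mut depth = 0usize; // number of components currently below root
    let mut resolved = root.to_path_buf();
    for component in Path::new(requested).components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return None; // would climb above the root
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute paths and Windows prefixes are rejected outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}
```

A request such as `src/../README.md` resolves inside the root, while `../secrets.txt` or an absolute path yields `None`. A production check would additionally need a symlink policy, for example canonicalizing the result before comparing it with the root.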

Building

Requires Rust 1.70+ and Cargo.

cd codeagent-engine
cargo build --release

The two binaries end up in target/release/:

codeagent — debug CLI
codeagent-mcp — MCP server

Optional: language extractors

For semantic enrichment beyond tree-sitter (resolved types, call graphs):

C# (Roslyn) — requires .NET 8 SDK or .NET 10 SDK (or both):

cd extractors/csharp
dotnet build -c Release

The project multi-targets net8.0 and net10.0. Building produces output under both bin/Release/net8.0/ and bin/Release/net10.0/. The engine auto-detects which .NET runtimes are installed and selects the best matching binary at launch.

TypeScript — requires Node.js 18+:

cd extractors/typescript
npm install && npm run build

Rust — requires Rust toolchain and rust-analyzer:

rustup component add rust-analyzer
cd extractors/rust
cargo build --release

Configure extractor paths in .codeagent/config.json:

{
  "indexing": {
    "csharp_extractor_path": "path/to/bin/Release/net8.0/CodeAgentExtractor.dll",
    "typescript_extractor_path": "path/to/dist/index.js",
    "rust_extractor_path": "path/to/extractors/rust/target/release/codeagent-rust-extractor"
  }
}

For the C# extractor, point csharp_extractor_path at any TFM-specific DLL (e.g., the net8.0/ copy). The engine will automatically check sibling TFM directories and pick the one matching your installed .NET runtime — so the same config works whether you have .NET 8, .NET 10, or both installed.
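
The sibling-directory lookup described above can be pictured as follows. This is a hedged sketch only: the README does not say how the engine detects installed runtimes or orders its preferences, so the preferred TFM list is passed in by the caller, and `pick_extractor_dll` is a hypothetical name. The existence check is injected so the logic can be exercised without real files.

```rust
use std::path::{Path, PathBuf};

/// Given the configured DLL path and the TFMs usable on this machine (in
/// order of preference), returns the first sibling-TFM copy that exists.
fn pick_extractor_dll(
    configured: &Path,
    preferred_tfms: &[&str],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let file_name = configured.file_name()?;
    // .../bin/Release/<tfm>/CodeAgentExtractor.dll -> .../bin/Release
    let release_dir = configured.parent()?.parent()?;
    preferred_tfms
        .iter()
        .map(|tfm| release_dir.join(tfm).join(file_name))
        .find(|candidate| exists(candidate.as_path()))
}
```

In real use the `exists` argument could be `|p| p.is_file()`, and the preferred list would come from whatever runtime detection the engine performs.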

Without extractors, indexing falls back to syntactic-only mode (tree-sitter). You still get symbols, containment, and imports — just not resolved call graphs or interface implementations.

Getting started

cd your-project
codeagent init

This creates .codeagent/ (config, database), adds the DB to .gitignore, and registers 4 Claude Code lifecycle hooks in .claude/settings.json. Then start the MCP server:

codeagent-mcp

Configuration

codeagent init creates .codeagent/config.json with sensible defaults. All fields are optional.

Example config

{
  "indexing": {
    "safe_mode": true,
    "write_debounce_ms": 2000,
    "rename_similarity_threshold": 0.80,
    "follow_symlinks": false
  },
  "embedding": {
    "model_name": "lightonai/LateOn-Code-edge",
    "dimensionality": 48,
    "batch_size": 64,
    "prefilter_k": 100
  },
  "retrieval": {
    "max_output_tokens": 16384,
    "rrf_k": 60
  },
  "mcp": {
    "max_results": 50,
    "max_file_size": 524288
  }
}

Environment variable overrides follow the pattern CODEAGENT_<SECTION>_<KEY> (e.g., CODEAGENT_INDEXING_SAFE_MODE=false).
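
To make the naming pattern concrete, here is a small sketch that maps a variable name back to its config section and key. It assumes section names never contain an underscore (true of the four sections in the example config), so the first underscore after the prefix separates section from key. `parse_override` is a hypothetical helper, not part of the CLI.

```rust
/// Config sections that appear in the example config.
const SECTIONS: [&str; 4] = ["indexing", "embedding", "retrieval", "mcp"];

/// Maps an environment variable name to a (section, key) pair, or None if
/// the name does not follow the CODEAGENT_<SECTION>_<KEY> pattern.
fn parse_override(var_name: &str) -> Option<(String, String)> {
    let rest = var_name.strip_prefix("CODEAGENT_")?;
    // Section names contain no underscore, so the first one separates
    // the section from the key (which may itself contain underscores).
    let (section, key) = rest.split_once('_')?;
    let section = section.to_ascii_lowercase();
    if key.is_empty() || !SECTIONS.contains(&section.as_str()) {
        return None;
    }
    Some((section, key.to_ascii_lowercase()))
}
```

Under these assumptions, `CODEAGENT_RETRIEVAL_RRF_K` maps to the `rrf_k` field of the `retrieval` section, and unrelated variables such as `PATH` are ignored.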

Claude Code hooks

codeagent init registers four hooks that run automatically during Claude Code sessions:

Hook	Trigger	What it does
PreCompact	Before context compaction	Injects a PageRank-ranked table of the 30 most central symbols so they survive compaction
PostToolUse	After Edit / Write / NotebookEdit	Silently re-indexes the changed file so the graph stays current
SubagentStart	When a subagent spawns	Provides a project overview (stats, top 15 symbols, available MCP tools)
TaskCompleted	When a task finishes	Reports potentially unused symbols (dead code) and unresolved references

Hooks exchange JSON over stdin/stdout. Errors are non-blocking: they are logged to stderr, and hooks never block Claude Code execution.
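
The PreCompact and SubagentStart hooks rank symbols by PageRank. For readers unfamiliar with it, the sketch below runs textbook power-iteration PageRank over an adjacency list and picks the top entries. The damping factor, the iteration count, and the function names are assumptions; the README does not state the parameters the engine uses.

```rust
/// Power-iteration PageRank over a graph given as adjacency lists
/// (`edges[i]` holds the indices that node `i` points to; every index
/// must be smaller than `edges.len()`).
fn pagerank(edges: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = edges.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (src, targets) in edges.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node: spread its rank evenly over all nodes.
                let share = damping * rank[src] / n as f64;
                for value in next.iter_mut() {
                    *value += share;
                }
            } else {
                let share = damping * rank[src] / targets.len() as f64;
                for &dst in targets {
                    next[dst] += share;
                }
            }
        }
        rank = next;
    }
    rank
}

/// Returns the indices of the `limit` highest-ranked nodes, best first.
fn top_symbols(rank: &[f64], limit: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..rank.len()).collect();
    order.sort_by(|&a, &b| rank[b].total_cmp(&rank[a]).then(a.cmp(&b)));
    order.truncate(limit);
    order
}
```

In the hook setting, nodes would be symbols and edges would be relationships such as calls, so `top_symbols(&rank, 30)` corresponds to the 30-row table injected before compaction.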

CLI

The codeagent binary provides project setup, hook handling, and database inspection:

# Set up a project
codeagent init [--repo-root <path>]

# Database inspection
codeagent --db .codeagent/index.db get-node <uuid>
codeagent --db .codeagent/index.db get-outline <file-id>
codeagent --db .codeagent/index.db filter --query "authenticate" --node-type method
codeagent --db .codeagent/index.db lookup "MyApp.Auth.AuthService"
codeagent --db .codeagent/index.db health

# Hook handlers (called automatically by Claude Code, not manually)
codeagent hook pre-compact
codeagent hook post-tool-use
codeagent hook subagent-start
codeagent hook task-completed

Tests

# Unit and integration tests (417 core + 41 fixture + 31 MCP + 5 CLI)
cargo test

# Rust extractor tests (separate binary, outside workspace)
cd extractors/rust && cargo test

# OSS integration tests: index real repos (tRPC, Hot Chocolate GraphQL, rust-analyzer)
# The first run clones the repos; subsequent runs use the cached clones
cargo test -p codeagent-core --features oss-tests --test oss_tests -- --nocapture

# Pipeline benchmarks only (feature-gated)
cargo test -p codeagent-core --features oss-tests test_oss_hc_pipeline_benchmark -- --exact --nocapture
cargo test -p codeagent-core --features oss-tests test_oss_trpc_pipeline_benchmark -- --exact --nocapture
cargo test -p codeagent-core --features oss-tests test_oss_rust_analyzer_pipeline_benchmark -- --exact --nocapture

The suite comprises 494 workspace tests, 27 Rust extractor tests, and 31 OSS integration tests (feature-gated behind --features oss-tests). Together they cover parsing, graph operations, invalidation, rename detection, retrieval, dead code detection, PageRank, MCP tool behavior, CLI init/hooks, end-to-end pipeline benchmarks (initial index, idempotent reindex, touched-file reindex with minimal solution reload), and query benchmarks against real-world codebases (tRPC for TypeScript, Hot Chocolate for C#, rust-analyzer for Rust).

Graph model


Node types	File, Module, Project, Class, Interface, Method, Property, Constructor, Type, Component
Edge types	Calls, Inherits, Implements, Imports, Overrides, References, Contains, Accepts, Extends
Languages	C# (`csharp`), TypeScript (`typescript`), Rust (`rust`)

Symbol identity is stable across edits. The identity key is (language, project_id, symbol_key, symbol_disambiguator) — overload-safe for C# (includes parameter types) and file-scoped for TypeScript (includes a deterministic file ID derived from the path).
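
A sketch of how such a key behaves for C# overloads is shown below. The field types are assumptions (the README only names the four components), and `csharp_method_identity` is a hypothetical helper that uses the comma-joined parameter types as the disambiguator.

```rust
/// The four-part identity key. Field types are assumptions for illustration.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct SymbolIdentity {
    language: String,
    project_id: u32,
    symbol_key: String,
    symbol_disambiguator: String,
}

/// Builds a key for a C# method, using the parameter types as the
/// disambiguator so that overloads map to distinct identities.
fn csharp_method_identity(
    project_id: u32,
    qualified_name: &str,
    param_types: &[&str],
) -> SymbolIdentity {
    SymbolIdentity {
        language: "csharp".to_string(),
        project_id,
        symbol_key: qualified_name.to_string(),
        symbol_disambiguator: param_types.join(","),
    }
}
```

Because the key derives `Eq` and `Hash`, two overloads of the same method occupy separate slots in a map, while re-parsing an unchanged method after an edit elsewhere in the file produces an equal key and therefore the same node.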
