MCP Server Specification — codeagent-index-engine

1. Overview

The codeagent-mcp server exposes the index engine and basic file system navigation as a single MCP (Model Context Protocol) endpoint. Any MCP-compatible client — LLM agent, desktop app, IDE extension, CLI — connects to the same server and uses the same tools.

Transport: stdio (primary) and SSE (for network-accessible scenarios).

Architecture:

┌─────────────────────────────────────────────────────┐
│                  MCP Clients                        │
│  (LLM agents, Tauri app, VS Code, CLI, etc.)        │
└──────────────────────┬──────────────────────────────┘
                       │  MCP (stdio or SSE)
                       ▼
              ┌─────────────────┐
              │  codeagent-mcp  │   ← thin Rust crate (new)
              │   (MCP server)  │
              └────────┬────────┘
                       │
         ┌─────────────┼─────────────┐
         ▼             ▼             ▼
  ┌────────────┐ ┌───────────┐ ┌───────────┐
  │ codeagent- │ │  std::fs  │ │  Engine   │
  │   core     │ │(sandboxed │ │ lifecycle │
  │ (queries)  │ │ to repo)  │ │ (status)  │
  └────────────┘ └───────────┘ └───────────┘

The MCP server is a thin wrapper. All indexing logic remains in codeagent-core. File system operations use std::fs scoped to the repository root. No business logic lives in the MCP layer.


2. Server Lifecycle

| Event | Behaviour |
| --- | --- |
| Startup | Load config from .codeagent/config.json. Open SQLite writer + reader pool. Start file watcher. Register sqlite-vec extension. Run migrations. |
| Shutdown | Cancel watcher. Drain write queue. Checkpoint WAL. Close connections. |
| Health | Exposed via the get_status tool (see §4.4). |

The server process owns the single-writer SQLite connection. Multiple MCP clients can connect simultaneously; all reads go through the reader pool, all writes are serialised through the writer channel.
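
The single-writer rule can be illustrated with a standard-library channel. This is a minimal sketch, not the engine's actual code: `WriteCmd`, `spawn_writer`, and the in-memory `Vec` standing in for the SQLite writer connection are all hypothetical names introduced for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical write command; the engine's real write operations are not
/// specified in this document.
enum WriteCmd {
    Upsert { key: String, value: String },
    /// Ask the writer how many writes it has applied so far.
    Count(mpsc::Sender<usize>),
}

/// Spawns the one writer thread and returns the sender that clients clone
/// to queue writes. A Vec stands in for the SQLite writer connection.
fn spawn_writer() -> (
    mpsc::Sender<WriteCmd>,
    thread::JoinHandle<Vec<(String, String)>>,
) {
    let (tx, rx) = mpsc::channel::<WriteCmd>();
    let handle = thread::spawn(move || {
        let mut store = Vec::new();
        // Commands are applied one at a time, in arrival order.
        for cmd in rx {
            match cmd {
                WriteCmd::Upsert { key, value } => store.push((key, value)),
                WriteCmd::Count(reply) => {
                    let _ = reply.send(store.len());
                }
            }
        }
        // The loop ends once every sender is dropped, so the queue has
        // been fully drained by the time the thread returns.
        store
    });
    (tx, handle)
}
```

Each client holds a clone of the sender, so any number of clients can queue writes while only the writer thread touches the store. Dropping all senders and joining the thread mirrors the "drain write queue" step of shutdown.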


3. Tool Categories

Tools are grouped into four categories:

  1. File System — raw file/directory navigation (no engine involvement)
  2. Search & Discovery — finding symbols by text, name, or similarity
  3. Inspection & Navigation — examining symbols and traversing the graph
  4. Engine Management — indexing triggers and status

4. Tool Definitions

4.1 File System

list_directory

List files and directories at a path within the repository.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | No | Repo-relative path. Defaults to repo root (""). |

Returns: Array of entries, each with:

  • name (string) — file or directory name
  • type ("file" | "directory")
  • size (number) — file size in bytes (files only)

Constraints:

  • Path must resolve within the repo root (no .. escape).
  • Respects .gitignore rules.
  • Does not follow symlinks outside the repo root.

read_file

Read the contents of a file.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | Yes | Repo-relative path to the file. |
| line_start | number | No | 1-based start line (inclusive). |
| line_end | number | No | 1-based end line (inclusive). |

Returns:

  • content (string) — file contents (or line range)
  • total_lines (number) — total line count of the file
  • truncated (boolean) — true if content was capped

Constraints:

  • Path must resolve within the repo root.
  • Maximum output: 10,000 lines or 500 KB, whichever limit is reached first. Returns truncated: true if capped.
  • Binary files return an error with the detected MIME type.
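
The line-range and truncation rules above can be sketched as a pure function over already-loaded text. This is an illustration under stated assumptions: `slice_lines` and `ReadResult` are hypothetical names, KB is taken to mean 1,024 bytes, an inverted or zero-based range is treated as `invalid_parameter`, and line endings in the returned content are normalised to `\n`.

```rust
/// Mirrors the three return fields listed for read_file.
#[derive(Debug, PartialEq)]
struct ReadResult {
    content: String,
    total_lines: usize,
    truncated: bool,
}

const MAX_LINES: usize = 10_000;
const MAX_BYTES: usize = 500 * 1024; // assumption: KB = 1,024 bytes

/// Returns the 1-based inclusive range [line_start, line_end], capped at
/// MAX_LINES lines or MAX_BYTES bytes, whichever is hit first.
fn slice_lines(
    text: &str,
    line_start: Option<usize>,
    line_end: Option<usize>,
) -> Result<ReadResult, &'static str> {
    let lines: Vec<&str> = text.lines().collect();
    let total_lines = lines.len();
    let start = line_start.unwrap_or(1);
    if start == 0 {
        return Err("invalid_parameter"); // lines are 1-based
    }
    if let Some(end) = line_end {
        if end < start {
            return Err("invalid_parameter");
        }
    }
    let end = line_end.unwrap_or(total_lines).min(total_lines);
    let wanted = end.saturating_sub(start - 1);

    let mut content = String::new();
    let mut truncated = false;
    for (taken, line) in lines.iter().skip(start - 1).take(wanted).enumerate() {
        // Stop before emitting a line that would exceed either cap.
        if taken >= MAX_LINES || content.len() + line.len() + 1 > MAX_BYTES {
            truncated = true;
            break;
        }
        content.push_str(line);
        content.push('\n');
    }
    Ok(ReadResult { content, total_lines, truncated })
}
```

Note that `total_lines` always reports the size of the whole file, so a client can tell how much remains after a capped read and request the next range.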

get_directory_tree

Recursive directory structure.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | No | Repo-relative root. Defaults to "". |
| depth | number | No | Max recursion depth. Default 3, max 10. |

Returns: Nested tree structure:

{
  "name": "src",
  "type": "directory",
  "children": [
    { "name": "main.rs", "type": "file" },
    { "name": "lib", "type": "directory", "children": [...] }
  ]
}

Constraints:

  • Respects .gitignore.
  • Directories with >1,000 entries return the first 1,000 with a truncated: true flag.
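
A depth-limited walk with the entry cap might look like the sketch below. It is deliberately incomplete: .gitignore filtering is omitted, and `TreeNode` and `build_tree` are hypothetical names. It also assumes that `depth` counts how many levels of children are expanded beneath the starting directory, which the text above does not spell out.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug)]
struct TreeNode {
    name: String,
    is_dir: bool,
    children: Vec<TreeNode>,
    truncated: bool,
}

const MAX_DEPTH: usize = 10;
const MAX_ENTRIES: usize = 1_000;

/// Builds a tree rooted at `dir`, expanding at most `depth` levels.
/// NOTE: .gitignore handling is intentionally left out of this sketch.
fn build_tree(dir: &Path, depth: usize) -> io::Result<TreeNode> {
    let depth = depth.min(MAX_DEPTH);
    let name = dir
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_default();
    let mut node = TreeNode { name, is_dir: true, children: Vec::new(), truncated: false };
    if depth == 0 {
        return Ok(node);
    }
    let mut entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    // read_dir order is platform-dependent; sort for deterministic output.
    entries.sort_by_key(|e| e.file_name());
    if entries.len() > MAX_ENTRIES {
        entries.truncate(MAX_ENTRIES);
        node.truncated = true;
    }
    for entry in entries {
        // file_type() does not follow symlinks, so a symlinked directory
        // is listed as a leaf rather than descended into.
        if entry.file_type()?.is_dir() {
            node.children.push(build_tree(&entry.path(), depth - 1)?);
        } else {
            node.children.push(TreeNode {
                name: entry.file_name().to_string_lossy().into_owned(),
                is_dir: false,
                children: Vec::new(),
                truncated: false,
            });
        }
    }
    Ok(node)
}
```

Treating symlinks as leaves is a conservative choice for a sketch; a fuller version would resolve them and apply the repo-root check before deciding whether to descend.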

4.2 Search & Discovery

search_symbols

Full-text search across all indexed symbols using FTS5 BM25 ranking.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | FTS5 query string (e.g. "Authenticate*", "UserService") |
| project_id | string | No | Scope to a specific project |
| node_type | string | No | Filter by type: class, method, interface, property, component, file, module, type, constructor |
| language | string | No | Filter: csharp or typescript |
| limit | number | No | Max results (default 20, max 50) |

Returns: Array of results ranked by BM25 relevance:

{
  "node_id": "...",
  "name": "Authenticate",
  "qualified_name": "MyApp.Auth.AuthService.Authenticate",
  "node_type": "method",
  "language": "csharp",
  "file_path": "src/Auth/AuthService.cs",
  "line_start": 42,
  "line_end": 58,
  "access_modifier": "public",
  "parameter_signature": "(string username, string password)",
  "return_type": "Task<AuthResult>",
  "rank": -8.32
}

Backed by: query::filter_nodes with fts_query.


lookup_symbol

Find symbol(s) by exact qualified name. May return multiple results (overloads, partial classes).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| qualified_name | string | Yes | Exact qualified name (e.g. "MyApp.Auth.AuthService.Authenticate") |
| language | string | No | Filter: csharp or typescript |
| project_id | string | No | Scope to a specific project |

Returns: Array of matching nodes (same shape as search_symbols results).

Backed by: query::get_node_by_qualified_name.


find_similar

Find symbols semantically similar to a given symbol using embedding similarity.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| node_id | string | Yes | The reference symbol's node ID |
| limit | number | No | Max results (default 10, max 50) |

Returns: Array of nodes with similarity scores.

Status: Deferred until Phase 4 ANN search is implemented. Currently, vec_nodes uses a regular table (brute-force scan), which is acceptable for small codebases but not production-ready.

Backed by: Embedding lookup + cosine similarity over vec_nodes.
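
The brute-force scan described above amounts to scoring every stored vector against the reference and keeping the top results. The sketch below assumes embeddings are available in memory as `(node_id, Vec<f32>)` pairs; `cosine_similarity` and `find_similar` here are illustrative functions, not the engine's API, and clamping `limit` to 50 (rather than rejecting larger values) is also an assumption.

```rust
/// Cosine similarity of two equal-length vectors, or None when it is
/// undefined (length mismatch, empty input, or a zero-norm vector).
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Brute-force nearest neighbours: score everything, sort, truncate.
/// Cost is linear in the number of stored vectors per query.
fn find_similar<'a>(
    reference_id: &str,
    vectors: &'a [(String, Vec<f32>)],
    limit: usize,
) -> Vec<(&'a str, f32)> {
    let limit = limit.min(50); // assumption: clamp to the documented max
    let Some((_, reference)) = vectors.iter().find(|(id, _)| id.as_str() == reference_id) else {
        return Vec::new();
    };
    let mut scored: Vec<(&str, f32)> = vectors
        .iter()
        .filter(|(id, _)| id.as_str() != reference_id) // skip the symbol itself
        .filter_map(|(id, v)| cosine_similarity(reference, v).map(|s| (id.as_str(), s)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

Because every query touches every vector, this approach scales with index size, which is why the tool is deferred until an ANN index is available.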


4.3 Inspection & Navigation

get_symbol

Get full metadata for a single symbol by ID.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| node_id | string | Yes | The symbol's node ID |

Returns: Complete node metadata:

{
  "node_id": "...",
  "name": "Authenticate",
  "qualified_name": "MyApp.Auth.AuthService.Authenticate",
  "node_type": "method",
  "language": "csharp",
  "file_path": "src/Auth/AuthService.cs",
  "line_start": 42,
  "line_end": 58,
  "access_modifier": "public",
  "is_public_api": true,
  "is_static": false,
  "is_abstract": false,
  "is_async": true,
  "is_override": false,
  "is_deprecated": false,
  "has_doc_comment": true,
  "parse_status": "full",
  "parameter_signature": "(string username, string password)",
  "parameter_count": 2,
  "return_type": "Task<AuthResult>",
  "reference_count": 14
}

Backed by: query::get_node.


get_source_spans

Get all source locations for a symbol (supports partial classes / multi-file symbols).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| node_id | string | Yes | The symbol's node ID |

Returns: Array of source spans ordered primary-first, then by file path and start line:

[
  {
    "file_path": "src/Auth/AuthService.cs",
    "line_start": 42,
    "line_end": 58,
    "is_primary": true
  }
]

Backed by: query::get_source.
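
The ordering rule for spans can be expressed as a three-key comparison. This is a small sketch with a hypothetical `SourceSpan` struct that mirrors the JSON fields above; it assumes "line" in the ordering rule refers to `line_start`.

```rust
#[derive(Debug, Clone, PartialEq)]
struct SourceSpan {
    file_path: String,
    line_start: u32,
    line_end: u32,
    is_primary: bool,
}

/// Sorts spans primary-first, then by file path, then by start line.
fn sort_spans(spans: &mut [SourceSpan]) {
    spans.sort_by(|a, b| {
        // false < true for bool, so compare b against a to put primary first.
        b.is_primary
            .cmp(&a.is_primary)
            .then_with(|| a.file_path.cmp(&b.file_path))
            .then_with(|| a.line_start.cmp(&b.line_start))
    });
}
```

For a C# partial class split across two files, this puts the primary declaration first and the remaining parts in a stable, path-sorted order.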


get_file_outline

List all symbols defined in a file, ordered by line number.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | Yes | Repo-relative file path |

Returns: Array of symbols in line order:

[
  {
    "node_id": "...",
    "name": "AuthService",
    "qualified_name": "MyApp.Auth.AuthService",
    "node_type": "class",
    "line_start": 10,
    "line_end": 120,
    "access_modifier": "public"
  },
  {
    "node_id": "...",
    "name": "Authenticate",
    "node_type": "method",
    "line_start": 42,
    "line_end": 58,
    "access_modifier": "public",
    "parameter_signature": "(string, string)",
    "return_type": "Task<AuthResult>"
  }
]

Backed by: query::get_outline (requires FileId derivation from path via derive_file_id).


get_callers

Find all symbols that call a given symbol.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| node_id | string | Yes | The target symbol's node ID |

Returns: Array of calling symbols with edge metadata:

[
  {
    "node_id": "...",
    "name": "HandleLogin",
    "qualified_name": "MyApp.Controllers.LoginController.HandleLogin",
    "node_type": "method",
    "file_path": "src/Controllers/LoginController.cs",
    "line_start": 25,
    "confidence": "exact"
  }
]

Backed by: query::get_neighbors(node_id, Some(EdgeType::Calls), EdgeDirection::Incoming).


get_callees

Find all symbols that a given symbol calls.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| node_id | string | Yes | The source symbol's node ID |

Returns: Same shape as get_callers.

Backed by: query::get_neighbors(node_id, Some(EdgeType::Calls), EdgeDirection::Outgoing).


get_implementations

Find all symbols that implement a given interface or extend a base class.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| node_id | string | Yes | The interface or base class node ID |

Returns: Array of implementing/extending symbols with edge metadata. Includes both Implements and Extends edge types.

Backed by: query::get_neighbors(node_id, None, EdgeDirection::Incoming) filtered to Implements and Extends edges.


get_references

Find all symbols that reference a given symbol (broader than callers — includes type references, imports, etc.).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| node_id | string | Yes | The referenced symbol's node ID |

Returns: Array of referencing symbols with edge type and confidence.

Backed by: query::get_neighbors(node_id, Some(EdgeType::References), EdgeDirection::Incoming).


get_dependencies

Get all outgoing relationships from a symbol (what it depends on).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| node_id | string | Yes | The source symbol's node ID |
| edge_type | string | No | Filter to a specific edge type: calls, inherits, implements, imports, overrides, references, contains, accepts, extends |

Returns: Array of dependency symbols with edge type and direction.

Backed by: query::get_neighbors(node_id, edge_type_filter, EdgeDirection::Outgoing).


get_dependents

Get all incoming relationships to a symbol (what depends on it).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| node_id | string | Yes | The target symbol's node ID |
| edge_type | string | No | Filter to a specific edge type |

Returns: Same shape as get_dependencies.

Backed by: query::get_neighbors(node_id, edge_type_filter, EdgeDirection::Incoming).
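
All six graph-navigation tools in this section reduce to one `get_neighbors` call that differs only in edge type and direction. The sketch below makes that mapping explicit. The enums are local stand-ins that reuse the variant names quoted in the "Backed by" lines; `NeighborPlan`, `plan`, and `plan_for_tool` are hypothetical and do not exist in codeagent-core as far as this document states.

```rust
/// Local stand-in for the engine's edge type enum (subset only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum EdgeType {
    Calls,
    Implements,
    Extends,
    References,
}

/// Local stand-in for the engine's direction enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EdgeDirection {
    Incoming,
    Outgoing,
}

/// Arguments for one get_neighbors call, plus an optional post-filter.
#[derive(Debug, PartialEq)]
struct NeighborPlan {
    edge_type: Option<EdgeType>,
    direction: EdgeDirection,
    /// Edge types to keep after the query; empty means keep everything.
    keep_only: &'static [EdgeType],
}

fn plan(
    edge_type: Option<EdgeType>,
    direction: EdgeDirection,
    keep_only: &'static [EdgeType],
) -> NeighborPlan {
    NeighborPlan { edge_type, direction, keep_only }
}

/// Maps a tool name to its query plan. `edge_type_filter` is only used by
/// get_dependencies and get_dependents, the two tools that accept it.
fn plan_for_tool(tool: &str, edge_type_filter: Option<EdgeType>) -> Option<NeighborPlan> {
    use EdgeDirection::{Incoming, Outgoing};
    Some(match tool {
        "get_callers" => plan(Some(EdgeType::Calls), Incoming, &[]),
        "get_callees" => plan(Some(EdgeType::Calls), Outgoing, &[]),
        // Unfiltered query, then narrowed to the two inheritance edges.
        "get_implementations" => {
            plan(None, Incoming, &[EdgeType::Implements, EdgeType::Extends])
        }
        "get_references" => plan(Some(EdgeType::References), Incoming, &[]),
        "get_dependencies" => plan(edge_type_filter, Outgoing, &[]),
        "get_dependents" => plan(edge_type_filter, Incoming, &[]),
        _ => return None,
    })
}
```

Keeping the mapping in one table-like function is one way to honour the "no business logic in the MCP layer" rule: each tool handler only translates parameters and forwards to the core query.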


4.4 Engine Management

index_files

Trigger indexing for a set of file paths. Creates a ChangeBatch and runs it through the ingest pipeline.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| paths | string[] | Yes | Repo-relative file paths to index |

Returns:

  • indexed (number) — files successfully processed
  • errors (array) — per-file errors, if any

Constraints:

  • Paths must resolve within the repo root.
  • Maximum 100 paths per call.

Backed by: ingest::pipeline::IngestPipeline::process_batch.


get_status

Engine health and indexing status.

Parameters: none.

Returns:

{
  "healthy": true,
  "schema_version": 4,
  "indexed_files": 1247,
  "indexed_symbols": 18392,
  "languages": ["csharp", "typescript"],
  "watcher_active": true,
  "last_batch_at": "2026-02-24T10:15:30Z",
  "embedding_model": "all-MiniLM-L6-v2",
  "embeddings_count": 17500
}

Backed by: Metadata queries against _metadata, nodes count, vec_nodes count.


5. Error Handling

All tools return errors in a consistent format:

{
  "error": {
    "code": "not_found",
    "message": "No symbol found with node_id '...'"
  }
}

Standard error codes:

| Code | Meaning |
| --- | --- |
| not_found | Requested resource does not exist |
| invalid_parameter | Parameter value is invalid or out of range |
| path_escape | Path resolves outside the repository root |
| binary_file | Attempted to read a binary file |
| too_large | Request exceeds size limits |
| engine_unavailable | Engine not initialised or shutting down |
| index_error | Indexing failed (details in message) |
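
One way to keep the error codes consistent across tools is a single enum that owns the code strings. The sketch below uses only the standard library and builds the JSON by hand so that it stays self-contained; a real implementation would more likely serialise through serde_json, which is already listed as a dependency. `ErrorCode`, `escape_json`, and `error_body` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorCode {
    NotFound,
    InvalidParameter,
    PathEscape,
    BinaryFile,
    TooLarge,
    EngineUnavailable,
    IndexError,
}

impl ErrorCode {
    /// The wire-format code string from the table above.
    fn as_str(self) -> &'static str {
        match self {
            ErrorCode::NotFound => "not_found",
            ErrorCode::InvalidParameter => "invalid_parameter",
            ErrorCode::PathEscape => "path_escape",
            ErrorCode::BinaryFile => "binary_file",
            ErrorCode::TooLarge => "too_large",
            ErrorCode::EngineUnavailable => "engine_unavailable",
            ErrorCode::IndexError => "index_error",
        }
    }
}

/// Minimal JSON string escaping: quotes, backslashes, control characters.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders the error envelope shown above as a compact JSON string.
fn error_body(code: ErrorCode, message: &str) -> String {
    format!(
        "{{\"error\":{{\"code\":\"{}\",\"message\":\"{}\"}}}}",
        code.as_str(),
        escape_json(message)
    )
}
```

Because the match in `as_str` is exhaustive, adding a new variant without assigning it a code string fails to compile, which keeps the table and the implementation from drifting apart.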

6. Security Constraints

  1. Repo-root sandboxing — all file system tools resolve paths relative to the repo root. Paths containing .. that escape the root are rejected with path_escape.
  2. No writes through MCP — the MCP server exposes read-only file system access. Graph writes only happen through the ingest pipeline (index_files), never through direct graph mutation tools.
  3. No credential exposure — get_status does not expose file system paths, config file contents, or any authentication material.
  4. Symlink safety — symlinks that resolve outside the repo root are not followed (consistent with the file watcher's existing symlink/junction guard).
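
The lexical half of the sandboxing rule can be sketched with `std::path::Component`. This is an illustration only: `resolve_in_repo` is a hypothetical helper, and it covers `..` and absolute-path rejection but not symlinks, which would additionally require resolving the final path on disk and re-checking that it still lies under the root.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolves a repo-relative path against `repo_root` without touching the
/// file system. Returns Err("path_escape") if the path would leave the root.
fn resolve_in_repo(repo_root: &Path, relative: &str) -> Result<PathBuf, &'static str> {
    let mut resolved = PathBuf::new();
    for component in Path::new(relative).components() {
        match component {
            Component::Normal(part) => resolved.push(part),
            Component::CurDir => {}
            Component::ParentDir => {
                // pop() returns false when there is nothing left to remove,
                // i.e. the path is trying to climb above the repo root.
                if !resolved.pop() {
                    return Err("path_escape");
                }
            }
            // Absolute paths and Windows prefixes are never repo-relative.
            Component::RootDir | Component::Prefix(_) => return Err("path_escape"),
        }
    }
    Ok(repo_root.join(resolved))
}
```

A path such as `src/../README.md` is accepted because it stays inside the root after normalisation, while `a/../../b` is rejected at the second `..`.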

7. Implementation Crate

New workspace member: crates/codeagent-mcp

codeagent-engine/
  crates/
    codeagent-core/    ← existing (unchanged)
    codeagent-cli/     ← existing (unchanged)
    codeagent-mcp/     ← new
      src/
        main.rs        ← MCP server entry point, transport setup
        tools/
          mod.rs       ← tool registration
          filesystem.rs ← list_directory, read_file, get_directory_tree
          search.rs    ← search_symbols, lookup_symbol, find_similar
          navigation.rs ← get_symbol, get_source_spans, get_file_outline,
                          get_callers, get_callees, get_implementations,
                          get_references, get_dependencies, get_dependents
          management.rs ← index_files, get_status
        state.rs       ← shared engine state (writer, reader pool, config, watcher)

Dependencies: codeagent-core, an MCP SDK crate (e.g. rmcp or equivalent), tokio, serde_json.


8. Relationship to Existing Phases

This MCP server replaces Phase 5 (RLM Orchestration). The orchestration layer is no longer part of the index engine — any LLM agent that connects via MCP brings its own orchestration.

Phase numbering becomes:

  • Phase 1 — Foundation (complete)
  • Phase 2 — Semantic Enrichment & Rename Detection (complete)
  • Phase 3 — Summary & Embeddings (complete)
  • Phase 4 — Retrieval & Eval
  • Phase 5 — MCP Server
  • Phase 6 — Hardening & Observability

The old Phase 5 (RLM Orchestration) and related concepts (root LM, sub-LM, visited-node tracking, system prompt authoring, cost hierarchy, rate limiting) are removed from the engine scope. The .NET backend integration for LLM completion is also removed — the engine is now a pure local tool server.