Agent Guide — `java-codebase-rag` MCP

Copy the block between <!-- BEGIN and <!-- END into your project's AGENTS.md, CLAUDE.md, or equivalent. It is self-contained: five MCP tools, shared NodeFilter, edge taxonomy, tool-selection rules, and recovery moves.

java-codebase-rag MCP — operating manual

Tools: search, find, describe, neighbors, resolve.

Node kinds: Symbol (types and methods), Route (HTTP and messaging entry points), Client (outbound HTTP call sites), Producer (outbound async call sites).

Indexed content: Java production sources plus SQL and YAML (use search table: java, sql, yaml, or all).

Ontology: 15 — if results look structurally wrong or empty across tools, the index may be missing, stale, or built with a different ontology_version; you cannot re-index via MCP — ask the operator to rebuild.

Responses: On success, search, find, describe, neighbors, and resolve may include two top-level fields: hints_structured (≤5 suggested next-tool calls) and advisories (≤5 purely informational strings). Each hints_structured entry has tool, args, actionable, label, and reason. actionable=true means you can call the tool directly with args; actionable=false means the hint is partial: fill in the missing values or treat it as guidance. reason explains why the hint was emitted. advisories carry context (fuzzy-strategy warnings, role-collision explanations, etc.) and never suggest a tool call. search and find responses also echo limit and offset. Hints are advisory; ignore them when success is false.
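As an illustration of the hint fields described above, the following sketch keeps only the hints that can be called as-is. The `sample` payload is hypothetical; only the field names (hints_structured, advisories, tool, args, actionable, label, reason) come from this guide, and a top-level boolean `success` key is assumed.

```python
from typing import Any


def actionable_hints(response: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return (tool, args) pairs for hints that can be called without edits."""
    # Hints are advisory only; ignore them when success is false.
    if not response.get("success"):
        return []
    return [
        (hint["tool"], hint["args"])
        for hint in response.get("hints_structured", [])
        if hint.get("actionable") is True
    ]


# Hypothetical payload, shaped after the fields named in this guide.
sample = {
    "success": True,
    "hints_structured": [
        {"tool": "describe", "args": {"id": "sym:example"}, "actionable": True,
         "label": "Inspect top hit", "reason": "single strong match"},
        {"tool": "neighbors", "args": {"direction": "out"}, "actionable": False,
         "label": "Walk callees", "reason": "edge_types still to be chosen"},
    ],
    "advisories": ["fuzzy strategy used"],
}

print(actionable_hints(sample))
```

Hints with actionable=false are dropped here rather than auto-completed, since filling their missing values is a judgment call for the agent.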

Use this MCP when you need whole-codebase structure: callers/callees, route handlers, HTTP/async seams, clients/producers, or fuzzy entry points for a concept.

Do not use this MCP when the answer is already in the open file, or for third-party library trivia from training data alone. Prefer the smallest call that answers the question.

What this MCP is not

Test files, build files, CI/deploy — read those files directly in the repo.
Reflection and dynamic dispatch — CALLS is static analysis only; the resolved set is a lower bound.
Proof of absence — an empty result may mean the project was not indexed, the wrong table, or a filter that matches nothing.
Git history — use git log / git blame for "who changed" / "when".

When MCP disagrees with the open file, the file wins; treat the mismatch as a likely stale or incomplete index.

Brownfield annotations on methods

If a method has any of these (including plural containers @CodebaseHttpRoutes, @CodebaseAsyncRoutes, @CodebaseHttpClients, @CodebaseProducers), that annotation is the only source for the facets it declares — framework inference on the same method is not merged for that axis:

Annotation	Declares	Framework rows bypassed (examples)
`@CodebaseHttpRoute`	inbound HTTP path / verb	Spring MVC/WebFlux mapping annotations
`@CodebaseAsyncRoute`	inbound async topic / route	`@KafkaListener`, `@RabbitListener`, …
`@CodebaseHttpClient`	outbound HTTP client call site	`@FeignClient` method mappings, RestTemplate-style inference
`@CodebaseProducer`	outbound async producer call site	`KafkaTemplate` / `StreamBridge` producer inference

Trust the indexed brownfield row over a framework-only reading of the source.

Workflow (locate → inspect → walk)

Locate — resolve for identifier-shaped strings; search for natural language or code fragments; find for structured NodeFilter discovery.
Inspect — describe(id) for the full record and edge_summary (per-label in/out counts).
Walk — neighbors in a loop with explicit direction and edge_types. Multi-hop traces are your reasoning, not a separate tool.

Forced reasoning preamble (every tool call)

Before each MCP call, output one short line:

Q-class: <semantic | structured | inspect | walk>
Pick: <search|find|describe|neighbors|resolve>  Why: <≤8 words>

Then use real JSON shapes (see below). If the call fails or returns nothing useful, use the Recovery playbook — do not thrash.

Edge taxonomy

Use these strings verbatim in neighbors(..., edge_types=[...]).

Stored edges (one hop):

Group	Edge types	Semantics
Type wiring	`EXTENDS`, `IMPLEMENTS`, `INJECTS`	`in` = who depends on this type; `out` = what this type depends on
Containment	`DECLARES`, `DECLARES_CLIENT`, `DECLARES_PRODUCER`	`in` = owner; `out` = owned member, client, or producer
Method overrides	`OVERRIDES`	Subtype method → supertype declaration (same `signature`, one `IMPLEMENTS`/`EXTENDS` hop)
Method calls	`CALLS`	`in` = callers; `out` = callees (method Symbol → method Symbol only)
Service boundary	`EXPOSES`	method Symbol → Route (handler exposes route)
Cross-service	`HTTP_CALLS`, `ASYNC_CALLS`	`HTTP_CALLS`: Client → Route; `ASYNC_CALLS`: Producer → Route

Composed edges — type Symbol origin (direction="out" only):

Edge type	Meaning
`DECLARES.DECLARES_CLIENT`	Members' HTTP clients in one hop
`DECLARES.DECLARES_PRODUCER`	Members' async producers in one hop
`DECLARES.EXPOSES`	Members' exposed routes in one hop

Composed edges — non-static method Symbol origin (direction="out" only):

Edge type	Meaning
`OVERRIDDEN_BY`	Concrete overrider methods (stored `[:OVERRIDES]` dispatch hop)
`OVERRIDDEN_BY.DECLARES_CLIENT`	Clients declared on overriders (`via_id` = overrider method)
`OVERRIDDEN_BY.DECLARES_PRODUCER`	Producers on overriders
`OVERRIDDEN_BY.EXPOSES`	Routes exposed by overriders

Stored vs virtual direction (base override axis): neighbors(decl_id, "out", ["OVERRIDDEN_BY"]) returns the same overrider method ids as neighbors(decl_id, "in", ["OVERRIDES"]) on the same declaration method. Prefer the dot-key when describe.edge_summary advertises OVERRIDDEN_BY.

Do not mix DECLARES.* and OVERRIDDEN_BY.* in one edge_types list on a single origin id — the handler rejects the whole request (only one axis applies per node).
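A client-side pre-check for this rule might look like the sketch below. The function names are hypothetical, and treating the bare OVERRIDDEN_BY key as part of the override axis is an assumption based on the tables above; the server's own validation remains authoritative.

```python
def composed_axes(edge_types: list[str]) -> set[str]:
    """Return the composed-edge families present in an edge_types list."""
    axes: set[str] = set()
    for edge in edge_types:
        if edge.startswith("DECLARES."):
            axes.add("DECLARES.*")  # composed keys with a type Symbol origin
        elif edge == "OVERRIDDEN_BY" or edge.startswith("OVERRIDDEN_BY."):
            axes.add("OVERRIDDEN_BY*")  # keys with a non-static method origin
    return axes


def check_edge_types(edge_types: list[str]) -> None:
    """Raise ValueError for lists this guide says neighbors would reject."""
    if not isinstance(edge_types, list) or not edge_types:
        raise ValueError("edge_types must be a non-empty list")
    if len(composed_axes(edge_types)) > 1:
        raise ValueError("split the call: DECLARES.* and OVERRIDDEN_BY* are different axes")


# A stored edge plus one composed axis passes the check.
check_edge_types(["DECLARES", "DECLARES.EXPOSES"])
```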

describe edge_summary counts for OVERRIDDEN_BY* use the same stored [:OVERRIDES] dispatch hop as neighbors (ontology 13+ graphs with materialized override edges). Rebuild the index if counts look wrong or dot-key walks return fewer rows than advertised.

Pagination: default neighbors limit=25 slices the merged flat + composed edge list. When edge_summary shows a large out count for a composed key, raise limit (and use offset) or issue separate calls per key.

Argument shapes

JSON, not stringified JSON

Param	Right	Wrong
`edge_types`	`["CALLS"]`	`"CALLS"` or `"[\"CALLS\"]"`
`exclude_roles`	`["DTO","OTHER"]`	stringified array
`filter`	`{"role":"CONTROLLER"}`	nested string JSON
`ids` (batch)	`["sym:…","sym:…"]`	comma-joined string

Omit keys you do not need. Empty string "" is often a real filter that matches nothing.
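To make the Right/Wrong columns concrete, here is a small sketch that flags parameters holding JSON encoded as a string. The helper name and the example ids are hypothetical. It only catches values that decode to a list or object, so a bare "CALLS" or a comma-joined id string would still slip through.

```python
import json
from typing import Any


def stringified_params(args: dict[str, Any]) -> list[str]:
    """Name the params whose value is a string that holds encoded JSON."""
    bad = []
    for name, value in args.items():
        if not isinstance(value, str):
            continue
        try:
            decoded = json.loads(value)
        except json.JSONDecodeError:
            continue  # an ordinary string such as "in"
        if isinstance(decoded, (list, dict)):
            bad.append(name)
    return bad


right = {
    "ids": ["sym:com.example.A", "sym:com.example.B"],
    "direction": "in",
    "edge_types": ["CALLS"],
    "filter": {"role": "CONTROLLER"},
}
# Same call, but with two params double-encoded as strings.
wrong = {**right, "edge_types": json.dumps(["CALLS"]), "filter": json.dumps({"role": "CONTROLLER"})}

print(stringified_params(right), stringified_params(wrong))
```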

Node ids

Kind	Prefixes
Symbol	`sym:`
Route	`route:` or `r:`
Client	`client:` or `c:`
Producer	`producer:` or `p:`

Use exact ids from search.symbol_id, find, describe, or neighbors.other.id.
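The prefix table can be read as a simple lookup. The sketch below maps an id to its node kind using only the prefixes listed above; the helper name and the example ids are hypothetical.

```python
from typing import Optional

# Prefixes copied from the table above.
PREFIX_TO_KIND = {
    "sym:": "symbol",
    "route:": "route",
    "r:": "route",
    "client:": "client",
    "c:": "client",
    "producer:": "producer",
    "p:": "producer",
}


def kind_of(node_id: str) -> Optional[str]:
    """Return the node kind implied by the id prefix, or None if unknown."""
    for prefix, kind in PREFIX_TO_KIND.items():
        if node_id.startswith(prefix):
            return kind
    return None


print(kind_of("sym:com.example.OrderService"), kind_of("r:abc123"))
```

An id that matches none of these prefixes returns None, which is a cue to go back to resolve or search rather than guess.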

Method / type identity (Symbol FQNs)

<package>.<Type>[.<NestedType>]#<methodName>(<SimpleType1>,<SimpleType2>,…)

Simple types in parentheses; generics erased (List<String> → List). No spaces after commas. No-arg: (). Constructor: #<init>(…).
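A minimal sketch of building a method FQN in this shape follows. The helper names and the example types are hypothetical, and reducing a qualified or nested parameter type to its last segment is an assumption about how the indexer derives simple names.

```python
import re


def simple_erased(type_name: str) -> str:
    """Erase generics (innermost first) and keep the simple type name."""
    previous = None
    while previous != type_name:
        previous = type_name
        type_name = re.sub(r"<[^<>]*>", "", type_name)
    return type_name.strip().rsplit(".", 1)[-1]


def method_fqn(type_fqn: str, method: str, param_types: list[str]) -> str:
    """Format <package>.<Type>#<method>(<SimpleType1>,<SimpleType2>,...)."""
    # No spaces after commas; an empty parameter list yields "()".
    params = ",".join(simple_erased(p) for p in param_types)
    return f"{type_fqn}#{method}({params})"


print(method_fqn("com.example.OrderService", "place",
                 ["java.util.List<String>", "Map<String, List<Long>>"]))
print(method_fqn("com.example.OrderService", "<init>", []))
```

When in doubt, prefer passing the raw string to resolve and copying the id it returns instead of hand-building an FQN.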

`neighbors` — required every time

direction: "in" or "out" (no default).
edge_types: non-empty list from the taxonomy above.

Optional filter applies to each other endpoint; populated fields must match that neighbor's kind (strict frame).

Batching: multiple ids expand first; limit/offset slice the merged edge list — raise limit when batching.

Mixed flat + composed edge_types: flat edges are listed before composed edges, then pagination applies. A small limit with e.g. ["DECLARES","DECLARES.DECLARES_CLIENT"] may return only member Symbols and no Clients — use the dot-key alone to list terminals.

Shared `NodeFilter` (`find`, `search.filter`, `neighbors.filter`)

For find, filter is required — {} means no predicates (all nodes of that kind, subject to pagination).

Keys	Applies to
`microservice`, `module`, `source_layer`	All kinds (`source_layer` mainly client / producer)
`role`, `exclude_roles`, `annotation`, `capability`, `fqn_prefix`, `symbol_kind`, `symbol_kinds`	symbol
`http_method`, `path_prefix`, `framework`	route
`client_kind`, `target_service`, `target_path_prefix`, `http_method`	client
`producer_kind`, `topic_prefix`	producer

http_method filters HTTP verbs on routes (declared method) and on clients (outbound call method). Not applicable to symbol rows.

Strict frame: one populated field → one stored attribute for that kind. Unknown keys or inapplicable populated fields → success=false with a teaching message. No wildcards in fqn_prefix, path_prefix, or target_path_prefix (* / ? rejected) — use search(query=…) for ranked text instead. search.query is opaque text, not a DSL.
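The strict frame can be approximated on the client before a call is sent. The sketch below encodes the key table above and the no-wildcard rule; the function name and error wording are hypothetical, and the server's teaching message remains the authority.

```python
COMMON_KEYS = {"microservice", "module", "source_layer"}
KEYS_BY_KIND = {
    "symbol": COMMON_KEYS | {"role", "exclude_roles", "annotation", "capability",
                             "fqn_prefix", "symbol_kind", "symbol_kinds"},
    "route": COMMON_KEYS | {"http_method", "path_prefix", "framework"},
    "client": COMMON_KEYS | {"client_kind", "target_service",
                             "target_path_prefix", "http_method"},
    "producer": COMMON_KEYS | {"producer_kind", "topic_prefix"},
}
# Prefix fields that reject * and ?.
NO_WILDCARD_KEYS = {"fqn_prefix", "path_prefix", "target_path_prefix"}


def validate_filter(kind: str, node_filter: dict) -> list[str]:
    """Return a list of problems; an empty list means the filter looks valid."""
    if kind not in KEYS_BY_KIND:
        return [f"unknown kind: {kind}"]
    errors = []
    for key, value in node_filter.items():
        if key not in KEYS_BY_KIND[kind]:
            errors.append(f"{key} does not apply to {kind}")
        elif key in NO_WILDCARD_KEYS and any(ch in str(value) for ch in "*?"):
            errors.append(f"{key} must not contain wildcards")
    return errors


print(validate_filter("route", {"role": "CONTROLLER"}))
print(validate_filter("symbol", {"fqn_prefix": "com.example.*"}))
```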

Identifier resolution (`resolve`)

Input: FQN or suffix, sym:/route:/client:/producer: id, METHOD /path, route path template, client target_service, target_service + path prefix, or producer topic.

hint_kind: optional symbol | route | client | producer. When omitted, generators run across all four kinds (narrow with hint_kind when you know the kind).

`status`	Action
`one`	`describe(id=node.id)`
`many`	pick from `candidates` (`reason`, `score`, `NodeRef`), then `describe`
`none`	fall back to `search(query=…)` for NL/fuzzy discovery

Prefer resolve → describe(id=…) over describe(fqn=…) when an FQN may collide (describe(fqn=…) returns the first row).
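The status table can be sketched as a small dispatcher. The response layout here is an assumption: a top-level `node` with an `id` for status one, and candidates that each carry `score` and a `node` reference. Picking the highest-scoring candidate is only one possible heuristic; an agent may prefer to weigh `reason` as well.

```python
from typing import Any


def next_call(identifier: str, response: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Map a resolve response to the follow-up (tool, args) from the table."""
    status = response["status"]
    if status == "one":
        return "describe", {"id": response["node"]["id"]}
    if status == "many":
        # Heuristic: take the best-scored candidate (assumed layout).
        best = max(response["candidates"], key=lambda c: c["score"])
        return "describe", {"id": best["node"]["id"]}
    if status == "none":
        return "search", {"query": identifier}
    raise ValueError(f"unexpected status: {status!r}")


# Hypothetical response with two candidates.
many = {
    "status": "many",
    "candidates": [
        {"reason": "suffix match", "score": 0.4, "node": {"id": "sym:a.Foo"}},
        {"reason": "exact fqn", "score": 0.9, "node": {"id": "sym:b.Foo"}},
    ],
}
print(next_call("Foo", many))
```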

microservice: the service where the node lives.
target_service (clients only): the remote service being called.
source_layer (clients/producers): the extraction layer that produced the row (builtin, layer_a_meta, layer_b_ann, layer_c_source, layer_b_fqn, …).
role (symbols only): the architectural stereotype (CONTROLLER, SERVICE, …).

Decision tree

User asks…	First step	Typical follow-up
Identifier-shaped string	`resolve` (+ optional `hint_kind`)	`describe` → `neighbors`
Fuzzy / NL "where is X"	`search`	`describe` → `neighbors`
All controllers in service S	`find(kind="symbol", filter={"microservice":"S","role":"CONTROLLER"})`	`neighbors` `CALLS` / `EXPOSES`
Interfaces in service S	`find(..., filter={"microservice":"S","symbol_kind":"interface"})`	`neighbors` / `describe`
HTTP / messaging entry points	`find(kind="route", filter={…})`	`describe`
Outbound HTTP clients	`find(kind="client", filter={…})`	`neighbors(..., "out", ["HTTP_CALLS"])` from client id
Outbound async producers	`find(kind="producer", filter={…})`	`neighbors(..., "out", ["ASYNC_CALLS"])` from producer id
Who calls method M?	id via `resolve` / `find` / `search`	`neighbors(ids, "in", ["CALLS"])`
What does M call?	same	`neighbors(ids, "out", ["CALLS"])`
Who hits this route?	route id	`neighbors(ids, "in", ["HTTP_CALLS","ASYNC_CALLS","EXPOSES"])`
Handler for route	route id	`neighbors(ids, "in", ["EXPOSES"])`
Who implements interface T?	type symbol id	`neighbors(ids, "in", ["IMPLEMENTS"])`
Who injects type T?	type symbol id	`neighbors(ids, "in", ["INJECTS"])`
Impact / "what breaks if I change X"?	no magic tool	loop `neighbors` `in` with `CALLS`, `INJECTS`, … until bounded

Rules of thumb:

Structure beats vector for exact questions — use resolve / find + neighbors, not search, for "who calls …".
Vector beats structure for fuzzy discovery — search first, then pivot to describe / neighbors.

Tool reference

`search`

Ranked chunk retrieval. Args: query, table (java|sql|yaml|all, default java), hybrid (bool), limit (default 5), offset, path_contains, optional filter (symbol-applicable NodeFilter only).

`find`

Exact listing for one kind. Args: kind (symbol|route|client|producer), filter (required object), limit (default 25), offset. Returns NodeRef rows (id, kind, fqn, microservice, module, role on symbols, symbol_kind on symbols).

`describe`

Full node + edge_summary. Args: id (any kind) or fqn (symbol only; id wins).

Stored keys — counts for edges that exist in the graph.
Type symbols (class, interface, enum, record, annotation) may add composed keys DECLARES.DECLARES_CLIENT, DECLARES.DECLARES_PRODUCER, DECLARES.EXPOSES — navigable via neighbors with those dot-keys (out only).
Method symbols may add virtual keys OVERRIDDEN_BY, OVERRIDDEN_BY.DECLARES_*, OVERRIDDEN_BY.EXPOSES (navigable via neighbors on non-static method origins, out only), plus an OVERRIDES row merging stored [:OVERRIDES] incident counts with the rollup dispatch-up count (max per direction). Rollup and dot-key traversal both use stored [:OVERRIDES] for the dispatch hop. Static methods and constructors do not get override-axis keys.

Composed counts are edge rows, not distinct methods; count > 0 means "there is something to walk".

`resolve`

Identifier lookup; three statuses above. Args: identifier, optional hint_kind.

`neighbors`

One hop. Args: ids (string or array), direction, edge_types, limit (default 25), offset, optional filter on the other node, optional edge_filter (edge_types must be exactly ['CALLS'] — no composed dot-keys or second stored label; fail-loud otherwise).

Multiple origin ids: each id loads the full CALLS stream (or generic hop) in list order; offset/limit apply to the concatenated edge list (ids[0] edges first, then ids[1], …), not global source order across origins — a large first origin can leave no rows for later ids within the same page. High fan-out methods are slow; prefer one id per call or a smaller limit. Hints: TPL_NEIGHBORS_CALLS_HIGH_FANOUT / TPL_NEIGHBORS_CALLS_HAS_UNRESOLVED fire only for a single origin id (multi-origin CALLS skips those nudges).
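The starvation effect described above can be reproduced with a toy model. This is a simulation of the documented ordering only, not the server's implementation; the ids and edge counts are made up.

```python
def paginate_concatenated(edges_by_origin: dict[str, list[str]], ids: list[str],
                          limit: int = 25, offset: int = 0) -> list[tuple[str, str]]:
    """Concatenate per-origin edge lists in ids order, then slice one page."""
    merged = [
        (origin, callee)
        for origin in ids
        for callee in edges_by_origin.get(origin, [])
    ]
    return merged[offset:offset + limit]


# Hypothetical fan-out: a hot first origin and a small second one.
edges = {
    "sym:A": [f"sym:callee{i}" for i in range(30)],
    "sym:B": ["sym:x", "sym:y", "sym:z"],
}

first_page = paginate_concatenated(edges, ["sym:A", "sym:B"])
second_page = paginate_concatenated(edges, ["sym:A", "sym:B"], offset=25)
print({origin for origin, _ in first_page}, len(second_page))
```

With the default limit, the first page contains only edges from sym:A; sym:B appears only after paging forward, which is why one id per call is usually easier to reason about.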

Returns edges with attrs (confidence, strategy, match, … on cross-service edges) and other node.

Cross-service edges (HTTP_CALLS, ASYNC_CALLS): read attrs.confidence and attrs.match — low confidence or unresolved/phantom/ambiguous means treat as a resolver signal, not ground truth.
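One way to apply this rule is to split edges into ground truth and resolver signals. The sketch below assumes attrs.confidence is a number between 0 and 1 and uses an arbitrary 0.5 cutoff; both are assumptions, not documented behaviour, so adjust them to what the responses actually contain.

```python
from typing import Any

# attrs.match values this guide says to treat with suspicion.
WEAK_MATCHES = {"unresolved", "phantom", "ambiguous"}


def is_ground_truth(edge: dict[str, Any], min_confidence: float = 0.5) -> bool:
    """True when a cross-service edge looks safe to rely on."""
    attrs = edge.get("attrs", {})
    if attrs.get("match") in WEAK_MATCHES:
        return False
    # Assumption: confidence is numeric; a missing value counts as zero.
    return attrs.get("confidence", 0.0) >= min_confidence


# Hypothetical edges shaped after the attrs named above.
edges = [
    {"attrs": {"confidence": 0.9, "match": "cross_service"}},
    {"attrs": {"confidence": 0.9, "match": "phantom"}},
    {"attrs": {"confidence": 0.2, "match": "intra_service"}},
]
print([is_ground_truth(edge) for edge in edges])
```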

CALLS edges: source-ordered (call_site_line, call_site_byte).

Unresolved call sites: after ontology 15 PR-3, true receiver-failure sites are not on CALLS. They are UnresolvedCallSite nodes (reason: chained_receiver or phantom_unresolved_receiver; ids use the ucs: prefix; other.kind=unresolved_call_site) and are not describable via describe(id=…). UNRESOLVED_AT is graph storage only (not in EDGE_SCHEMA / neighbors edge_types). attrs.resolved=false on remaining CALLS rows means known-receiver-external (JDK/Spring) callees, not receiver failure.

Flags: include_unresolved=True (CALLS + direction=out only) interleaves unresolved sites with resolved CALLS (row_kind discriminator); it is mutually exclusive with edge_filter. dedup_calls=True collapses identical (origin, callee) CALLS to one row with call_site_lines.

edge_filter: optional; projects before pagination. Keys: min_confidence; include_strategies / exclude_strategies (mutually exclusive); callee_declaring_role, callee_declaring_roles, exclude_callee_declaring_roles (["OTHER"] also drops known-external rows). filter + edge_filter together load the ordered CALLS stream, then apply the callee NodeFilter in Python; expect higher latency on hot methods than with edge_filter alone.

Role filtering: filter.role filters the neighbor method (usually OTHER), not the callee stereotype; use edge_filter.callee_declaring_role for repository/service hops.

Noise: exclude_external applies to find_callers / find_callees only (FQN-prefix); trim JDK noise on neighbors CALLS via edge_filter. Accessor noise: role excludes help; getter/setter heuristics in propose/completed/AGENT-SKILLS-AND-COMMANDS-PROPOSE.md /mini-map.

Ontology glossary

Roles (filter.role / exclude_roles): CONTROLLER, SERVICE, REPOSITORY, COMPONENT, CONFIG, ENTITY, CLIENT, MAPPER, DTO, OTHER.

Capabilities (filter.capability): MESSAGE_LISTENER, MESSAGE_PRODUCER, HTTP_CLIENT, SCHEDULED_TASK, EXCEPTION_HANDLER.

Symbol kinds (symbol_kind / symbol_kinds): class, interface, enum, record, annotation, method, constructor.

Route framework (examples on stored routes): spring_mvc, webflux, kafka, rabbitmq, jms, stream, codebase_async_route, …

Client kinds: feign_method, rest_template, web_client.

Producer kinds: kafka_send, stream_bridge_send.

HTTP call attrs.match / async attrs.match: cross_service, intra_service, ambiguous, phantom, unresolved.

Recovery playbook

Symptom	Likely cause	Fix
`neighbors` validation error	Missing `direction` or `edge_types`	Add both explicitly
Empty `neighbors`	Wrong edge type or direction	Read `describe.edge_summary`; `EXPOSES` is Symbol→Route; `OVERRIDES` is method↔method only; `HTTP_CALLS` starts from Client ids
Cannot find symbol	Wrong id or empty index	`resolve` / `search`; try `find` with `fqn_prefix`
`find` returns too much	Broad filter	Add `microservice`, `fqn_prefix`, `path_prefix`, `topic_prefix`, …
Route not found	Path mismatch	`find(kind="route", filter={"path_prefix":…})`
Empty `search`	Wrong `table`, no index, or chunk miss	Try `table="all"`; `find` with `fqn_prefix`; read source files directly
Empty results across several tools	Index missing, stale, or wrong project	You cannot rebuild the index via MCP — ask the operator; meanwhile use open files / `rg`
Result vs open file disagree	Stale or partial index	Trust the file; say index may be stale
Mixed composed families on one id	`DECLARES.` + `OVERRIDDEN_BY.` together	Split calls — type keys need a type id; override keys need a method id
Override dot-key on type / DECLARES on method	Wrong Symbol origin for axis	Read `describe.edge_summary`; use the axis that matches the node kind

After two failed attempts on the same intent, stop and report tool name, args, and response snippet.

Common navigation patterns

These patterns combine the five tools above. Use the decision tree to pick the right starting tool.

Intent	Tool chain
Natural-language "find X"	`search(query=…, limit=8)` → `describe(top_hit.symbol_id)`
List controllers in service S	`find(kind="symbol", filter={"microservice":"S","role":"CONTROLLER"})`
List routes in service S	`find(kind="route", filter={"microservice":"S"})`
List clients in service S	`find(kind="client", filter={"microservice":"S"}, limit=100)`
List producers in service S	`find(kind="producer", filter={"microservice":"S"}, limit=100)`
Who calls method M	`resolve` → `neighbors(ids, "in", ["CALLS"])`
What does M call	`resolve` → `neighbors(ids, "out", ["CALLS"])`
Handler for route R	`neighbors(route_id, "in", ["EXPOSES"])`
All inbound to route R	`neighbors(route_id, "in", ["HTTP_CALLS","ASYNC_CALLS","EXPOSES"])`
Implementors of interface T	`neighbors(type_id, "in", ["IMPLEMENTS"])`
Where is T injected	`neighbors(type_id, "in", ["INJECTS"])`
Impact of changing X	`resolve` → `describe` → bounded `neighbors(ids, "in", ["CALLS","INJECTS","IMPLEMENTS","EXTENDS"])`, depth ≤2
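The bounded impact walk in the last row can be sketched as a breadth-first loop. Here `inbound` is a hypothetical callable standing in for one neighbors call with direction "in" and the edge types listed in that row; the graph below is made-up data used only to exercise the loop.

```python
from collections import deque
from typing import Callable


def impact_walk(start_id: str, inbound: Callable[[str], list[str]],
                max_depth: int = 2) -> dict[str, int]:
    """Breadth-first walk over inbound edges; returns node id -> hop count."""
    depth = {start_id: 0}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # bound reached: do not expand further
        for dependent in inbound(current):
            if dependent not in depth:  # skip nodes already seen (handles cycles)
                depth[dependent] = depth[current] + 1
                queue.append(dependent)
    del depth[start_id]
    return depth


# Made-up inbound adjacency: key is depended on by the listed ids.
FAKE_GRAPH = {
    "sym:Repo#save()": ["sym:Service#place()"],
    "sym:Service#place()": ["sym:Controller#post()", "sym:Job#run()"],
    "sym:Controller#post()": ["sym:Gateway#forward()"],
}


def fake_inbound(node_id: str) -> list[str]:
    return FAKE_GRAPH.get(node_id, [])


print(impact_walk("sym:Repo#save()", fake_inbound))
```

Keeping the depth bound explicit mirrors the "depth ≤2" guidance: each extra hop multiplies the number of neighbors calls.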

Canonical workflow: "explain feature X"

search with a short query; pick 1–3 hits with strong symbol_id / role fit.
describe on the chosen id; read edge_summary.
Walk with neighbors using small edge_types sets (e.g. CALLS out, or EXPOSES / cross-service edges for boundaries).
Stop when you can answer; do not prefetch unrelated subgraphs.

Maintenance (repo editors only — do not paste below into agent instructions)

When MCP behaviour, NodeFilter keys, edge labels, or node kinds change:

Update this file's copy block and bump the Ontology: line to match ast_java.ONTOLOGY_VERSION.
Update the five-tool cheat sheet in README.md and the "Driving the MCP from an agent" bullet there.
If enrichment semantics changed, add a "Re-index required" callout in docs/CONFIGURATION.md §3.
