diff --git a/.cursor/rules/agent-workflow.mdc b/.cursor/rules/agent-workflow.mdc index 8397932..a985149 100644 --- a/.cursor/rules/agent-workflow.mdc +++ b/.cursor/rules/agent-workflow.mdc @@ -68,9 +68,9 @@ When you're given a per-PR task prompt from `plans/CURSOR-PROMPTS-*.md`: `java_ontology.py`. Don't sprinkle role / capability / client-kind / strategy / match string literals across other modules. - Schema changes that affect the Lance index or Kuzu graph need a - matching update to the README "Re-index required" callout. Bump - `ontology_version` when enrichment semantics change. The current - version is **12**. + matching update to the README "Re-index required" callout. Bump + `ontology_version` when enrichment semantics change. The current + version is **13**. - Brownfield is a first-class surface: any new auto-detection (route, role, capability, http client, async producer) must compose with the matching `BrownfieldOverrides` layer. Last writer diff --git a/.cursor/rules/project-overview.mdc b/.cursor/rules/project-overview.mdc index 2b6aacc..e61576f 100644 --- a/.cursor/rules/project-overview.mdc +++ b/.cursor/rules/project-overview.mdc @@ -21,13 +21,15 @@ when needed. - `README.md` — feature surface, env vars, ranking, capabilities, MCP tools (`search` / `find` / `describe` / `neighbors` / `resolve`), `java-codebase-rag` CLI, "Re-index required" callouts. The current - `ontology_version` is **12** (`@CodebaseHttpClient` rename + shared `CodebaseHttpMethod` enum; - inbound `@CodebaseHttpRoute` replaces same-method built-in HTTP rows; still - `@CodebaseAsyncRoute` wins over same-method - `@KafkaListener`; adds `Client` nodes, `DECLARES_CLIENT`, `find(kind="client")`, plus - HTTP_CALLS / ASYNC_CALLS caller edges and brownfield composition from earlier - bumps). Earlier ontology bumps - are described inline in the README's callouts list. 
+ `ontology_version` is **13** (material `OVERRIDES` Symbol→Symbol edges: subtype + instance method → supertype declaration with matching `signature`, one + `IMPLEMENTS`/`EXTENDS` hop; valid `neighbors` `EdgeType`). Builds on v12 + (`@CodebaseHttpClient` rename + shared `CodebaseHttpMethod` enum; inbound + `@CodebaseHttpRoute` replaces same-method built-in HTTP rows; still + `@CodebaseAsyncRoute` wins over same-method `@KafkaListener`; `Client` nodes, + `DECLARES_CLIENT`, `find(kind="client")`, HTTP_CALLS / ASYNC_CALLS caller edges, + brownfield composition from earlier bumps). Earlier ontology bumps are described + inline in the README's callouts list. - `CODEBASE_REQUIREMENTS.md` — Java-repo assumptions and per-file map of what to edit when a target tree doesn't match defaults. - `tests/README.md` — testing philosophy. diff --git a/AGENTS.md b/AGENTS.md index e1e7a8e..0e466ac 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -10,7 +10,7 @@ for tools that don't read `.cursor/rules/`. - `README.md` — feature surface, env vars, ranking, capabilities, MCP tool list (`search` / `find` / `describe` / `neighbors` / `resolve`), CLI ops (`java-codebase-rag --help`), and "Re-index required" callouts. - **`ontology_version` is currently 12** (HTTP brownfield rename + `CodebaseHttpMethod` enum + inbound HTTP layer-C replace; see README graph section). + **`ontology_version` is currently 13** (stored `OVERRIDES` method→method edges traversable via `neighbors`; plus v12 HTTP brownfield rename, `CodebaseHttpMethod` enum, inbound HTTP layer-C replace — see README graph section). - [`docs/JAVA-CODEBASE-RAG-CLI.md`](./docs/JAVA-CODEBASE-RAG-CLI.md) — operator guide for the `java-codebase-rag` CLI (`init` / `increment` / `reprocess` / `erase`, `meta`, `tables`, `diagnose-ignore`, `analyze-pr`; hidden `refresh` alias → `reprocess` — see that doc). - `CODEBASE_REQUIREMENTS.md` — Java-repo assumptions and tuning map. - **`propose/`** — design proposes. 
**In-flight** work is **`propose/*.md`** diff --git a/CODEBASE_REQUIREMENTS.md b/CODEBASE_REQUIREMENTS.md index a2a3e18..89641eb 100644 --- a/CODEBASE_REQUIREMENTS.md +++ b/CODEBASE_REQUIREMENTS.md @@ -187,7 +187,7 @@ root (`role_overrides:`, `route_overrides:`, `http_client_overrides:`, **MCP discovery:** after indexing, use MCP `find` with `kind="route"` for inbound HTTP and async routes and `kind="client"` for outbound HTTP `Client` declarations (Feign methods plus annotated imperative clients). Client rows -require a graph built with `ontology_version` **12** or newer — confirm with +require a graph built with `ontology_version` **13** or newer — confirm with `java-codebase-rag meta` (JSON field `ontology_version`). See **Brownfield overrides** in `README.md` for the full schema, usage diff --git a/README.md b/README.md index 1b10f1a..4155c0b 100644 --- a/README.md +++ b/README.md @@ -229,7 +229,7 @@ Edit `claude_desktop_config.json` (macOS: `~/Library/Application Support/Claude/ ### Driving the MCP from an agent -- **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** — copy-paste into `QWEN.md` / `CLAUDE.md` / `AGENTS.md`. Covers the five MCP tools, the shared `NodeFilter`, the edge-type taxonomy, required `neighbors` arguments, the ontology glossary (currently **v12**), the recovery playbook, and slash-style aliases. +- **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** — copy-paste into `QWEN.md` / `CLAUDE.md` / `AGENTS.md`. Covers the five MCP tools, the shared `NodeFilter`, the edge-type taxonomy, required `neighbors` arguments, the ontology glossary (currently **v13**), the recovery playbook, and slash-style aliases. - **[`docs/skills/java-codebase-explore.md`](./docs/skills/java-codebase-explore.md)** — exploration **strategy** (missions, fallbacks, anti-capabilities, stopping rules); AGENT-GUIDE remains the **operating manual** for tool shapes and recovery. 
- **[`docs/MANUAL-VERIFICATION-CHECKLIST.md`](./docs/MANUAL-VERIFICATION-CHECKLIST.md)** — 7-phase agent-driven verification you run after indexing your real project. Each item has a copy-paste prompt and calibration data from `tests/bank-chat-system`. - **[`automation/cursor_propose_only/README.md`](./automation/cursor_propose_only/README.md)** — optional proposal orchestration workflow (single-command autopilot, planning bundles, and automated execution/review loops). @@ -242,7 +242,7 @@ Edit `claude_desktop_config.json` (macOS: `~/Library/Application Support/Claude/ |---|---|---|---| | `search` | Locate nodes by NL/code text. | `query: str`, `table: str="java"`, `hybrid: bool=False`, `limit: int=5`, `offset: int=0`, `path_contains: str \| None`, `filter: NodeFilter \| str \| None` | `{"query":"join operator flow","limit":5}` | | `find` | Locate nodes by structured filter. | `kind: "symbol"\|"route"\|"client"`, `filter: NodeFilter \| str`, `limit: int=25`, `offset: int=0` | `{"kind":"symbol","filter":{"role":"CONTROLLER"}}` | -| `describe` | Full record + edge counts for one node. For **type** symbols, `edge_summary` may include composed dot-keys (`DECLARES.DECLARES_CLIENT`, `DECLARES.EXPOSES`); for **method** symbols it may include override-axis virtual keys (`OVERRIDDEN_BY`, `OVERRIDDEN_BY.DECLARES_CLIENT`, `OVERRIDDEN_BY.EXPOSES`, `OVERRIDES`). See [`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md) (`describe`). | `id: str` | `{"id":"sym:com.bank.chat.core.api.ChatController#joinOperator(JoinOperatorRequest)"}` | +| `describe` | Full record + edge counts for one node. For **type** symbols, `edge_summary` may include composed dot-keys (`DECLARES.DECLARES_CLIENT`, `DECLARES.EXPOSES`); for **method** symbols it may include override-axis virtual keys (`OVERRIDDEN_BY`, …) and an `OVERRIDES` row that **merges** stored `[:OVERRIDES]` in/out with the dispatch-up rollup (per direction `max`). See [`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md) (`describe`). 
| `id: str` | `{"id":"sym:com.bank.chat.core.api.ChatController#joinOperator(JoinOperatorRequest)"}` | | `resolve` | Identifier-shaped node lookup (symbol / route / client). Returns `status` `one`, `many`, or `none`; prefer over `describe(fqn=…)` when an FQN may collide. See [`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md) (`resolve`). | `identifier: str`, `hint_kind: "symbol"|"route"|"client" \| null` | `{"identifier":"com.bank.chat.core.api.ChatController","hint_kind":"symbol"}` | | `neighbors` | One-hop walk. **Required**: `direction` and `edge_types`. | `ids: str \| list[str]`, `direction: "in"\|"out"`, `edge_types: list[str]`, `limit: int=25`, `offset: int=0`, `filter: NodeFilter \| str \| None` | `{"ids":"route:chat-core:POST:/chat/joinOperator","direction":"in","edge_types":["HTTP_CALLS","ASYNC_CALLS"]}` | @@ -359,7 +359,7 @@ For `reprocess`, the pipeline runs `cocoindex` with `cwd` set to the bundle dire ## 6. Graph layer -A deterministic property graph derived from tree-sitter Java parsing lives next to the LanceDB tables under the index directory (default `${JAVA_CODEBASE_RAG_INDEX_DIR:-./.java-codebase-rag}/code_graph.kuzu`). Current ontology version: **12**. +A deterministic property graph derived from tree-sitter Java parsing lives next to the LanceDB tables under the index directory (default `${JAVA_CODEBASE_RAG_INDEX_DIR:-./.java-codebase-rag}/code_graph.kuzu`). Current ontology version: **13**. ### Node kinds @@ -371,7 +371,7 @@ A deterministic property graph derived from tree-sitter Java parsing lives next Unresolved targets become **phantom** nodes (`resolved=false`, FQN guessed from imports / `java.lang`). -### Edge types (9) +### Edge types (10) | Edge | Direction | Meaning | |---|---|---| @@ -379,6 +379,7 @@ Unresolved targets become **phantom** nodes (`resolved=false`, FQN guessed from | `IMPLEMENTS` | type → interface | Interface implementation. | | `INJECTS` | type → type | DI: field, constructor, or setter injection (incl. Lombok). 
| | `DECLARES` | type → method/constructor | Type declares a callable. | +| `OVERRIDES` | method → method | Subtype instance method overrides a supertype-declared method (same `signature`, one supertype hop via `IMPLEMENTS` / `EXTENDS`). | | `DECLARES_CLIENT` | type → client | Type declares an outbound call site. | | `CALLS` | method → method | In-process call (confidence-scored, strategy-tagged). | | `EXPOSES` | type → route | Type exposes an HTTP/async route. | @@ -421,7 +422,9 @@ Resolution order for `microservice`: ### Re-index required when ontology changes -Current ontology version is **12**. Any index built before this version must be rebuilt via `cocoindex update ... --full-reprocess -f` or a full `java-codebase-rag reprocess` (no selective flags) so vectors and graph stay aligned. Until re-indexed, the server defensively JSON-decodes string-form list columns so nothing explodes, but filters like `array_contains` will not work. +Current ontology version is **13**. Any index built before this version must be rebuilt via `cocoindex update ... --full-reprocess -f` or a full `java-codebase-rag reprocess` (no selective flags) so vectors and graph stay aligned. Until re-indexed, the server defensively JSON-decodes string-form list columns so nothing explodes, but filters like `array_contains` will not work. + +Ontology **13** materializes stored `OVERRIDES` edges between method Symbols (subtype override → supertype declaration, matching `signature` on a direct `IMPLEMENTS` / `EXTENDS` hop). `neighbors(edge_types=["OVERRIDES"])` traverses this relationship; `OVERRIDDEN_BY*` keys in `edge_summary` remain describe-time rollups only. Ontology **12** renames `@CodebaseClient` to `@CodebaseHttpClient`, types HTTP `method` as the shared `CodebaseHttpMethod` enum on both inbound and outbound stubs, and makes inbound layer-C HTTP routes **replace** same-method built-in Spring rows (no merge). 
Rebuild after upgrading so `meta_chain` keys and annotation simple names match the extractor. diff --git a/ast_java.py b/ast_java.py index b9ace24..225120e 100644 --- a/ast_java.py +++ b/ast_java.py @@ -81,7 +81,7 @@ # Phase 9: `@CodebaseAsyncRoute` replaces same-method built-in `@KafkaListener` routes in graph composition. # Phase 10: `@CodebaseHttpClient` rename + `CodebaseHttpMethod` enum; inbound HTTP layer-C replaces built-in rows. # Bumps whenever extraction / enrichment semantics change. -ONTOLOGY_VERSION = 12 +ONTOLOGY_VERSION = 13 ROLE_ANNOTATIONS: dict[str, str] = { # Spring Web diff --git a/build_ast_graph.py b/build_ast_graph.py index 4437344..1080887 100644 --- a/build_ast_graph.py +++ b/build_ast_graph.py @@ -4,7 +4,7 @@ Walks a Java source tree with `tree_sitter_java`, writes a deterministic graph of: Symbol nodes: package, file, class, interface, enum, record, annotation, method, constructor Route nodes: declaration-site routes (Spring MVC/WebFlux, Feign, Kafka, …) - Rel tables: EXTENDS, IMPLEMENTS, INJECTS, DECLARES, CALLS, EXPOSES + Rel tables: EXTENDS, IMPLEMENTS, INJECTS, DECLARES, OVERRIDES, CALLS, EXPOSES Pass 1 builds every node and in-memory resolution indexes. 
 Pass 2 resolves each extends/implements/injection target using Java's lookup order
@@ -336,6 +336,7 @@ class GraphTables:
     async_call_rows: list[AsyncCallRow] = field(default_factory=list)
     client_rows: list[ClientRow] = field(default_factory=list)
     declares_client_rows: list[DeclaresClientRow] = field(default_factory=list)
+    overrides_rows: list[DeclaresRow] = field(default_factory=list)
     route_stats: RouteExtractionStats = field(default_factory=RouteExtractionStats)
     call_edge_stats: CallEdgeStats = field(default_factory=CallEdgeStats)
     client_stats: ClientExtractionStats = field(default_factory=ClientExtractionStats)
@@ -2186,6 +2187,7 @@ def _micro_factor(member: MemberEntry | None) -> float:
     "mechanism STRING, annotation STRING, field_or_param STRING)"
 )
 _SCHEMA_DECLARES = "CREATE REL TABLE DECLARES(FROM Symbol TO Symbol)"
+_SCHEMA_OVERRIDES = "CREATE REL TABLE OVERRIDES(FROM Symbol TO Symbol)"
 _SCHEMA_CALLS = (
     "CREATE REL TABLE CALLS(FROM Symbol TO Symbol, "
     "call_site_line INT64, call_site_byte INT64, arg_count INT64, "
@@ -2221,6 +2223,7 @@ def _drop_all(conn: kuzu.Connection) -> None:
         "DROP TABLE IF EXISTS IMPLEMENTS",
         "DROP TABLE IF EXISTS INJECTS",
         "DROP TABLE IF EXISTS CALLS",
+        "DROP TABLE IF EXISTS OVERRIDES",
         "DROP TABLE IF EXISTS DECLARES",
         "DROP TABLE IF EXISTS Symbol",
         "DROP TABLE IF EXISTS Route",
@@ -2243,6 +2246,7 @@ def _create_schema(conn: kuzu.Connection) -> None:
         _SCHEMA_IMPLEMENTS,
         _SCHEMA_INJECTS,
         _SCHEMA_DECLARES,
+        _SCHEMA_OVERRIDES,
         _SCHEMA_CALLS,
         _SCHEMA_EXPOSES,
         _SCHEMA_DECLARES_CLIENT,
@@ -2358,6 +2362,10 @@ def _write_nodes(
     "MATCH (a:Symbol {id: $src}), (b:Symbol {id: $dst}) "
     "CREATE (a)-[:DECLARES]->(b)"
 )
+_CREATE_OVERRIDES = (
+    "MATCH (a:Symbol {id: $src}), (b:Symbol {id: $dst}) "
+    "CREATE (a)-[:OVERRIDES]->(b)"
+)
 _CREATE_CALL = (
     "MATCH (a:Symbol {id: $src}), (b:Symbol {id: $dst}) "
     "CREATE (a)-[:CALLS {"
@@ -2411,6 +2419,45 @@ def _populate_declares_rows(tables: GraphTables) -> None:
     ]


+def _direct_supertype_ids(tables: GraphTables, type_id: str) -> list[str]:
+    out: list[str] = []
+    for r in tables.extends_rows:
+        if r.src_id == type_id:
+            out.append(r.dst_id)
+    for r in tables.implements_rows:
+        if r.src_id == type_id:
+            out.append(r.dst_id)
+    return out
+
+
+def _populate_overrides_rows(tables: GraphTables) -> None:
+    """Materialize (subtype_method)-[:OVERRIDES]->(supertype_method) for one supertype hop.
+
+    Matches ``KuzuGraph.override_axis_rollup_for`` (direct ``IMPLEMENTS`` / ``EXTENDS``
+    only, same ``signature``, distinct method ids, non-static instance methods).
+    """
+    by_declaring_type: dict[str, list[MemberEntry]] = defaultdict(list)
+    for m in tables.members:
+        by_declaring_type[m.parent_id].append(m)
+    pairs: set[tuple[str, str]] = set()
+    for m in tables.members:
+        if m.kind != "method" or "static" in m.decl.modifiers:
+            continue
+        impl_tid = m.parent_id
+        for sup_id in _direct_supertype_ids(tables, impl_tid):
+            for other in by_declaring_type.get(sup_id, ()):
+                if other.kind != "method":
+                    continue
+                if other.decl.signature != m.decl.signature:
+                    continue
+                if other.node_id == m.node_id:
+                    continue
+                pairs.add((m.node_id, other.node_id))
+    tables.overrides_rows = [
+        DeclaresRow(src_id=a, dst_id=b) for a, b in sorted(pairs)
+    ]
+
+
 def _write_edges(conn: kuzu.Connection, tables: GraphTables) -> None:
     for r in tables.extends_rows:
         conn.execute(_CREATE_EXT, {
@@ -2433,6 +2480,9 @@ def _write_edges(conn: kuzu.Connection, tables: GraphTables) -> None:
     for row in tables.declares_rows:
         conn.execute(_CREATE_DECL, {"src": row.src_id, "dst": row.dst_id})

+    for row in tables.overrides_rows:
+        conn.execute(_CREATE_OVERRIDES, {"src": row.src_id, "dst": row.dst_id})
+
     seen_calls: set[tuple[str, str, int, int]] = set()
     unique_calls: list[CallsRow] = []
     for row in tables.calls_rows:
@@ -2549,6 +2599,7 @@ def _write_meta(conn: kuzu.Connection, tables: GraphTables, source_root: Path) -
         "implements": len(tables.implements_rows),
         "injects": len(tables.injects_rows),
         "declares": len(tables.declares_rows),
+        "overrides": len(tables.overrides_rows),
         "calls": calls_unique,
         "routes": len(tables.routes_rows),
         "exposes": len(tables.exposes_rows),
@@ -2642,6 +2693,7 @@ def write_kuzu(
     if verbose:
         _verbose_stderr_line(f"[write] nodes written in {time.time() - t0:.2f}s")
     _populate_declares_rows(tables)
+    _populate_overrides_rows(tables)
     t1 = time.time()
     _write_edges(conn, tables)
     if verbose:
diff --git a/docs/AGENT-GUIDE.md b/docs/AGENT-GUIDE.md
index 5b7c5ba..ff8b5d1 100644
--- a/docs/AGENT-GUIDE.md
+++ b/docs/AGENT-GUIDE.md
@@ -12,10 +12,12 @@
 > `neighbors` arguments, pass stringified JSON, or use vector search for
 > questions the graph answers exactly. This guide keeps them on the rails.
 >
-> Calibrated against ontology version **12** (see `ast_java.ONTOLOGY_VERSION` /
-> `java_ontology.py` valid sets): HTTP brownfield rename (`@CodebaseHttpClient`),
-> shared `CodebaseHttpMethod` enum, inbound layer-C HTTP routes replace same-method
-> built-in rows. **Design rationale:** navigation surface and tools —
+> Calibrated against ontology version **13** (see `ast_java.ONTOLOGY_VERSION` /
+> `java_ontology.py` valid sets): stored `OVERRIDES` Symbol→Symbol edges (subtype
+> override → supertype declaration, matching `signature`, one `IMPLEMENTS`/`EXTENDS`
+> hop) and `neighbors(edge_types=["OVERRIDES"])`. Still includes v12 HTTP brownfield
+> (`@CodebaseHttpClient`, shared `CodebaseHttpMethod` enum, inbound layer-C HTTP routes
+> replace same-method built-in rows). **Design rationale:** navigation surface and tools —
 > [`propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md`](../propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md);
 > HTTP brownfield rename, `CodebaseHttpMethod`, and exclusivity —
 > [`propose/HTTP-ROUTE-METHOD-ENUM-PROPOSE.md`](../propose/HTTP-ROUTE-METHOD-ENUM-PROPOSE.md).
@@ -29,7 +31,7 @@ This MCP indexes Java enterprise projects into two stores:
 - **LanceDB** — vector + optional hybrid (FTS + vector) search over Java / SQL / YAML chunks.
-- **Kuzu graph** — exact structure: **node kinds** `Symbol`, `Route`, `Client` and **nine edge types** (see *Edge taxonomy* below). +- **Kuzu graph** — exact structure: **node kinds** `Symbol`, `Route`, `Client` and **ten edge types** (see *Edge taxonomy* below). **MCP surface (navigation only):** `search`, `find`, `describe`, `neighbors`, `resolve`. @@ -80,7 +82,7 @@ Pick: Why: <≤8 words> Then check *Argument shapes* (real JSON arrays/objects, required `neighbors` fields). If the call returns nothing useful, do not thrash — use the **Recovery playbook**. -### Edge taxonomy (nine labels) +### Edge taxonomy (ten labels) Use these strings **verbatim** in `neighbors(..., edge_types=[...])`: @@ -88,6 +90,7 @@ Use these strings **verbatim** in `neighbors(..., edge_types=[...])`: | ----- | ---------- | --------- | | Type wiring | `EXTENDS`, `IMPLEMENTS`, `INJECTS` | `in` = who depends on this type; `out` = what this type depends on | | Containment | `DECLARES`, `DECLARES_CLIENT` | `in` = owner; `out` = owned member / client | +| Method overrides | `OVERRIDES` | Subtype **method** → supertype **declaration** method (same `signature`, one `IMPLEMENTS`/`EXTENDS` hop). `in` = overriders; `out` = overridden declarations | | Method calls | `CALLS` | `in` = callers; `out` = callees | | Service boundary | `EXPOSES` | Symbol → Route (handler exposes route) | | Cross-service | `HTTP_CALLS`, `ASYNC_CALLS` | Symbol → Route across services | @@ -218,7 +221,7 @@ Exact allowed values for roles, capabilities, client kinds, etc. live in `java_o #### `describe` -- **Purpose:** Full node payload + `edge_summary`: `in` / `out` counts **per stored graph edge label** (what exists as edges in Kuzu). 
For **type** Symbols only (`class`, `interface`, `enum`, `record`, `annotation`), the same map may also include **describe-time composed** dot-keys — summaries of member edges, not stored labels — see the next bullets (`DECLARES.DECLARES_CLIENT`, `DECLARES.EXPOSES`); those keys are **not** valid in `neighbors(edge_types=…)`. For **method** Symbols, the map may include **override-axis** virtual keys (`OVERRIDDEN_BY`, `OVERRIDDEN_BY.DECLARES_CLIENT`, `OVERRIDDEN_BY.EXPOSES`, `OVERRIDES`); see **Override-axis keys (method Symbols)** below — also not `EdgeType` literals. +- **Purpose:** Full node payload + `edge_summary`: `in` / `out` counts **per stored graph edge label** (what exists as edges in Kuzu). For **type** Symbols only (`class`, `interface`, `enum`, `record`, `annotation`), the same map may also include **describe-time composed** dot-keys — summaries of member edges, not stored labels — see the next bullets (`DECLARES.DECLARES_CLIENT`, `DECLARES.EXPOSES`); those keys are **not** valid in `neighbors(edge_types=…)`. For **method** Symbols, the map may include **override-axis virtual keys** (`OVERRIDDEN_BY`, `OVERRIDDEN_BY.DECLARES_CLIENT`, `OVERRIDDEN_BY.EXPOSES`) plus an **`OVERRIDES` row** that merges stored `[:OVERRIDES]` incident counts with the describe-time dispatch-up rollup (per direction `max`, so inbound stored overrides are preserved); see **Override-axis keys (method Symbols)** below — those virtual keys are **not** `neighbors` arguments. The **stored** relationship label **`OVERRIDES`** **is** a valid `EdgeType` for `neighbors` (same spelling as the map key; the map row is the merged view). - **Args:** `id` (symbol, route, or client id) or **`fqn`** (exact symbol FQN when you do not have the graph id). When both are set, `id` wins. For identifier-shaped inputs and FQN collision handling, see **Identifier resolution** above. **Composed `edge_summary` keys (type Symbols).** Keys use dot notation: `.`. 
Two are emitted today: @@ -230,17 +233,17 @@ Composed keys are **read-only**: they cannot be passed to `neighbors(edge_types= Note on counting semantics: composed counts measure **edge rows**, not distinct member methods. One method that declares multiple `Client` rows (e.g. a `rest_template` method with several call sites) contributes its full edge count to `DECLARES.DECLARES_CLIENT`. The "does this class have any clients?" predicate is answered by `count > 0`; the count itself is an affordance for how rich the downstream walk will be. -**Override-axis keys (method Symbols).** These name dispatch-axis virtual relations (computed at describe-time from `IMPLEMENTS` / `EXTENDS` plus matching `Symbol.signature`; not stored edges): +**Override-axis keys (method Symbols).** Dispatch-axis signals computed at describe-time from `IMPLEMENTS` / `EXTENDS` plus matching `Symbol.signature` (not stored as their own rel types): - `OVERRIDDEN_BY` — on declarations reachable from implementing / extending classes in one hop: count of **distinct** concrete override methods with the same `signature` string as the described method (not counting the declaration itself). - `OVERRIDDEN_BY.DECLARES_CLIENT` / `OVERRIDDEN_BY.EXPOSES` — same dispatch-down walk, then count outgoing `DECLARES_CLIENT` / `EXPOSES` edges from those override methods. Counts are **edge rows** on overrides (not distinct methods): one override with multiple client edges contributes the full row count. Omitted when zero. -- `OVERRIDES` — on a concrete method: count of **distinct** upstream declarations (interface / superclass methods with the same `signature`) one `IMPLEMENTS`/`EXTENDS` hop from the declaring class. A class implementing two interfaces that both declare the same signature yields `out: 2` (two declaration symbols). 
+- `OVERRIDES` (map row) — merges **stored** `[:OVERRIDES]` `in`/`out` (subtype→supertype edges in Kuzu) with the dispatch-up rollup (distinct upstream declarations one `IMPLEMENTS`/`EXTENDS` hop away, same `signature`). The rollup alone always reported `in: 0`; merging fixes `in` when this method is also a super declaration with incoming override edges. A class implementing two interfaces that both declare the same signature yields `out: 2` on the rollup arm (and matches stored outbound edges when materialization aligns). Prefer `neighbors(ids=, direction="out", edge_types=["OVERRIDES"])` to list declaration ids, and `direction="in"` for overriders. -Walk recipe (declaration side): `neighbors(ids=, direction="in", edge_types=["DECLARES"])` → declaring type → `neighbors(ids=, direction="in", edge_types=["IMPLEMENTS","EXTENDS"])` → each subtype class → `neighbors(ids=, direction="out", edge_types=["DECLARES"])` and filter rows where `signature` matches the interface method. +Walk recipe (manual, if you need types in the middle): `neighbors(ids=, direction="in", edge_types=["DECLARES"])` → declaring type → `neighbors(ids=, direction="in", edge_types=["IMPLEMENTS","EXTENDS"])` → each subtype class → `neighbors(ids=, direction="out", edge_types=["DECLARES"])` and filter rows where `signature` matches the interface method. Static methods suppress the entire override-axis rollup. Constructors do not receive these keys. -These keys are **not** valid `EdgeType` literals — `neighbors(edge_types=["OVERRIDDEN_BY"])` fails at the Pydantic boundary. Use them as hop affordances only. +Virtual keys (`OVERRIDDEN_BY`, …) and composed dot-keys are **not** valid `EdgeType` literals — `neighbors(edge_types=["OVERRIDDEN_BY"])` fails at the Pydantic boundary. Use them as hop affordances only. **`OVERRIDES`** in `edge_types=[...]` selects the **stored** override relationship (same spelling as the `OVERRIDES` map key, whose `in`/`out` merge Kuzu edges with the dispatch-up rollup). 
#### `resolve` @@ -253,9 +256,9 @@ These keys are **not** valid `EdgeType` literals — `neighbors(edge_types=["OVE - **Purpose:** One hop over explicit edge types; returns **edges** with attributes (`confidence`, `strategy`, `match`, …) and the **`other`** node. - **Args:** `ids` (string or array — batch allowed), **`direction`** (`in`|`out`), **`edge_types`** (non-empty list), `limit`, `offset`, optional `filter` on the other node. - **Batching:** Multiple origins are expanded; pagination slices the **combined** edge list — use larger `limit` when batching many ids. -- **Confidence:** Cross-service edges (`HTTP_CALLS`, `ASYNC_CALLS`) carry confidence, strategy, and match metadata on `edge.attrs` (`attrs.confidence`, `attrs.strategy`, `attrs.match`). Low confidence means the resolver had to guess at the route binding — treat it as a **resolver gap signal**, not a hallucination. Report low-confidence edges with their confidence value, not as facts. Intra-service edges (`CALLS`, `INJECTS`, `IMPLEMENTS`, `EXTENDS`, `DECLARES`, `DECLARES_CLIENT`, `EXPOSES`) faithfully represent the static graph; the resolved set is still a **lower bound** under reflection / dynamic dispatch (see *What this MCP is NOT*). +- **Confidence:** Cross-service edges (`HTTP_CALLS`, `ASYNC_CALLS`) carry confidence, strategy, and match metadata on `edge.attrs` (`attrs.confidence`, `attrs.strategy`, `attrs.match`). Low confidence means the resolver had to guess at the route binding — treat it as a **resolver gap signal**, not a hallucination. Report low-confidence edges with their confidence value, not as facts. Intra-service edges (`CALLS`, `INJECTS`, `IMPLEMENTS`, `EXTENDS`, `DECLARES`, `DECLARES_CLIENT`, `EXPOSES`, `OVERRIDES`) faithfully represent the static graph; the resolved set is still a **lower bound** under reflection / dynamic dispatch (see *What this MCP is NOT*). -### Ontology glossary (version 12) +### Ontology glossary (version 13) Source of truth: `java_ontology.py`. 
Strings are case-sensitive. @@ -274,7 +277,7 @@ Source of truth: `java_ontology.py`. Strings are case-sensitive. | Symptom | Likely cause | Fix | | ------- | ------------ | --- | | `neighbors` validation error | Missing `direction` or `edge_types` | Add both explicitly | -| Empty `neighbors` | Wrong edge type for the node kind, or wrong direction | Check `describe.edge_summary`; `EXPOSES` is Symbol↔Route — direction matters | +| Empty `neighbors` | Wrong edge type for the node kind, or wrong direction | Check `describe.edge_summary`; `EXPOSES` is Symbol↔Route — direction matters. `OVERRIDES` is method↔method only | | Cannot find symbol | Wrong id or stale index | `resolve` / `search` with distinctive string; verify `java-codebase-rag meta` (CLI) | | `find` returns too much | Over-broad filter | Add `microservice`, `fqn_prefix`, `path_prefix`, etc. | | Route not found | Path mismatch | Use `path_prefix` on `find(kind="route", …)`; check README brownfield routes | diff --git a/kuzu_queries.py b/kuzu_queries.py index 0aa41c3..b8673ae 100644 --- a/kuzu_queries.py +++ b/kuzu_queries.py @@ -198,6 +198,7 @@ def _scope_filters( "EXTENDS", "IMPLEMENTS", "INJECTS", + "OVERRIDES", "DECLARES", "CALLS", "EXPOSES", diff --git a/mcp_v2.py b/mcp_v2.py index 0618cf7..505f1ad 100644 --- a/mcp_v2.py +++ b/mcp_v2.py @@ -37,10 +37,12 @@ # Composed describe-time keys in edge_summary (e.g. DECLARES.DECLARES_CLIENT) are # intentionally not EdgeType literals — neighbors(edge_types=...) rejects them. +# Virtual override-axis keys (OVERRIDDEN_BY, …) are also rejected; stored OVERRIDES is an EdgeType. EdgeType = Literal[ "EXTENDS", "IMPLEMENTS", "INJECTS", + "OVERRIDES", "DECLARES", "DECLARES_CLIENT", "CALLS", @@ -264,8 +266,11 @@ class NodeRecord(BaseModel): "(DECLARES to member, then that edge) — edge-row counts, not EdgeType literals; " "do not pass them to neighbors(edge_types=…). 
For method Symbols, may include " "override-axis virtual keys `OVERRIDDEN_BY`, `OVERRIDDEN_BY.DECLARES_CLIENT`, " - "`OVERRIDDEN_BY.EXPOSES`, and `OVERRIDES` (same dot convention; also not valid " - "EdgeType literals for neighbors)." + "`OVERRIDDEN_BY.EXPOSES`, plus an `OVERRIDES` map entry that **merges** stored " + "`[:OVERRIDES]` in/out counts with the describe-time dispatch-up rollup (per " + "direction `max`, so inbound stored overrides are not dropped). Those virtual / " + "dot-keys are not valid neighbors(edge_types=…) arguments. The stored relationship " + "label `OVERRIDES` **is** a valid EdgeType for neighbors." ), ) @@ -525,6 +530,33 @@ def _load_node_record(graph: KuzuGraph, node_id: str, kind: Literal["symbol", "r return rows[0] +def _incident_counts(cell: dict[str, int] | None) -> dict[str, int]: + if not cell: + return {"in": 0, "out": 0} + return {"in": int(cell.get("in", 0)), "out": int(cell.get("out", 0))} + + +def _merge_overrides_edge_summary( + stored_before_rollups: dict[str, int], + summary_after_rollups: dict[str, dict[str, int]], +) -> None: + """Reconcile `OVERRIDES` with `override_axis_rollup_for` without clobbering stored `in`. + + Rollup rows reuse the ``OVERRIDES`` key for dispatch-up counts only (``in`` is always + zero there). Stored ``[:OVERRIDES]`` edges contribute real ``in``/``out`` from Kuzu; + merge per direction with ``max`` so inbound override edges stay visible. 
+ """ + roll = _incident_counts(summary_after_rollups.get("OVERRIDES")) + if "OVERRIDES" not in summary_after_rollups and not any(stored_before_rollups.values()): + return + merged_in = max(stored_before_rollups["in"], roll["in"]) + merged_out = max(stored_before_rollups["out"], roll["out"]) + if merged_in == 0 and merged_out == 0: + summary_after_rollups.pop("OVERRIDES", None) + else: + summary_after_rollups["OVERRIDES"] = {"in": merged_in, "out": merged_out} + + def _edge_summary_for_node( graph: KuzuGraph, node_id: str, *, kind: str, row: dict[str, Any] ) -> dict[str, dict[str, int]]: @@ -533,7 +565,9 @@ def _edge_summary_for_node( if kind == "symbol" and sym_kind in _TYPE_SYMBOL_KINDS_FOR_EDGE_ROLLUP: summary.update(graph.member_edge_rollup_for(node_id)) elif kind == "symbol" and sym_kind in _METHOD_SYMBOL_KINDS_FOR_OVERRIDE_ROLLUP: + stored_overrides = _incident_counts(summary.get("OVERRIDES")) summary.update(graph.override_axis_rollup_for(node_id)) + _merge_overrides_edge_summary(stored_overrides, summary) return summary diff --git a/server.py b/server.py index 9a9ae17..a2c3413 100644 --- a/server.py +++ b/server.py @@ -30,7 +30,7 @@ "resolve (identifier-shaped lookup for symbol/route/client — three statuses one|many|none). " "NodeFilter `filter` is a JSON object (preferred); a JSON-encoded string is also accepted as a fallback. " "Unknown filter keys and populated fields not applicable to the effective node kind fail with success=false and message. " - "Edge labels: EXTENDS, IMPLEMENTS, INJECTS, DECLARES, DECLARES_CLIENT, CALLS, EXPOSES, HTTP_CALLS, ASYNC_CALLS. " + "Edge labels: EXTENDS, IMPLEMENTS, INJECTS, OVERRIDES, DECLARES, DECLARES_CLIENT, CALLS, EXPOSES, HTTP_CALLS, ASYNC_CALLS. " "Reprocess/init, meta, tables, diagnose-ignore, analyze-pr: use java-codebase-rag CLI — not MCP." ) @@ -411,10 +411,12 @@ async def find( @mcp.tool( name="describe", description=( - "Full node record plus `edge_summary` (in/out counts per stored edge label). 
Type Symbols may add "
-        "describe-time composed keys such as DECLARES.DECLARES_CLIENT and DECLARES.EXPOSES; method Symbols may "
-        "add override-axis virtual keys (OVERRIDDEN_BY, OVERRIDDEN_BY.DECLARES_CLIENT, OVERRIDDEN_BY.EXPOSES, "
-        "OVERRIDES). Those dot-keys are read-only summaries—not valid `neighbors(edge_types=…)` values. "
+        "Full node record plus `edge_summary` (in/out counts per stored edge label, plus optional describe-time keys). Type Symbols may add "
+        "composed keys DECLARES.DECLARES_CLIENT and DECLARES.EXPOSES; method Symbols may add "
+        "override-axis virtual keys (OVERRIDDEN_BY, OVERRIDDEN_BY.DECLARES_CLIENT, OVERRIDDEN_BY.EXPOSES, "
+        "plus an `OVERRIDES` map entry that merges stored `[:OVERRIDES]` counts with the dispatch-up rollup per direction). Those dot-keys and virtual keys are "
+        "read-only summaries—not valid `neighbors(edge_types=…)` values. The stored `OVERRIDES` relationship "
+        "is a normal edge label and may be traversed via neighbors(edge_types=[..., \"OVERRIDES\", ...]). "
         "Pass `id` for any kind, or exact `fqn` for Symbol lookup (`id` wins when both are set). "
         "`describe(fqn=…)` keeps the first graph row when multiple symbols share that FQN; when an FQN may collide, "
         "prefer `resolve(identifier=…, hint_kind='symbol')` first, then `describe(id=…)` on the chosen node."
@@ -451,7 +453,7 @@ async def neighbors(
         description="Required. 'in' = predecessors (callers), 'out' = successors (callees). No default.",
     ),
     edge_types: list[mcp_v2.EdgeType] = Field(
-        description="Required non-empty list of edge labels (e.g. CALLS, EXPOSES, HTTP_CALLS)",
+        description="Required non-empty list of edge labels (e.g.
CALLS, EXPOSES, HTTP_CALLS, OVERRIDES)",
     ),
     limit: int = Field(
         default=25,
diff --git a/tests/fixtures/override_axis_rollup_smoke/src/main/java/orolla/abstractroute/BottomApi.java b/tests/fixtures/override_axis_rollup_smoke/src/main/java/orolla/abstractroute/BottomApi.java
new file mode 100644
index 0000000..d8ab95f
--- /dev/null
+++ b/tests/fixtures/override_axis_rollup_smoke/src/main/java/orolla/abstractroute/BottomApi.java
@@ -0,0 +1,6 @@
+package orolla.abstractroute;
+
+public class BottomApi extends MiddleApi {
+    @Override
+    public void handle() {}
+}
diff --git a/tests/fixtures/override_axis_rollup_smoke/src/main/java/orolla/abstractroute/MiddleApi.java b/tests/fixtures/override_axis_rollup_smoke/src/main/java/orolla/abstractroute/MiddleApi.java
new file mode 100644
index 0000000..241ca11
--- /dev/null
+++ b/tests/fixtures/override_axis_rollup_smoke/src/main/java/orolla/abstractroute/MiddleApi.java
@@ -0,0 +1,6 @@
+package orolla.abstractroute;
+
+public abstract class MiddleApi extends AbstractApi {
+    @Override
+    public void handle() {}
+}
diff --git a/tests/test_ast_graph_build.py b/tests/test_ast_graph_build.py
index 0efb954..3fff42b 100644
--- a/tests/test_ast_graph_build.py
+++ b/tests/test_ast_graph_build.py
@@ -51,7 +51,7 @@ def test_schema_has_all_expected_tables(kuzu_db_path: Path) -> None:
     # free to add more (e.g.
expected = { "Symbol", "Route", "Client", "GraphMeta", - "EXTENDS", "IMPLEMENTS", "INJECTS", "DECLARES", "CALLS", "EXPOSES", "DECLARES_CLIENT", + "EXTENDS", "IMPLEMENTS", "INJECTS", "DECLARES", "OVERRIDES", "CALLS", "EXPOSES", "DECLARES_CLIENT", } missing = expected - tables assert not missing, f"missing schema tables: {missing}; saw {tables}" diff --git a/tests/test_mcp_tools.py b/tests/test_mcp_tools.py index 5498020..e49b18a 100644 --- a/tests/test_mcp_tools.py +++ b/tests/test_mcp_tools.py @@ -72,6 +72,7 @@ async def test_tool_input_schema_includes_expected_enums(mcp_server) -> None: "EXTENDS", "IMPLEMENTS", "INJECTS", + "OVERRIDES", "DECLARES", "DECLARES_CLIENT", "CALLS", diff --git a/tests/test_mcp_v2_compose.py b/tests/test_mcp_v2_compose.py index 3f5379e..98a3d0c 100644 --- a/tests/test_mcp_v2_compose.py +++ b/tests/test_mcp_v2_compose.py @@ -9,6 +9,7 @@ from _builders import build_kuzu_to from kuzu_queries import KuzuGraph from mcp_v2 import ( + _NEIGHBOR_EDGE_TYPES_ADAPTER, _TYPE_SYMBOL_KINDS_FOR_EDGE_ROLLUP, describe_v2, neighbors_v2, @@ -27,6 +28,7 @@ "HTTP_CALLS", "IMPLEMENTS", "INJECTS", + "OVERRIDES", ) _ROLLUP_TYPE_KINDS = sorted(_TYPE_SYMBOL_KINDS_FOR_EDGE_ROLLUP) @@ -421,6 +423,25 @@ def test_describe_abstract_method_with_route_override_emits_exposes(override_axi assert es.get("OVERRIDDEN_BY.EXPOSES") == {"in": 0, "out": want_ex} +def test_describe_method_edge_summary_overrides_merges_stored_in_with_dispatch_up_out( + override_axis_graph: KuzuGraph, +) -> None: + """Middle override: incoming [:OVERRIDES] from subclass + rollup dispatch-up must not zero `in`.""" + rows = override_axis_graph._rows( # noqa: SLF001 + "MATCH (t:Symbol {fqn: $fqn})-[:DECLARES]->(m:Symbol) " + "WHERE m.kind = 'method' AND m.name = 'handle' " + "RETURN m.id AS id LIMIT 1", + {"fqn": "orolla.abstractroute.MiddleApi"}, + ) + assert rows + mid = str(rows[0]["id"]) + out = describe_v2(mid, graph=override_axis_graph) + assert out.success is True + assert out.record is not None 
+    assert out.record.edge_summary is not None
+    assert out.record.edge_summary.get("OVERRIDES") == {"in": 1, "out": 1}
+
+
 def test_describe_interface_method_diamond_override_counts_once_per_upstream(
     override_axis_graph: KuzuGraph,
 ) -> None:
@@ -439,3 +460,80 @@ def test_describe_interface_method_diamond_override_counts_once_per_upstream(
     assert out.record is not None
     assert out.record.edge_summary is not None
     assert out.record.edge_summary.get("OVERRIDES") == {"in": 0, "out": 2}
+
+
+def test_overrides_stored_neighbors_in_matches_override_axis_impl_ids(override_axis_graph: KuzuGraph) -> None:
+    rows = override_axis_graph._rows(  # noqa: SLF001
+        "MATCH (t:Symbol {fqn: $fqn})-[:DECLARES]->(m:Symbol) "
+        "WHERE m.kind = 'method' AND m.name = 'handle' "
+        "RETURN m.id AS id LIMIT 1",
+        {"fqn": "orolla.abstractroute.AbstractApi"},
+    )
+    assert rows
+    mid = str(rows[0]["id"])
+    want = sorted(_dispatch_down_override_method_ids(override_axis_graph, mid))
+    out = neighbors_v2(mid, direction="in", edge_types=["OVERRIDES"], graph=override_axis_graph)
+    assert out.success is True
+    got = sorted({e.other.id for e in out.results})
+    assert got == want
+
+
+def test_overrides_stored_neighbors_out_matches_override_axis_decl_ids(override_axis_graph: KuzuGraph) -> None:
+    rows = override_axis_graph._rows(  # noqa: SLF001
+        "MATCH (t:Symbol {fqn: $fqn})-[:DECLARES]->(m:Symbol) "
+        "WHERE m.kind = 'method' AND m.name = 'shared' "
+        "RETURN m.id AS id LIMIT 1",
+        {"fqn": "orolla.diamond.DiamondC"},
+    )
+    assert rows
+    mid = str(rows[0]["id"])
+    want = sorted(_dispatch_up_declaration_method_ids(override_axis_graph, mid))
+    out = neighbors_v2(mid, direction="out", edge_types=["OVERRIDES"], graph=override_axis_graph)
+    assert out.success is True
+    got = sorted({e.other.id for e in out.results})
+    assert got == want
+
+
+def test_overrides_rel_schema_round_trips(override_axis_graph: KuzuGraph) -> None:
+    import kuzu
+
+    conn = kuzu.Connection(kuzu.Database(override_axis_graph.db_path,
 read_only=True))
+    tables = set()
+    r = conn.execute("CALL show_tables() RETURN *;")
+    while r.has_next():
+        row = r.get_next()
+        tables.add(str(row[1]))
+    assert "OVERRIDES" in tables
+    n = 0
+    r2 = conn.execute("MATCH ()-[e:OVERRIDES]->() RETURN count(e) AS n")
+    if r2.has_next():
+        n = int(r2.get_next()[0] or 0)
+    assert n > 0
+
+
+def test_neighbors_edge_type_adapter_accepts_overrides() -> None:
+    _NEIGHBOR_EDGE_TYPES_ADAPTER.validate_python(["OVERRIDES"])
+
+
+def test_neighbors_rejects_overridden_by_and_dot_keys(kuzu_graph: KuzuGraph) -> None:
+    node_id, _ = _controller_method_with_calls(kuzu_graph)
+    with pytest.raises(ValidationError):
+        neighbors_v2(node_id, direction="out", edge_types=["OVERRIDDEN_BY"], graph=kuzu_graph)
+    with pytest.raises(ValidationError):
+        neighbors_v2(node_id, direction="out", edge_types=["DECLARES.DECLARES_CLIENT"], graph=kuzu_graph)
+
+
+def test_overrides_edge_set_deterministic_double_build(tmp_path: Path) -> None:
+    def edge_pairs(db_path: Path) -> list[tuple[str, str]]:
+        g = KuzuGraph(str(db_path))
+        rows = g._rows(  # noqa: SLF001
+            "MATCH (a:Symbol)-[e:OVERRIDES]->(b:Symbol) "
+            "RETURN a.id AS src, b.id AS dst ORDER BY src, dst",
+        )
+        return [(str(r["src"]), str(r["dst"])) for r in rows]
+
+    p1 = tmp_path / "g1.kuzu"
+    p2 = tmp_path / "g2.kuzu"
+    build_kuzu_to(_OVERRIDE_AXIS_FIXTURE, p1, max_pass=5)
+    build_kuzu_to(_OVERRIDE_AXIS_FIXTURE, p2, max_pass=5)
+    assert edge_pairs(p1) == edge_pairs(p2)
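
Reviewer note: the per-direction `max` merge introduced by `_merge_overrides_edge_summary` can be exercised without a Kuzu graph. The sketch below restates the two helpers from the diff under hypothetical public names (`incident_counts`, `merge_overrides`) and feeds them hand-written counts. The rollup shape `{"OVERRIDES": {"in": 0, "out": 1}}` is an assumption based on the docstring's statement that rollup `in` is always zero. It stands in for `override_axis_rollup_for`, which is not reproduced here.

```python
def incident_counts(cell):
    """Normalize a possibly missing edge_summary cell to explicit in/out ints."""
    if not cell:
        return {"in": 0, "out": 0}
    return {"in": int(cell.get("in", 0)), "out": int(cell.get("out", 0))}


def merge_overrides(stored, summary):
    """Merge stored OVERRIDES counts into `summary` in place, taking max per direction."""
    roll = incident_counts(summary.get("OVERRIDES"))
    # Nothing stored and nothing rolled up: leave the summary untouched.
    if "OVERRIDES" not in summary and not any(stored.values()):
        return
    merged_in = max(stored["in"], roll["in"])
    merged_out = max(stored["out"], roll["out"])
    if merged_in == 0 and merged_out == 0:
        summary.pop("OVERRIDES", None)
    else:
        summary["OVERRIDES"] = {"in": merged_in, "out": merged_out}


# Hand-written counts modelled on the MiddleApi.handle() fixture case:
# one stored inbound edge (from BottomApi) and one stored outbound edge (to AbstractApi).
summary = {"OVERRIDES": {"in": 1, "out": 1}}
stored = incident_counts(summary.get("OVERRIDES"))

# Assumed rollup shape: dispatch-up counts only, so "in" is zero and would clobber the stored value.
summary.update({"OVERRIDES": {"in": 0, "out": 1}})

merge_overrides(stored, summary)
print(summary["OVERRIDES"])  # {'in': 1, 'out': 1}
```

Snapshotting the stored counts before `summary.update(...)` is the step that matters: a plain `dict.update` replaces the whole `OVERRIDES` cell, so without the snapshot the inbound count would be lost before the merge could see it.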