Merged
6 changes: 3 additions & 3 deletions .cursor/rules/agent-workflow.mdc
@@ -68,9 +68,9 @@ When you're given a per-PR task prompt from `plans/CURSOR-PROMPTS-*.md`:
`java_ontology.py`. Don't sprinkle role / capability / client-kind /
strategy / match string literals across other modules.
- Schema changes that affect the Lance index or Kuzu graph need a
matching update to the README "Re-index required" callout. Bump
`ontology_version` when enrichment semantics change. The current
version is **12**.
matching update to the README "Re-index required" callout. Bump
`ontology_version` when enrichment semantics change. The current
version is **13**.
- Brownfield is a first-class surface: any new auto-detection
(route, role, capability, http client, async producer) must
compose with the matching `BrownfieldOverrides` layer. Last writer
16 changes: 9 additions & 7 deletions .cursor/rules/project-overview.mdc
@@ -21,13 +21,15 @@ when needed.
- `README.md` — feature surface, env vars, ranking, capabilities,
MCP tools (`search` / `find` / `describe` / `neighbors` / `resolve`), `java-codebase-rag` CLI,
"Re-index required" callouts. The current
`ontology_version` is **12** (`@CodebaseHttpClient` rename + shared `CodebaseHttpMethod` enum;
inbound `@CodebaseHttpRoute` replaces same-method built-in HTTP rows; still
`@CodebaseAsyncRoute` wins over same-method
`@KafkaListener`; adds `Client` nodes, `DECLARES_CLIENT`, `find(kind="client")`, plus
HTTP_CALLS / ASYNC_CALLS caller edges and brownfield composition from earlier
bumps). Earlier ontology bumps
are described inline in the README's callouts list.
`ontology_version` is **13** (material `OVERRIDES` Symbol→Symbol edges: subtype
instance method → supertype declaration with matching `signature`, one
`IMPLEMENTS`/`EXTENDS` hop; valid `neighbors` `EdgeType`). Builds on v12
(`@CodebaseHttpClient` rename + shared `CodebaseHttpMethod` enum; inbound
`@CodebaseHttpRoute` replaces same-method built-in HTTP rows; still
`@CodebaseAsyncRoute` wins over same-method `@KafkaListener`; `Client` nodes,
`DECLARES_CLIENT`, `find(kind="client")`, HTTP_CALLS / ASYNC_CALLS caller edges,
brownfield composition from earlier bumps). Earlier ontology bumps are described
inline in the README's callouts list.
- `CODEBASE_REQUIREMENTS.md` — Java-repo assumptions and per-file
map of what to edit when a target tree doesn't match defaults.
- `tests/README.md` — testing philosophy.
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -10,7 +10,7 @@ for tools that don't read `.cursor/rules/`.
- `README.md` — feature surface, env vars, ranking, capabilities,
MCP tool list (`search` / `find` / `describe` / `neighbors` / `resolve`),
CLI ops (`java-codebase-rag --help`), and "Re-index required" callouts.
**`ontology_version` is currently 12** (HTTP brownfield rename + `CodebaseHttpMethod` enum + inbound HTTP layer-C replace; see README graph section).
**`ontology_version` is currently 13** (stored `OVERRIDES` method→method edges traversable via `neighbors`; plus v12 HTTP brownfield rename, `CodebaseHttpMethod` enum, inbound HTTP layer-C replace; see README graph section).
- [`docs/JAVA-CODEBASE-RAG-CLI.md`](./docs/JAVA-CODEBASE-RAG-CLI.md) — operator guide for the `java-codebase-rag` CLI (`init` / `increment` / `reprocess` / `erase`, `meta`, `tables`, `diagnose-ignore`, `analyze-pr`; hidden `refresh` alias → `reprocess` — see that doc).
- `CODEBASE_REQUIREMENTS.md` — Java-repo assumptions and tuning map.
- **`propose/`** — design proposes. **In-flight** work is **`propose/*.md`**
2 changes: 1 addition & 1 deletion CODEBASE_REQUIREMENTS.md
@@ -187,7 +187,7 @@ root (`role_overrides:`, `route_overrides:`, `http_client_overrides:`,
**MCP discovery:** after indexing, use MCP `find` with `kind="route"` for
inbound HTTP and async routes and `kind="client"` for outbound HTTP `Client`
declarations (Feign methods plus annotated imperative clients). Client rows
require a graph built with `ontology_version` **12** or newer — confirm with
require a graph built with `ontology_version` **13** or newer — confirm with
`java-codebase-rag meta` (JSON field `ontology_version`).
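The version check described above could be scripted roughly as follows. This is a minimal sketch that assumes `ontology_version` is a top-level integer key in the JSON printed by `java-codebase-rag meta`; the helper name `index_is_current` and the sample payloads are hypothetical.

```python
import json

REQUIRED_ONTOLOGY_VERSION = 13  # version named in this change


def index_is_current(meta_json: str, required: int = REQUIRED_ONTOLOGY_VERSION) -> bool:
    """Return True when the meta payload reports at least the required version."""
    meta = json.loads(meta_json)
    # Assumption: the field sits at the top level of the JSON object.
    version = meta.get("ontology_version")
    # A missing or non-integer field is treated as a stale index.
    return isinstance(version, int) and version >= required


# Hypothetical payloads; real `meta` output may carry more fields.
print(index_is_current('{"ontology_version": 13}'))
print(index_is_current('{"ontology_version": 12}'))
```

Feeding the helper the captured stdout of the `meta` command keeps the check independent of how the CLI is invoked.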

See **Brownfield overrides** in `README.md` for the full schema, usage
13 changes: 8 additions & 5 deletions README.md
@@ -229,7 +229,7 @@ Edit `claude_desktop_config.json` (macOS: `~/Library/Application Support/Claude/

### Driving the MCP from an agent

- **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** — copy-paste into `QWEN.md` / `CLAUDE.md` / `AGENTS.md`. Covers the five MCP tools, the shared `NodeFilter`, the edge-type taxonomy, required `neighbors` arguments, the ontology glossary (currently **v12**), the recovery playbook, and slash-style aliases.
- **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** — copy-paste into `QWEN.md` / `CLAUDE.md` / `AGENTS.md`. Covers the five MCP tools, the shared `NodeFilter`, the edge-type taxonomy, required `neighbors` arguments, the ontology glossary (currently **v13**), the recovery playbook, and slash-style aliases.
- **[`docs/skills/java-codebase-explore.md`](./docs/skills/java-codebase-explore.md)** — exploration **strategy** (missions, fallbacks, anti-capabilities, stopping rules); AGENT-GUIDE remains the **operating manual** for tool shapes and recovery.
- **[`docs/MANUAL-VERIFICATION-CHECKLIST.md`](./docs/MANUAL-VERIFICATION-CHECKLIST.md)** — 7-phase agent-driven verification you run after indexing your real project. Each item has a copy-paste prompt and calibration data from `tests/bank-chat-system`.
- **[`automation/cursor_propose_only/README.md`](./automation/cursor_propose_only/README.md)** — optional proposal orchestration workflow (single-command autopilot, planning bundles, and automated execution/review loops).
@@ -242,7 +242,7 @@ Edit `claude_desktop_config.json` (macOS: `~/Library/Application Support/Claude/
|---|---|---|---|
| `search` | Locate nodes by NL/code text. | `query: str`, `table: str="java"`, `hybrid: bool=False`, `limit: int=5`, `offset: int=0`, `path_contains: str \| None`, `filter: NodeFilter \| str \| None` | `{"query":"join operator flow","limit":5}` |
| `find` | Locate nodes by structured filter. | `kind: "symbol"\|"route"\|"client"`, `filter: NodeFilter \| str`, `limit: int=25`, `offset: int=0` | `{"kind":"symbol","filter":{"role":"CONTROLLER"}}` |
| `describe` | Full record + edge counts for one node. For **type** symbols, `edge_summary` may include composed dot-keys (`DECLARES.DECLARES_CLIENT`, `DECLARES.EXPOSES`); for **method** symbols it may include override-axis virtual keys (`OVERRIDDEN_BY`, `OVERRIDDEN_BY.DECLARES_CLIENT`, `OVERRIDDEN_BY.EXPOSES`, `OVERRIDES`). See [`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md) (`describe`). | `id: str` | `{"id":"sym:com.bank.chat.core.api.ChatController#joinOperator(JoinOperatorRequest)"}` |
| `describe` | Full record + edge counts for one node. For **type** symbols, `edge_summary` may include composed dot-keys (`DECLARES.DECLARES_CLIENT`, `DECLARES.EXPOSES`); for **method** symbols it may include override-axis virtual keys (`OVERRIDDEN_BY`, …) and an `OVERRIDES` row that **merges** stored `[:OVERRIDES]` in/out with the dispatch-up rollup (per direction `max`). See [`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md) (`describe`). | `id: str` | `{"id":"sym:com.bank.chat.core.api.ChatController#joinOperator(JoinOperatorRequest)"}` |
| `resolve` | Identifier-shaped node lookup (symbol / route / client). Returns `status` `one`, `many`, or `none`; prefer over `describe(fqn=…)` when an FQN may collide. See [`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md) (`resolve`). | `identifier: str`, `hint_kind: "symbol"\|"route"\|"client" \| null` | `{"identifier":"com.bank.chat.core.api.ChatController","hint_kind":"symbol"}` |
| `neighbors` | One-hop walk. **Required**: `direction` and `edge_types`. | `ids: str \| list[str]`, `direction: "in"\|"out"`, `edge_types: list[str]`, `limit: int=25`, `offset: int=0`, `filter: NodeFilter \| str \| None` | `{"ids":"route:chat-core:POST:/chat/joinOperator","direction":"in","edge_types":["HTTP_CALLS","ASYNC_CALLS"]}` |
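The `describe` row above says the `OVERRIDES` entry merges stored edge counts with the dispatch-up rollup by taking the per-direction `max`. A rough sketch of that merge, assuming each side is a mapping with `in` and `out` integer counts (the actual `edge_summary` shape is not shown in this diff, and `merge_overrides_counts` is a hypothetical name):

```python
def merge_overrides_counts(stored: dict[str, int], rollup: dict[str, int]) -> dict[str, int]:
    """Combine two per-direction counts by keeping the larger value for each direction."""
    # Assumption: both inputs use "in" / "out" keys; a missing key counts as zero.
    return {
        direction: max(stored.get(direction, 0), rollup.get(direction, 0))
        for direction in ("in", "out")
    }


# Hypothetical counts for one method symbol.
print(merge_overrides_counts({"in": 2, "out": 1}, {"in": 0, "out": 3}))
```

Taking the maximum instead of the sum avoids double counting when the stored edges and the rollup describe the same relationship.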

@@ -359,7 +359,7 @@ For `reprocess`, the pipeline runs `cocoindex` with `cwd` set to the bundle dire

## 6. Graph layer

A deterministic property graph derived from tree-sitter Java parsing lives next to the LanceDB tables under the index directory (default `${JAVA_CODEBASE_RAG_INDEX_DIR:-./.java-codebase-rag}/code_graph.kuzu`). Current ontology version: **12**.
A deterministic property graph derived from tree-sitter Java parsing lives next to the LanceDB tables under the index directory (default `${JAVA_CODEBASE_RAG_INDEX_DIR:-./.java-codebase-rag}/code_graph.kuzu`). Current ontology version: **13**.

### Node kinds

@@ -371,14 +371,15 @@ A deterministic property graph derived from tree-sitter Java parsing lives next

Unresolved targets become **phantom** nodes (`resolved=false`, FQN guessed from imports / `java.lang`).

### Edge types (9)
### Edge types (10)

| Edge | Direction | Meaning |
|---|---|---|
| `EXTENDS` | type → type | Class- or interface-inheritance. |
| `IMPLEMENTS` | type → interface | Interface implementation. |
| `INJECTS` | type → type | DI: field, constructor, or setter injection (incl. Lombok). |
| `DECLARES` | type → method/constructor | Type declares a callable. |
| `OVERRIDES` | method → method | Subtype instance method overrides a supertype-declared method (same `signature`, one supertype hop via `IMPLEMENTS` / `EXTENDS`). |
| `DECLARES_CLIENT` | type → client | Type declares an outbound call site. |
| `CALLS` | method → method | In-process call (confidence-scored, strategy-tagged). |
| `EXPOSES` | type → route | Type exposes an HTTP/async route. |
@@ -421,7 +422,9 @@ Resolution order for `microservice`:

### Re-index required when ontology changes

Current ontology version is **12**. Any index built before this version must be rebuilt via `cocoindex update ... --full-reprocess -f` or a full `java-codebase-rag reprocess` (no selective flags) so vectors and graph stay aligned. Until re-indexed, the server defensively JSON-decodes string-form list columns so nothing explodes, but filters like `array_contains` will not work.
Current ontology version is **13**. Any index built before this version must be rebuilt via `cocoindex update ... --full-reprocess -f` or a full `java-codebase-rag reprocess` (no selective flags) so vectors and graph stay aligned. Until re-indexed, the server defensively JSON-decodes string-form list columns so nothing explodes, but filters like `array_contains` will not work.

Ontology **13** materializes stored `OVERRIDES` edges between method Symbols (subtype override → supertype declaration, matching `signature` on a direct `IMPLEMENTS` / `EXTENDS` hop). `neighbors(edge_types=["OVERRIDES"])` traverses this relationship; `OVERRIDDEN_BY*` keys in `edge_summary` remain describe-time rollups only.
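As an illustration of the traversal named above, the following sketch builds an argument payload for the `neighbors` tool using only the parameters listed in the MCP tools table (`ids`, `direction`, `edge_types`, `limit`). The helper name and the symbol id are hypothetical; the id only mimics the `sym:` shape used elsewhere in the README.

```python
import json


def overrides_neighbors_args(method_id: str, direction: str = "out", limit: int = 25) -> dict:
    """Build a `neighbors` argument payload restricted to OVERRIDES edges."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return {
        "ids": method_id,
        "direction": direction,
        "edge_types": ["OVERRIDES"],
        "limit": limit,
    }


# Hypothetical method id on a subtype.
args = overrides_neighbors_args("sym:com.example.ChatServiceImpl#send(Message)")
print(json.dumps(args))
```

Because the edge points from the overriding method to the supertype declaration, `direction="out"` on a subtype method should reach the declaration it overrides, while `direction="in"` on a declaration should reach its direct overriders.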

Ontology **12** renames `@CodebaseClient` to `@CodebaseHttpClient`, types HTTP `method` as the shared `CodebaseHttpMethod` enum on both inbound and outbound stubs, and makes inbound layer-C HTTP routes **replace** same-method built-in Spring rows (no merge). Rebuild after upgrading so `meta_chain` keys and annotation simple names match the extractor.

2 changes: 1 addition & 1 deletion ast_java.py
@@ -81,7 +81,7 @@
# Phase 9: `@CodebaseAsyncRoute` replaces same-method built-in `@KafkaListener` routes in graph composition.
# Phase 10: `@CodebaseHttpClient` rename + `CodebaseHttpMethod` enum; inbound HTTP layer-C replaces built-in rows.
# Bumps whenever extraction / enrichment semantics change.
ONTOLOGY_VERSION = 12
ONTOLOGY_VERSION = 13

ROLE_ANNOTATIONS: dict[str, str] = {
# Spring Web
54 changes: 53 additions & 1 deletion build_ast_graph.py
@@ -4,7 +4,7 @@
Walks a Java source tree with `tree_sitter_java`, writes a deterministic graph of:
Symbol nodes: package, file, class, interface, enum, record, annotation, method, constructor
Route nodes: declaration-site routes (Spring MVC/WebFlux, Feign, Kafka, …)
Rel tables: EXTENDS, IMPLEMENTS, INJECTS, DECLARES, CALLS, EXPOSES
Rel tables: EXTENDS, IMPLEMENTS, INJECTS, DECLARES, OVERRIDES, CALLS, EXPOSES

Pass 1 builds every node and in-memory resolution indexes.
Pass 2 resolves each extends/implements/injection target using Java's lookup order
@@ -336,6 +336,7 @@ class GraphTables:
async_call_rows: list[AsyncCallRow] = field(default_factory=list)
client_rows: list[ClientRow] = field(default_factory=list)
declares_client_rows: list[DeclaresClientRow] = field(default_factory=list)
overrides_rows: list[DeclaresRow] = field(default_factory=list)
route_stats: RouteExtractionStats = field(default_factory=RouteExtractionStats)
call_edge_stats: CallEdgeStats = field(default_factory=CallEdgeStats)
client_stats: ClientExtractionStats = field(default_factory=ClientExtractionStats)
@@ -2186,6 +2187,7 @@ def _micro_factor(member: MemberEntry | None) -> float:
"mechanism STRING, annotation STRING, field_or_param STRING)"
)
_SCHEMA_DECLARES = "CREATE REL TABLE DECLARES(FROM Symbol TO Symbol)"
_SCHEMA_OVERRIDES = "CREATE REL TABLE OVERRIDES(FROM Symbol TO Symbol)"
_SCHEMA_CALLS = (
"CREATE REL TABLE CALLS(FROM Symbol TO Symbol, "
"call_site_line INT64, call_site_byte INT64, arg_count INT64, "
@@ -2221,6 +2223,7 @@ def _drop_all(conn: kuzu.Connection) -> None:
"DROP TABLE IF EXISTS IMPLEMENTS",
"DROP TABLE IF EXISTS INJECTS",
"DROP TABLE IF EXISTS CALLS",
"DROP TABLE IF EXISTS OVERRIDES",
"DROP TABLE IF EXISTS DECLARES",
"DROP TABLE IF EXISTS Symbol",
"DROP TABLE IF EXISTS Route",
@@ -2243,6 +2246,7 @@ def _create_schema(conn: kuzu.Connection) -> None:
_SCHEMA_IMPLEMENTS,
_SCHEMA_INJECTS,
_SCHEMA_DECLARES,
_SCHEMA_OVERRIDES,
_SCHEMA_CALLS,
_SCHEMA_EXPOSES,
_SCHEMA_DECLARES_CLIENT,
@@ -2358,6 +2362,10 @@ def _write_nodes(
"MATCH (a:Symbol {id: $src}), (b:Symbol {id: $dst}) "
"CREATE (a)-[:DECLARES]->(b)"
)
_CREATE_OVERRIDES = (
    "MATCH (a:Symbol {id: $src}), (b:Symbol {id: $dst}) "
    "CREATE (a)-[:OVERRIDES]->(b)"
)
_CREATE_CALL = (
"MATCH (a:Symbol {id: $src}), (b:Symbol {id: $dst}) "
"CREATE (a)-[:CALLS {"
@@ -2411,6 +2419,45 @@ def _populate_declares_rows(tables: GraphTables) -> None:
]


def _direct_supertype_ids(tables: GraphTables, type_id: str) -> list[str]:
    out: list[str] = []
    for r in tables.extends_rows:
        if r.src_id == type_id:
            out.append(r.dst_id)
    for r in tables.implements_rows:
        if r.src_id == type_id:
            out.append(r.dst_id)
    return out


def _populate_overrides_rows(tables: GraphTables) -> None:
    """Materialize (subtype_method)-[:OVERRIDES]->(supertype_method) for one supertype hop.

    Matches ``KuzuGraph.override_axis_rollup_for`` (direct ``IMPLEMENTS`` / ``EXTENDS``
    only, same ``signature``, distinct method ids, non-static instance methods).
    """
    by_declaring_type: dict[str, list[MemberEntry]] = defaultdict(list)
    for m in tables.members:
        by_declaring_type[m.parent_id].append(m)
    pairs: set[tuple[str, str]] = set()
    for m in tables.members:
        if m.kind != "method" or "static" in m.decl.modifiers:
            continue
        impl_tid = m.parent_id
        for sup_id in _direct_supertype_ids(tables, impl_tid):
            for other in by_declaring_type.get(sup_id, ()):
                if other.kind != "method":
                    continue
                if other.decl.signature != m.decl.signature:
                    continue
                if other.node_id == m.node_id:
                    continue
                pairs.add((m.node_id, other.node_id))
    tables.overrides_rows = [
        DeclaresRow(src_id=a, dst_id=b) for a, b in sorted(pairs)
    ]
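The one-hop matching rule can be exercised in isolation with a standalone sketch. The `Method` dataclass, the `supertypes` mapping, and `override_pairs` below are simplified, hypothetical stand-ins for `MemberEntry`, the `EXTENDS` / `IMPLEMENTS` rows, and `_populate_overrides_rows`; they are not the project's real types.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    node_id: str
    parent_id: str   # id of the declaring type
    signature: str
    is_static: bool = False


def override_pairs(methods: list[Method], supertypes: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Pair each non-static method with same-signature methods on direct supertypes only."""
    by_type: dict[str, list[Method]] = defaultdict(list)
    for m in methods:
        by_type[m.parent_id].append(m)
    pairs: set[tuple[str, str]] = set()
    for m in methods:
        if m.is_static:
            continue
        # Only direct supertypes are consulted, so the walk stops after one hop.
        for sup_id in supertypes.get(m.parent_id, ()):
            for other in by_type.get(sup_id, ()):
                if other.signature == m.signature and other.node_id != m.node_id:
                    pairs.add((m.node_id, other.node_id))
    return sorted(pairs)


# Hypothetical three-level hierarchy: Leaf extends Mid, Mid extends Base.
methods = [
    Method("Base#run()", "Base", "run()"),
    Method("Mid#run()", "Mid", "run()"),
    Method("Leaf#run()", "Leaf", "run()"),
    Method("Leaf#util()", "Leaf", "util()", is_static=True),
    Method("Mid#util()", "Mid", "util()"),
]
supertypes = {"Leaf": ["Mid"], "Mid": ["Base"]}
print(override_pairs(methods, supertypes))
```

In this sketch `Leaf#run()` is paired with `Mid#run()` but not with `Base#run()`, which mirrors the single supertype hop described in the docstring; the static `Leaf#util()` is skipped.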


def _write_edges(conn: kuzu.Connection, tables: GraphTables) -> None:
for r in tables.extends_rows:
conn.execute(_CREATE_EXT, {
@@ -2433,6 +2480,9 @@ def _write_edges(conn: kuzu.Connection, tables: GraphTables) -> None:
    for row in tables.declares_rows:
        conn.execute(_CREATE_DECL, {"src": row.src_id, "dst": row.dst_id})

    for row in tables.overrides_rows:
        conn.execute(_CREATE_OVERRIDES, {"src": row.src_id, "dst": row.dst_id})

seen_calls: set[tuple[str, str, int, int]] = set()
unique_calls: list[CallsRow] = []
for row in tables.calls_rows:
@@ -2549,6 +2599,7 @@ def _write_meta(conn: kuzu.Connection, tables: GraphTables, source_root: Path) -
"implements": len(tables.implements_rows),
"injects": len(tables.injects_rows),
"declares": len(tables.declares_rows),
"overrides": len(tables.overrides_rows),
"calls": calls_unique,
"routes": len(tables.routes_rows),
"exposes": len(tables.exposes_rows),
@@ -2642,6 +2693,7 @@ def write_kuzu(
if verbose:
_verbose_stderr_line(f"[write] nodes written in {time.time() - t0:.2f}s")
_populate_declares_rows(tables)
_populate_overrides_rows(tables)
t1 = time.time()
_write_edges(conn, tables)
if verbose: