2 changes: 1 addition & 1 deletion .cursor/rules/project-overview.mdc
@@ -22,7 +22,7 @@ when needed.
MCP tools (`search` / `find` / `describe` / `neighbors` / `resolve`; response
`hints` + pagination echo on locate tools — see README), `java-codebase-rag` CLI,
"Re-index required" callouts. The current
-`ontology_version` is **13** (material `OVERRIDES` Symbol→Symbol edges: subtype
+`ontology_version` is **14** (`EDGE_SCHEMA` in `java_ontology.py`; material `OVERRIDES` Symbol→Symbol edges: subtype
instance method → supertype declaration with matching `signature`, one
`IMPLEMENTS`/`EXTENDS` hop; valid `neighbors` `EdgeType`). Builds on v12
(`@CodebaseHttpClient` rename + shared `CodebaseHttpMethod` enum; inbound
3 changes: 3 additions & 0 deletions .github/workflows/test.yml
@@ -50,6 +50,9 @@ jobs:
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
+- name: Check generated edge navigation doc
+if: steps.changes.outputs.code == 'true'
+run: python scripts/generate_edge_navigation.py --check
- name: Run tests
if: steps.changes.outputs.code == 'true'
env:
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -11,7 +11,7 @@ for tools that don't read `.cursor/rules/`.
MCP tool list (`search` / `find` / `describe` / `neighbors` / `resolve`;
response `hints` + pagination echo — see README),
CLI ops (`java-codebase-rag --help`), and "Re-index required" callouts.
-**`ontology_version` is currently 13** (stored `OVERRIDES` method→method edges traversable via `neighbors`; plus v12 HTTP brownfield rename, `CodebaseHttpMethod` enum, inbound HTTP layer-C replace — see README graph section).
+**`ontology_version` is currently 14** (`EDGE_SCHEMA` in `java_ontology.py`; v14 re-index required; HTTP/ASYNC caller-side endpoint flips ship in SCHEMA-V2 PR-B/C — see README graph section and `docs/EDGE-NAVIGATION.md`).
- [`docs/JAVA-CODEBASE-RAG-CLI.md`](./docs/JAVA-CODEBASE-RAG-CLI.md) — operator guide for the `java-codebase-rag` CLI (`init` / `increment` / `reprocess` / `erase`, `meta`, `tables`, `diagnose-ignore`, `analyze-pr`; hidden `refresh` alias → `reprocess` — see that doc).
- `CODEBASE_REQUIREMENTS.md` — Java-repo assumptions and tuning map.
- **`propose/`** — design proposes. **In-flight** work is **`propose/*.md`**
2 changes: 1 addition & 1 deletion CODEBASE_REQUIREMENTS.md
@@ -187,7 +187,7 @@ root (`role_overrides:`, `route_overrides:`, `http_client_overrides:`,
**MCP discovery:** after indexing, use MCP `find` with `kind="route"` for
inbound HTTP and async routes and `kind="client"` for outbound HTTP `Client`
declarations (Feign methods plus annotated imperative clients). Client rows
-require a graph built with `ontology_version` **13** or newer — confirm with
+require a graph built with `ontology_version` **14** or newer — confirm with
`java-codebase-rag meta` (JSON field `ontology_version`).
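
The version gate described here can also be checked in a script. The following is a minimal sketch, assuming `java-codebase-rag meta` emits a single JSON object with an integer `ontology_version` field (the text names the field but not the full output shape). The helper `meets_ontology_gate` and the sample payloads are hypothetical.

```python
import json

REQUIRED_ONTOLOGY_VERSION = 14  # gate stated in the changed doc line


def meets_ontology_gate(meta_json: str, required: int = REQUIRED_ONTOLOGY_VERSION) -> bool:
    """Return True when the parsed meta payload reports a new-enough ontology."""
    meta = json.loads(meta_json)
    version = meta.get("ontology_version")
    # A missing or non-integer field is treated as "needs re-index".
    return isinstance(version, int) and version >= required


# Hypothetical payloads; real `meta` output likely carries more fields.
print(meets_ontology_gate('{"ontology_version": 14}'))
print(meets_ontology_gate('{"ontology_version": 13}'))
```

In practice the JSON string would come from capturing the CLI's stdout; that wiring is left out because the exact invocation and output framing are not shown in this diff.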

See **Brownfield overrides** in `README.md` for the full schema, usage
8 changes: 5 additions & 3 deletions README.md
@@ -229,7 +229,7 @@ Edit `claude_desktop_config.json` (macOS: `~/Library/Application Support/Claude/

### Driving the MCP from an agent

-- **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** — copy-paste into `QWEN.md` / `CLAUDE.md` / `AGENTS.md`. Covers the five MCP tools, the shared `NodeFilter`, the edge-type taxonomy, required `neighbors` arguments, the ontology glossary (currently **v13**), the recovery playbook, and slash-style aliases.
+- **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** — copy-paste into `QWEN.md` / `CLAUDE.md` / `AGENTS.md`. Covers the five MCP tools, the shared `NodeFilter`, the edge-type taxonomy, required `neighbors` arguments, the ontology glossary (currently **v14**), the recovery playbook, and slash-style aliases.
- **[`docs/skills/java-codebase-explore.md`](./docs/skills/java-codebase-explore.md)** — exploration **strategy** (missions, fallbacks, anti-capabilities, stopping rules); AGENT-GUIDE remains the **operating manual** for tool shapes and recovery.
- **[`docs/MANUAL-VERIFICATION-CHECKLIST.md`](./docs/MANUAL-VERIFICATION-CHECKLIST.md)** — 7-phase agent-driven verification you run after indexing your real project. Each item has a copy-paste prompt and calibration data from `tests/bank-chat-system`.
- **[`automation/cursor_propose_only/README.md`](./automation/cursor_propose_only/README.md)** — optional proposal orchestration workflow (single-command autopilot, planning bundles, and automated execution/review loops).
@@ -361,7 +361,7 @@ For `reprocess`, the pipeline runs `cocoindex` with `cwd` set to the bundle dire

## 6. Graph layer

-A deterministic property graph derived from tree-sitter Java parsing lives next to the LanceDB tables under the index directory (default `${JAVA_CODEBASE_RAG_INDEX_DIR:-./.java-codebase-rag}/code_graph.kuzu`). Current ontology version: **13**.
+A deterministic property graph derived from tree-sitter Java parsing lives next to the LanceDB tables under the index directory (default `${JAVA_CODEBASE_RAG_INDEX_DIR:-./.java-codebase-rag}/code_graph.kuzu`). Current ontology version: **14** (see [`docs/EDGE-NAVIGATION.md`](./docs/EDGE-NAVIGATION.md) for edge shapes).

### Node kinds

@@ -424,7 +424,9 @@ Resolution order for `microservice`:

### Re-index required when ontology changes

-Current ontology version is **13**. Any index built before this version must be rebuilt via `cocoindex update ... --full-reprocess -f` or a full `java-codebase-rag reprocess` (no selective flags) so vectors and graph stay aligned. Until re-indexed, the server defensively JSON-decodes string-form list columns so nothing explodes, but filters like `array_contains` will not work.
+Current ontology version is **14**. Any index built before this version must be rebuilt via `cocoindex update ... --full-reprocess -f` or a full `java-codebase-rag reprocess` (no selective flags) so vectors and graph stay aligned. Until re-indexed, the server defensively JSON-decodes string-form list columns so nothing explodes, but filters like `array_contains` will not work.
+
+Ontology **14** introduces `EDGE_SCHEMA` in `java_ontology.py` as the canonical edge navigation schema (see `docs/EDGE-NAVIGATION.md`). **This PR-A bump alone does not flip `HTTP_CALLS` / `ASYNC_CALLS` endpoints** — graphs rebuilt at v14 still use `Symbol → Route` for those edges until SCHEMA-V2 PR-B/C land. **PR-B** flips `HTTP_CALLS` to `Client → Route`; **PR-C** adds the `Producer` node, `DECLARES_PRODUCER`, and flips `ASYNC_CALLS` to `Producer → Route`. Run one full reprocess after upgrading through the SCHEMA-V2 sequence (or when you need the v14 ontology gate).
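
The staged endpoint flips in this paragraph are easier to keep straight as a lookup table. This is an illustrative sketch only: the table layout and the `endpoints` helper are hypothetical and not part of `java_ontology.py`, and the "PR-C" row assumes PR-B has already landed, which the "PR-B/C" ordering suggests but the text does not state outright.

```python
# (from_node, to_node) per edge type at each SCHEMA-V2 stage,
# transcribed from the README paragraph. Layout is illustrative.
ENDPOINTS_BY_STAGE = {
    "PR-A": {"HTTP_CALLS": ("Symbol", "Route"), "ASYNC_CALLS": ("Symbol", "Route")},
    "PR-B": {"HTTP_CALLS": ("Client", "Route"), "ASYNC_CALLS": ("Symbol", "Route")},
    # Assumption: PR-C is applied on top of PR-B.
    "PR-C": {"HTTP_CALLS": ("Client", "Route"), "ASYNC_CALLS": ("Producer", "Route")},
}


def endpoints(stage: str, edge_type: str) -> tuple:
    """Look up the (from, to) node kinds for an edge type at a given stage."""
    return ENDPOINTS_BY_STAGE[stage][edge_type]


for stage in ("PR-A", "PR-B", "PR-C"):
    src, dst = endpoints(stage, "HTTP_CALLS")
    print(f"{stage}: HTTP_CALLS is {src} -> {dst}")
```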

Ontology **13** materializes stored `OVERRIDES` edges between method Symbols (subtype override → supertype declaration, matching `signature` on a direct `IMPLEMENTS` / `EXTENDS` hop). `neighbors(edge_types=["OVERRIDES"])` traverses this relationship; `OVERRIDDEN_BY*` keys in `edge_summary` remain describe-time rollups only.

3 changes: 2 additions & 1 deletion ast_java.py
@@ -80,8 +80,9 @@
# Phase 8: first-class Client node + DECLARES_CLIENT relation, separating outbound declarations from Route.
# Phase 9: `@CodebaseAsyncRoute` replaces same-method built-in `@KafkaListener` routes in graph composition.
# Phase 10: `@CodebaseHttpClient` rename + `CodebaseHttpMethod` enum; inbound HTTP layer-C replaces built-in rows.
+# Phase 11: `EDGE_SCHEMA` in `java_ontology.py` (canonical edge navigation schema; v14 re-index).
# Bumps whenever extraction / enrichment semantics change.
-ONTOLOGY_VERSION = 13
+ONTOLOGY_VERSION = 14

ROLE_ANNOTATIONS: dict[str, str] = {
# Spring Web
13 changes: 7 additions & 6 deletions docs/AGENT-GUIDE.md
@@ -12,10 +12,11 @@
> `neighbors` arguments, pass stringified JSON, or use vector search for
> questions the graph answers exactly. This guide keeps them on the rails.
>
-> Calibrated against ontology version **13** (see `ast_java.ONTOLOGY_VERSION` /
-> `java_ontology.py` valid sets): stored `OVERRIDES` Symbol→Symbol edges (subtype
-> override → supertype declaration, matching `signature`, one `IMPLEMENTS`/`EXTENDS`
-> hop) and `neighbors(edge_types=["OVERRIDES"])`. Still includes v12 HTTP brownfield
+> Calibrated against ontology version **14** (see `ast_java.ONTOLOGY_VERSION` /
+> `java_ontology.EDGE_SCHEMA` + valid sets): canonical edge navigation schema in
+> `docs/EDGE-NAVIGATION.md`. v14 re-index required; PR-B flips `HTTP_CALLS` to
+> `Client → Route`; PR-C adds `Producer` + `DECLARES_PRODUCER` and flips `ASYNC_CALLS`.
+> Still includes stored `OVERRIDES` Symbol→Symbol edges and v12 HTTP brownfield
> (`@CodebaseHttpClient`, shared `CodebaseHttpMethod` enum, inbound layer-C HTTP routes
> replace same-method built-in rows). **Design rationale:** navigation surface and tools —
> [`propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md`](../propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md);
@@ -260,9 +261,9 @@ Virtual keys (`OVERRIDDEN_BY`, …) and composed dot-keys are **not** valid `Edg
- **Batching:** Multiple origins are expanded; pagination slices the **combined** edge list — use larger `limit` when batching many ids.
- **Confidence:** Cross-service edges (`HTTP_CALLS`, `ASYNC_CALLS`) carry confidence, strategy, and match metadata on `edge.attrs` (`attrs.confidence`, `attrs.strategy`, `attrs.match`). Low confidence means the resolver had to guess at the route binding — treat it as a **resolver gap signal**, not a hallucination. Report low-confidence edges with their confidence value, not as facts. Intra-service edges (`CALLS`, `INJECTS`, `IMPLEMENTS`, `EXTENDS`, `DECLARES`, `DECLARES_CLIENT`, `EXPOSES`, `OVERRIDES`) faithfully represent the static graph; the resolved set is still a **lower bound** under reflection / dynamic dispatch (see *What this MCP is NOT*).
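
The confidence guidance can be made concrete with a small reporting helper. A minimal sketch, assuming each edge arrives as a dict with an `edge_type` key and an `attrs` dict carrying `confidence` and `strategy` (the guide names the `attrs.*` fields, but the surrounding dict shape is an assumption). The 0.5 threshold and the sample `strategy` values are arbitrary illustrations, not project defaults.

```python
def report_edges(edges: list, threshold: float = 0.5) -> list:
    """Render one line per edge, flagging low-confidence ones with their value."""
    lines = []
    for edge in edges:
        attrs = edge.get("attrs") or {}
        confidence = attrs.get("confidence")
        if confidence is not None and confidence < threshold:
            # Report the value instead of presenting the edge as a fact.
            lines.append(
                f"{edge['edge_type']} (low confidence {confidence:.2f}, "
                f"strategy={attrs.get('strategy')}): possible resolver gap"
            )
        else:
            lines.append(edge["edge_type"])
    return lines


# Hypothetical edges for illustration.
sample = [
    {"edge_type": "HTTP_CALLS", "attrs": {"confidence": 0.3, "strategy": "uri_suffix"}},
    {"edge_type": "HTTP_CALLS", "attrs": {"confidence": 0.9, "strategy": "exact"}},
    {"edge_type": "DECLARES", "attrs": {}},
]
for line in report_edges(sample):
    print(line)
```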

-### Ontology glossary (version 13)
+### Ontology glossary (version 14)

-Source of truth: `java_ontology.py`. Strings are case-sensitive.
+Source of truth: `java_ontology.py` (`EDGE_SCHEMA`, valid sets). Strings are case-sensitive. Edge navigation: [`docs/EDGE-NAVIGATION.md`](./EDGE-NAVIGATION.md) — use `*_current` traversal keys for `HTTP_CALLS` / `ASYNC_CALLS` until SCHEMA-V2 PR-B/C flip endpoints.

**Roles:** `CONTROLLER`, `SERVICE`, `REPOSITORY`, `COMPONENT`, `CONFIG`, `ENTITY`, `CLIENT`, `MAPPER`, `DTO`, `OTHER`.

234 changes: 234 additions & 0 deletions docs/EDGE-NAVIGATION.md
@@ -0,0 +1,234 @@
# Edge Navigation Schema

> **Generated from `java_ontology.EDGE_SCHEMA` — do not edit by hand.**
> Regenerate: `.venv/bin/python scripts/generate_edge_navigation.py`
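
The workflow step added in this PR runs the generator with `--check`, but the generator script itself is not part of the visible diff. A common shape for that kind of drift check is sketched below, under the assumption that it compares freshly rendered text against the committed file and signals a mismatch through the exit code. The names `is_up_to_date` and `check_exit_code` are hypothetical.

```python
import tempfile
from pathlib import Path


def is_up_to_date(doc_path: Path, rendered: str) -> bool:
    """Return True when the committed doc matches freshly rendered text."""
    if not doc_path.exists():
        return False
    return doc_path.read_text(encoding="utf-8") == rendered


def check_exit_code(doc_path: Path, rendered: str) -> int:
    """Map the comparison to a process exit code a CI step can act on."""
    return 0 if is_up_to_date(doc_path, rendered) else 1


# Demonstration against a temporary file rather than the real doc.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "EDGE-NAVIGATION.md"
    doc.write_text("# Edge Navigation Schema\n", encoding="utf-8")
    fresh = check_exit_code(doc, "# Edge Navigation Schema\n")
    stale = check_exit_code(doc, "# Edge Navigation Schema (changed)\n")
print(fresh, stale)
```

A non-zero exit fails the CI step, which is what keeps a hand-edited or outdated generated doc from merging.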

## Summary

| Edge | From | To | Cardinality | Brownfield-resolver-sourced | Member-only |
| --- | --- | --- | --- | --- | --- |
| EXTENDS | Symbol | Symbol | many_to_one | no | no |
| IMPLEMENTS | Symbol | Symbol | many_to_many | no | no |
| INJECTS | Symbol | Symbol | many_to_many | no | no |
| DECLARES | Symbol | Symbol | one_to_many | no | no |
| OVERRIDES | Symbol | Symbol | many_to_one | no | yes |
| CALLS | Symbol | Symbol | many_to_many | yes | yes |
| EXPOSES | Symbol | Route | one_to_one | yes | yes |
| DECLARES_CLIENT | Symbol | Client | one_to_many | yes | yes |
| HTTP_CALLS | Symbol | Route | many_to_many | yes | no |
| ASYNC_CALLS | Symbol | Route | many_to_many | yes | no |

## EXTENDS

**Endpoints**: `Symbol → Symbol`
**Cardinality**: `many_to_one`
**Brownfield-resolver-sourced**: no
**Member-only** (hints): no

**Purpose**: class or interface direct supertype relation

**Attributes**:

- `dst_name` (`STRING`) — raw supertype name as written in source
- `dst_fqn` (`STRING`) — best-effort resolved FQN of the supertype
- `resolved` (`BOOLEAN`) — True iff dst_fqn was resolved to an in-graph Symbol

**Typical traversals**:

- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['EXTENDS'])
- `member_subject`: neighbors(['{id}'],'out',['EXTENDS'])
- `alien_subject`: EXTENDS connects Symbol → Symbol; use a type or member Symbol id
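
The traversal strings in this document carry `{id}` and `{direction}` placeholders. How the server substitutes them is not shown in this diff; the sketch below assumes plain `str.format`-style substitution, and the node id used is a made-up example.

```python
# Template copied from the EXTENDS `type_subject` entry.
EXTENDS_TYPE_SUBJECT = (
    "neighbors(['{id}'],'out',['DECLARES']) then "
    "neighbors(member_ids,'{direction}',['EXTENDS'])"
)


def render_traversal(template: str, node_id: str, direction: str = "out") -> str:
    """Fill the `{id}` and `{direction}` placeholders of a traversal template."""
    return template.format(id=node_id, direction=direction)


# "sym:com.example.Foo" is a hypothetical id, not a real id format.
print(render_traversal(EXTENDS_TYPE_SUBJECT, "sym:com.example.Foo", "out"))
```

Templates without a `{direction}` placeholder, such as the `member_subject` entries, pass through the same helper unchanged apart from the id.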

## IMPLEMENTS

**Endpoints**: `Symbol → Symbol`
**Cardinality**: `many_to_many`
**Brownfield-resolver-sourced**: no
**Member-only** (hints): no

**Purpose**: class implements interface relation

**Attributes**:

- `dst_name` (`STRING`) — raw interface name as written in source
- `dst_fqn` (`STRING`) — best-effort resolved FQN of the interface
- `resolved` (`BOOLEAN`) — True iff dst_fqn was resolved to an in-graph Symbol

**Typical traversals**:

- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['IMPLEMENTS'])
- `member_subject`: neighbors(['{id}'],'out',['IMPLEMENTS'])
- `alien_subject`: IMPLEMENTS connects Symbol → Symbol; use a type or member Symbol id

## INJECTS

**Endpoints**: `Symbol → Symbol`
**Cardinality**: `many_to_many`
**Brownfield-resolver-sourced**: no
**Member-only** (hints): no

**Purpose**: dependency injection edge from declaring type to injected type

**Attributes**:

- `dst_name` (`STRING`) — raw injected type name as written in source
- `dst_fqn` (`STRING`) — best-effort resolved FQN of the injected type
- `resolved` (`BOOLEAN`) — True iff dst_fqn was resolved to an in-graph Symbol
- `mechanism` (`STRING`) — injection mechanism literal (constructor, field, setter, …)
- `annotation` (`STRING`) — injection annotation simple name when present
- `field_or_param` (`STRING`) — field or parameter name for the injection site

**Typical traversals**:

- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['INJECTS'])
- `member_subject`: neighbors(['{id}'],'in',['INJECTS'])
- `alien_subject`: INJECTS connects Symbol → Symbol; use a type Symbol id

## DECLARES

**Endpoints**: `Symbol → Symbol`
**Cardinality**: `one_to_many`
**Brownfield-resolver-sourced**: no
**Member-only** (hints): no

**Purpose**: type declares member Symbol (method, constructor, nested type)

**Attributes**: _(none)_

**Typical traversals**:

- `type_subject`: neighbors(['{id}'],'out',['DECLARES'])
- `member_subject`: neighbors(['{id}'],'in',['DECLARES'])
- `alien_subject`: DECLARES connects Symbol → Symbol; use a type Symbol id for outbound members

## OVERRIDES

**Endpoints**: `Symbol → Symbol`
**Cardinality**: `many_to_one`
**Brownfield-resolver-sourced**: no
**Member-only** (hints): yes

**Purpose**: subtype method overrides supertype declared method with matching signature

**Attributes**: _(none)_

**Typical traversals**:

- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['OVERRIDES'])
- `member_subject`: neighbors(['{id}'],'out',['OVERRIDES'])
- `alien_subject`: OVERRIDES connects method Symbol → method Symbol

## CALLS

**Endpoints**: `Symbol → Symbol`
**Cardinality**: `many_to_many`
**Brownfield-resolver-sourced**: yes
**Member-only** (hints): yes

**Purpose**: intra-codebase method call from caller method to callee method

**Attributes**:

- `call_site_line` (`INT64`) — source line of the call site
- `call_site_byte` (`INT64`) — source byte offset of the call site
- `arg_count` (`INT64`) — argument count at the call site (-1 for method references)
- `confidence` (`DOUBLE`) — resolver confidence in [0.0, 1.0]
- `strategy` (`STRING`) — call-graph resolution strategy literal
- `source` (`STRING`) — call-graph source tag
- `resolved` (`BOOLEAN`) — True iff callee Symbol was resolved in-graph

**Typical traversals**:

- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['CALLS'])
- `member_subject`: neighbors(['{id}'],'out',['CALLS'])
- `alien_subject`: CALLS connects method Symbol → method Symbol

## EXPOSES

**Endpoints**: `Symbol → Route`
**Cardinality**: `one_to_one`
**Brownfield-resolver-sourced**: yes
**Member-only** (hints): yes

**Purpose**: declaring method exposes an inbound HTTP or messaging Route

**Attributes**:

- `confidence` (`DOUBLE`) — route extraction confidence in [0.0, 1.0]
- `strategy` (`STRING`) — route resolution strategy literal

**Typical traversals**:

- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['EXPOSES'])
- `member_subject`: neighbors(['{id}'],'out',['EXPOSES'])
- `alien_subject`: EXPOSES connects method Symbol → Route; use a method Symbol id

## DECLARES_CLIENT

**Endpoints**: `Symbol → Client`
**Cardinality**: `one_to_many`
**Brownfield-resolver-sourced**: yes
**Member-only** (hints): yes

**Purpose**: method declares an outbound HTTP client call site

**Attributes**:

- `confidence` (`DOUBLE`) — client declaration confidence in [0.0, 1.0]
- `strategy` (`STRING`) — client resolution strategy literal

**Typical traversals**:

- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['DECLARES_CLIENT'])
- `member_subject`: neighbors(['{id}'],'out',['DECLARES_CLIENT'])
- `alien_subject`: DECLARES_CLIENT connects method Symbol → Client

## HTTP_CALLS

**Endpoints**: `Symbol → Route`
**Cardinality**: `many_to_many`
**Brownfield-resolver-sourced**: yes
**Member-only** (hints): no

**Purpose**: resolved HTTP call from declaring method to target route (pre-flip: Symbol→Route; PR-B: Client→Route)

**Attributes**:

- `confidence` (`DOUBLE`) — pass6 match confidence in [0.0, 1.0]
- `strategy` (`STRING`) — HTTP call resolution strategy literal
- `method_call` (`STRING`) — HTTP method of the call site
- `raw_uri` (`STRING`) — uninterpolated URI template from the call site
- `match` (`STRING`) — cross_service|intra_service|ambiguous|phantom|unresolved

**Typical traversals**:

- `type_subject_current`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'out',['HTTP_CALLS'])
- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'out',['DECLARES_CLIENT']) then neighbors(client_ids,'out',['HTTP_CALLS'])
- `member_subject_current`: neighbors(['{id}'],'out',['HTTP_CALLS'])
- `member_subject`: neighbors(['{id}'],'out',['DECLARES_CLIENT']) then neighbors(client_ids,'out',['HTTP_CALLS'])
- `alien_subject`: HTTP_CALLS is Symbol→Route until PR-B; use member_subject_current. After PR-B (Client→Route), use member_subject via DECLARES_CLIENT
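
The `*_current` keys exist only for edges whose endpoints are scheduled to flip. One way a consumer might choose between the two variants is sketched here. The dict below copies the two member-level HTTP_CALLS templates from this section; the `pick_traversal` helper and its `flipped` flag are hypothetical, and how a caller learns whether PR-B has landed is outside this sketch.

```python
HTTP_CALLS_TRAVERSALS = {
    "member_subject_current": "neighbors(['{id}'],'out',['HTTP_CALLS'])",
    "member_subject": (
        "neighbors(['{id}'],'out',['DECLARES_CLIENT']) then "
        "neighbors(client_ids,'out',['HTTP_CALLS'])"
    ),
}


def pick_traversal(traversals: dict, subject: str, flipped: bool) -> str:
    """Prefer the `<subject>_current` template until the endpoint flip lands."""
    current_key = f"{subject}_current"
    if not flipped and current_key in traversals:
        return traversals[current_key]
    return traversals[subject]


print(pick_traversal(HTTP_CALLS_TRAVERSALS, "member_subject", flipped=False))
print(pick_traversal(HTTP_CALLS_TRAVERSALS, "member_subject", flipped=True))
```

Edges with no `*_current` variant, such as `CALLS` or `EXPOSES`, fall through to the plain key regardless of the flag.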

## ASYNC_CALLS

**Endpoints**: `Symbol → Route`
**Cardinality**: `many_to_many`
**Brownfield-resolver-sourced**: yes
**Member-only** (hints): no

**Purpose**: resolved async call from declaring method to topic route (pre-flip: Symbol→Route; PR-C: Producer→Route)

**Attributes**:

- `confidence` (`DOUBLE`) — pass6 match confidence in [0.0, 1.0]
- `strategy` (`STRING`) — async call resolution strategy literal
- `direction` (`STRING`) — produce|consume async direction literal
- `raw_topic` (`STRING`) — uninterpolated topic template from the call site
- `match` (`STRING`) — cross_service|intra_service|ambiguous|phantom|unresolved

**Typical traversals**:

- `type_subject_current`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'out',['ASYNC_CALLS'])
- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'out',['DECLARES_PRODUCER']) then neighbors(producer_ids,'out',['ASYNC_CALLS'])
- `member_subject_current`: neighbors(['{id}'],'out',['ASYNC_CALLS'])
- `member_subject`: neighbors(['{id}'],'out',['DECLARES_PRODUCER']) then neighbors(producer_ids,'out',['ASYNC_CALLS'])
- `alien_subject`: ASYNC_CALLS is Symbol→Route until PR-C; use member_subject_current. After PR-C (Producer→Route), use member_subject via DECLARES_PRODUCER