diff --git a/.cursor/rules/project-overview.mdc b/.cursor/rules/project-overview.mdc
index ff1d9e1..2cfee11 100644
--- a/.cursor/rules/project-overview.mdc
+++ b/.cursor/rules/project-overview.mdc
@@ -22,7 +22,7 @@ when needed.
 MCP tools (`search` / `find` / `describe` / `neighbors` / `resolve`; response `hints` + pagination echo on locate tools — see README), `java-codebase-rag` CLI, "Re-index required" callouts. The current
- `ontology_version` is **13** (material `OVERRIDES` Symbol→Symbol edges: subtype
+ `ontology_version` is **14** (`EDGE_SCHEMA` in `java_ontology.py`; material `OVERRIDES` Symbol→Symbol edges: subtype
 instance method → supertype declaration with matching `signature`, one `IMPLEMENTS`/`EXTENDS` hop; valid `neighbors` `EdgeType`). Builds on v12 (`@CodebaseHttpClient` rename + shared `CodebaseHttpMethod` enum; inbound
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index a09ff66..c6a1a06 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -50,6 +50,9 @@ jobs:
           python -m pip install --upgrade pip
           pip install -r requirements.txt
           pip install -e .
+      - name: Check generated edge navigation doc
+        if: steps.changes.outputs.code == 'true'
+        run: python scripts/generate_edge_navigation.py --check
       - name: Run tests
         if: steps.changes.outputs.code == 'true'
         env:
diff --git a/AGENTS.md b/AGENTS.md
index d27f47e..be877c6 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -11,7 +11,7 @@ for tools that don't read `.cursor/rules/`.
 MCP tool list (`search` / `find` / `describe` / `neighbors` / `resolve`; response `hints` + pagination echo — see README), CLI ops (`java-codebase-rag --help`), and "Re-index required" callouts.
- **`ontology_version` is currently 13** (stored `OVERRIDES` method→method edges traversable via `neighbors`; plus v12 HTTP brownfield rename, `CodebaseHttpMethod` enum, inbound HTTP layer-C replace — see README graph section).
+ **`ontology_version` is currently 14** (`EDGE_SCHEMA` in `java_ontology.py`; v14 re-index required; HTTP/ASYNC caller-side endpoint flips ship in SCHEMA-V2 PR-B/C — see README graph section and `docs/EDGE-NAVIGATION.md`).
 - [`docs/JAVA-CODEBASE-RAG-CLI.md`](./docs/JAVA-CODEBASE-RAG-CLI.md) — operator guide for the `java-codebase-rag` CLI (`init` / `increment` / `reprocess` / `erase`, `meta`, `tables`, `diagnose-ignore`, `analyze-pr`; hidden `refresh` alias → `reprocess` — see that doc).
 - `CODEBASE_REQUIREMENTS.md` — Java-repo assumptions and tuning map.
 - **`propose/`** — design proposes. **In-flight** work is **`propose/*.md`**
diff --git a/CODEBASE_REQUIREMENTS.md b/CODEBASE_REQUIREMENTS.md
index 89641eb..7bfe98f 100644
--- a/CODEBASE_REQUIREMENTS.md
+++ b/CODEBASE_REQUIREMENTS.md
@@ -187,7 +187,7 @@ root (`role_overrides:`, `route_overrides:`, `http_client_overrides:`,
 **MCP discovery:** after indexing, use MCP `find` with `kind="route"` for inbound HTTP and async routes and `kind="client"` for outbound HTTP `Client` declarations (Feign methods plus annotated imperative clients). Client rows
-require a graph built with `ontology_version` **13** or newer — confirm with
+require a graph built with `ontology_version` **14** or newer — confirm with
 `java-codebase-rag meta` (JSON field `ontology_version`). See **Brownfield overrides** in `README.md` for the full schema, usage
diff --git a/README.md b/README.md
index 81ea9b0..9d32915 100644
--- a/README.md
+++ b/README.md
@@ -229,7 +229,7 @@ Edit `claude_desktop_config.json` (macOS: `~/Library/Application Support/Claude/
 ### Driving the MCP from an agent
-- **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** — copy-paste into `QWEN.md` / `CLAUDE.md` / `AGENTS.md`. Covers the five MCP tools, the shared `NodeFilter`, the edge-type taxonomy, required `neighbors` arguments, the ontology glossary (currently **v13**), the recovery playbook, and slash-style aliases.
+- **[`docs/AGENT-GUIDE.md`](./docs/AGENT-GUIDE.md)** — copy-paste into `QWEN.md` / `CLAUDE.md` / `AGENTS.md`. Covers the five MCP tools, the shared `NodeFilter`, the edge-type taxonomy, required `neighbors` arguments, the ontology glossary (currently **v14**), the recovery playbook, and slash-style aliases.
 - **[`docs/skills/java-codebase-explore.md`](./docs/skills/java-codebase-explore.md)** — exploration **strategy** (missions, fallbacks, anti-capabilities, stopping rules); AGENT-GUIDE remains the **operating manual** for tool shapes and recovery.
 - **[`docs/MANUAL-VERIFICATION-CHECKLIST.md`](./docs/MANUAL-VERIFICATION-CHECKLIST.md)** — 7-phase agent-driven verification you run after indexing your real project. Each item has a copy-paste prompt and calibration data from `tests/bank-chat-system`.
 - **[`automation/cursor_propose_only/README.md`](./automation/cursor_propose_only/README.md)** — optional proposal orchestration workflow (single-command autopilot, planning bundles, and automated execution/review loops).
@@ -361,7 +361,7 @@ For `reprocess`, the pipeline runs `cocoindex` with `cwd` set to the bundle dire
 ## 6. Graph layer
-A deterministic property graph derived from tree-sitter Java parsing lives next to the LanceDB tables under the index directory (default `${JAVA_CODEBASE_RAG_INDEX_DIR:-./.java-codebase-rag}/code_graph.kuzu`). Current ontology version: **13**.
+A deterministic property graph derived from tree-sitter Java parsing lives next to the LanceDB tables under the index directory (default `${JAVA_CODEBASE_RAG_INDEX_DIR:-./.java-codebase-rag}/code_graph.kuzu`). Current ontology version: **14** (see [`docs/EDGE-NAVIGATION.md`](./docs/EDGE-NAVIGATION.md) for edge shapes).
 ### Node kinds
@@ -424,7 +424,9 @@ Resolution order for `microservice`:
 ### Re-index required when ontology changes
-Current ontology version is **13**. Any index built before this version must be rebuilt via `cocoindex update ... --full-reprocess -f` or a full `java-codebase-rag reprocess` (no selective flags) so vectors and graph stay aligned. Until re-indexed, the server defensively JSON-decodes string-form list columns so nothing explodes, but filters like `array_contains` will not work.
+Current ontology version is **14**. Any index built before this version must be rebuilt via `cocoindex update ... --full-reprocess -f` or a full `java-codebase-rag reprocess` (no selective flags) so vectors and graph stay aligned. Until re-indexed, the server defensively JSON-decodes string-form list columns so nothing explodes, but filters like `array_contains` will not work.
+
+Ontology **14** introduces `EDGE_SCHEMA` in `java_ontology.py` as the canonical edge navigation schema (see `docs/EDGE-NAVIGATION.md`). **This PR-A bump alone does not flip `HTTP_CALLS` / `ASYNC_CALLS` endpoints** — graphs rebuilt at v14 still use `Symbol → Route` for those edges until SCHEMA-V2 PR-B/C land. **PR-B** flips `HTTP_CALLS` to `Client → Route`; **PR-C** adds the `Producer` node, `DECLARES_PRODUCER`, and flips `ASYNC_CALLS` to `Producer → Route`. Run one full reprocess after upgrading through the SCHEMA-V2 sequence (or when you need the v14 ontology gate).
 Ontology **13** materializes stored `OVERRIDES` edges between method Symbols (subtype override → supertype declaration, matching `signature` on a direct `IMPLEMENTS` / `EXTENDS` hop). `neighbors(edge_types=["OVERRIDES"])` traverses this relationship; `OVERRIDDEN_BY*` keys in `edge_summary` remain describe-time rollups only.
diff --git a/ast_java.py b/ast_java.py
index 225120e..8b83a89 100644
--- a/ast_java.py
+++ b/ast_java.py
@@ -80,8 +80,9 @@
 # Phase 8: first-class Client node + DECLARES_CLIENT relation, separating outbound declarations from Route.
 # Phase 9: `@CodebaseAsyncRoute` replaces same-method built-in `@KafkaListener` routes in graph composition.
 # Phase 10: `@CodebaseHttpClient` rename + `CodebaseHttpMethod` enum; inbound HTTP layer-C replaces built-in rows.
+# Phase 11: `EDGE_SCHEMA` in `java_ontology.py` (canonical edge navigation schema; v14 re-index).
 # Bumps whenever extraction / enrichment semantics change.
-ONTOLOGY_VERSION = 13
+ONTOLOGY_VERSION = 14
 ROLE_ANNOTATIONS: dict[str, str] = {
     # Spring Web
diff --git a/docs/AGENT-GUIDE.md b/docs/AGENT-GUIDE.md
index 6244f7d..01fe134 100644
--- a/docs/AGENT-GUIDE.md
+++ b/docs/AGENT-GUIDE.md
@@ -12,10 +12,11 @@
 > `neighbors` arguments, pass stringified JSON, or use vector search for
 > questions the graph answers exactly. This guide keeps them on the rails.
 >
-> Calibrated against ontology version **13** (see `ast_java.ONTOLOGY_VERSION` /
-> `java_ontology.py` valid sets): stored `OVERRIDES` Symbol→Symbol edges (subtype
-> override → supertype declaration, matching `signature`, one `IMPLEMENTS`/`EXTENDS`
-> hop) and `neighbors(edge_types=["OVERRIDES"])`. Still includes v12 HTTP brownfield
+> Calibrated against ontology version **14** (see `ast_java.ONTOLOGY_VERSION` /
+> `java_ontology.EDGE_SCHEMA` + valid sets): canonical edge navigation schema in
+> `docs/EDGE-NAVIGATION.md`. v14 re-index required; PR-B flips `HTTP_CALLS` to
+> `Client → Route`; PR-C adds `Producer` + `DECLARES_PRODUCER` and flips `ASYNC_CALLS`.
+> Still includes stored `OVERRIDES` Symbol→Symbol edges and v12 HTTP brownfield
 > (`@CodebaseHttpClient`, shared `CodebaseHttpMethod` enum, inbound layer-C HTTP routes
 > replace same-method built-in rows). **Design rationale:** navigation surface and tools —
 > [`propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md`](../propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md);
@@ -260,9 +261,9 @@ Virtual keys (`OVERRIDDEN_BY`, …) and composed dot-keys are **not** valid `Edg
 - **Batching:** Multiple origins are expanded; pagination slices the **combined** edge list — use larger `limit` when batching many ids.
 - **Confidence:** Cross-service edges (`HTTP_CALLS`, `ASYNC_CALLS`) carry confidence, strategy, and match metadata on `edge.attrs` (`attrs.confidence`, `attrs.strategy`, `attrs.match`). Low confidence means the resolver had to guess at the route binding — treat it as a **resolver gap signal**, not a hallucination. Report low-confidence edges with their confidence value, not as facts. Intra-service edges (`CALLS`, `INJECTS`, `IMPLEMENTS`, `EXTENDS`, `DECLARES`, `DECLARES_CLIENT`, `EXPOSES`, `OVERRIDES`) faithfully represent the static graph; the resolved set is still a **lower bound** under reflection / dynamic dispatch (see *What this MCP is NOT*).
-### Ontology glossary (version 13)
+### Ontology glossary (version 14)
-Source of truth: `java_ontology.py`. Strings are case-sensitive.
+Source of truth: `java_ontology.py` (`EDGE_SCHEMA`, valid sets). Strings are case-sensitive. Edge navigation: [`docs/EDGE-NAVIGATION.md`](./EDGE-NAVIGATION.md) — use `*_current` traversal keys for `HTTP_CALLS` / `ASYNC_CALLS` until SCHEMA-V2 PR-B/C flip endpoints.
 **Roles:** `CONTROLLER`, `SERVICE`, `REPOSITORY`, `COMPONENT`, `CONFIG`, `ENTITY`, `CLIENT`, `MAPPER`, `DTO`, `OTHER`.
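The glossary change above tells agents to use the `*_current` traversal keys for `HTTP_CALLS` / `ASYNC_CALLS` until SCHEMA-V2 PR-B/C flip the endpoints. A minimal sketch of that selection rule follows. It assumes a hypothetical helper `pick_traversal` (not part of this PR), a made-up node id `sym-1`, and a `flipped` flag standing in for however a caller learns that PR-B has landed. The two templates are copied from the `HTTP_CALLS` entry of `EDGE_SCHEMA`.

```python
from __future__ import annotations

# Templates copied from the HTTP_CALLS entry of EDGE_SCHEMA in this PR.
HTTP_CALLS_TRAVERSALS = {
    "member_subject_current": "neighbors(['{id}'],'out',['HTTP_CALLS'])",
    "member_subject": (
        "neighbors(['{id}'],'out',['DECLARES_CLIENT']) "
        "then neighbors(client_ids,'out',['HTTP_CALLS'])"
    ),
}


def pick_traversal(
    traversals: dict[str, str],
    subject: str,
    node_id: str,
    flipped: bool = False,
) -> str:
    """Hypothetical helper: prefer the `*_current` key until the endpoint flip."""
    current_key = f"{subject}_current"
    use_current = not flipped and current_key in traversals
    key = current_key if use_current else subject
    # str.replace instead of str.format: other templates in the schema also
    # carry a `{direction}` placeholder that should stay untouched here.
    return traversals[key].replace("{id}", node_id)


print(pick_traversal(HTTP_CALLS_TRAVERSALS, "member_subject", "sym-1"))
print(pick_traversal(HTTP_CALLS_TRAVERSALS, "member_subject", "sym-1", flipped=True))
```

Edges without a `*_current` key (for example `CALLS`) would fall through to the plain subject key, so the same lookup could serve every entry in the schema.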
diff --git a/docs/EDGE-NAVIGATION.md b/docs/EDGE-NAVIGATION.md
new file mode 100644
index 0000000..0dfabdc
--- /dev/null
+++ b/docs/EDGE-NAVIGATION.md
@@ -0,0 +1,234 @@
+# Edge Navigation Schema
+
+> **Generated from `java_ontology.EDGE_SCHEMA` — do not edit by hand.**
+> Regenerate: `.venv/bin/python scripts/generate_edge_navigation.py`
+
+## Summary
+
+| Edge | From | To | Cardinality | Brownfield-resolver-sourced | Member-only |
+| --- | --- | --- | --- | --- | --- |
+| EXTENDS | Symbol | Symbol | many_to_one | no | no |
+| IMPLEMENTS | Symbol | Symbol | many_to_many | no | no |
+| INJECTS | Symbol | Symbol | many_to_many | no | no |
+| DECLARES | Symbol | Symbol | one_to_many | no | no |
+| OVERRIDES | Symbol | Symbol | many_to_one | no | yes |
+| CALLS | Symbol | Symbol | many_to_many | yes | yes |
+| EXPOSES | Symbol | Route | one_to_one | yes | yes |
+| DECLARES_CLIENT | Symbol | Client | one_to_many | yes | yes |
+| HTTP_CALLS | Symbol | Route | many_to_many | yes | no |
+| ASYNC_CALLS | Symbol | Route | many_to_many | yes | no |
+
+## EXTENDS
+
+**Endpoints**: `Symbol → Symbol`
+**Cardinality**: `many_to_one`
+**Brownfield-resolver-sourced**: no
+**Member-only** (hints): no
+
+**Purpose**: class or interface direct supertype relation
+
+**Attributes**:
+
+- `dst_name` (`STRING`) — raw supertype name as written in source
+- `dst_fqn` (`STRING`) — best-effort resolved FQN of the supertype
+- `resolved` (`BOOLEAN`) — True iff dst_fqn was resolved to an in-graph Symbol
+
+**Typical traversals**:
+
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['EXTENDS'])
+- `member_subject`: neighbors(['{id}'],'out',['EXTENDS'])
+- `alien_subject`: EXTENDS connects Symbol → Symbol; use a type or member Symbol id
+
+## IMPLEMENTS
+
+**Endpoints**: `Symbol → Symbol`
+**Cardinality**: `many_to_many`
+**Brownfield-resolver-sourced**: no
+**Member-only** (hints): no
+
+**Purpose**: class implements interface relation
+
+**Attributes**:
+
+- `dst_name` (`STRING`) — raw interface name as written in source
+- `dst_fqn` (`STRING`) — best-effort resolved FQN of the interface
+- `resolved` (`BOOLEAN`) — True iff dst_fqn was resolved to an in-graph Symbol
+
+**Typical traversals**:
+
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['IMPLEMENTS'])
+- `member_subject`: neighbors(['{id}'],'out',['IMPLEMENTS'])
+- `alien_subject`: IMPLEMENTS connects Symbol → Symbol; use a type or member Symbol id
+
+## INJECTS
+
+**Endpoints**: `Symbol → Symbol`
+**Cardinality**: `many_to_many`
+**Brownfield-resolver-sourced**: no
+**Member-only** (hints): no
+
+**Purpose**: dependency injection edge from declaring type to injected type
+
+**Attributes**:
+
+- `dst_name` (`STRING`) — raw injected type name as written in source
+- `dst_fqn` (`STRING`) — best-effort resolved FQN of the injected type
+- `resolved` (`BOOLEAN`) — True iff dst_fqn was resolved to an in-graph Symbol
+- `mechanism` (`STRING`) — injection mechanism literal (constructor, field, setter, …)
+- `annotation` (`STRING`) — injection annotation simple name when present
+- `field_or_param` (`STRING`) — field or parameter name for the injection site
+
+**Typical traversals**:
+
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['INJECTS'])
+- `member_subject`: neighbors(['{id}'],'in',['INJECTS'])
+- `alien_subject`: INJECTS connects Symbol → Symbol; use a type Symbol id
+
+## DECLARES
+
+**Endpoints**: `Symbol → Symbol`
+**Cardinality**: `one_to_many`
+**Brownfield-resolver-sourced**: no
+**Member-only** (hints): no
+
+**Purpose**: type declares member Symbol (method, constructor, nested type)
+
+**Attributes**: _(none)_
+
+**Typical traversals**:
+
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES'])
+- `member_subject`: neighbors(['{id}'],'in',['DECLARES'])
+- `alien_subject`: DECLARES connects Symbol → Symbol; use a type Symbol id for outbound members
+
+## OVERRIDES
+
+**Endpoints**: `Symbol → Symbol`
+**Cardinality**: `many_to_one`
+**Brownfield-resolver-sourced**: no
+**Member-only** (hints): yes
+
+**Purpose**: subtype method overrides supertype declared method with matching signature
+
+**Attributes**: _(none)_
+
+**Typical traversals**:
+
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['OVERRIDES'])
+- `member_subject`: neighbors(['{id}'],'out',['OVERRIDES'])
+- `alien_subject`: OVERRIDES connects method Symbol → method Symbol
+
+## CALLS
+
+**Endpoints**: `Symbol → Symbol`
+**Cardinality**: `many_to_many`
+**Brownfield-resolver-sourced**: yes
+**Member-only** (hints): yes
+
+**Purpose**: intra-codebase method call from caller method to callee method
+
+**Attributes**:
+
+- `call_site_line` (`INT64`) — source line of the call site
+- `call_site_byte` (`INT64`) — source byte offset of the call site
+- `arg_count` (`INT64`) — argument count at the call site (-1 for method references)
+- `confidence` (`DOUBLE`) — resolver confidence in [0.0, 1.0]
+- `strategy` (`STRING`) — call-graph resolution strategy literal
+- `source` (`STRING`) — call-graph source tag
+- `resolved` (`BOOLEAN`) — True iff callee Symbol was resolved in-graph
+
+**Typical traversals**:
+
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['CALLS'])
+- `member_subject`: neighbors(['{id}'],'out',['CALLS'])
+- `alien_subject`: CALLS connects method Symbol → method Symbol
+
+## EXPOSES
+
+**Endpoints**: `Symbol → Route`
+**Cardinality**: `one_to_one`
+**Brownfield-resolver-sourced**: yes
+**Member-only** (hints): yes
+
+**Purpose**: declaring method exposes an inbound HTTP or messaging Route
+
+**Attributes**:
+
+- `confidence` (`DOUBLE`) — route extraction confidence in [0.0, 1.0]
+- `strategy` (`STRING`) — route resolution strategy literal
+
+**Typical traversals**:
+
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['EXPOSES'])
+- `member_subject`: neighbors(['{id}'],'out',['EXPOSES'])
+- `alien_subject`: EXPOSES connects method Symbol → Route; use a method Symbol id
+
+## DECLARES_CLIENT
+
+**Endpoints**: `Symbol → Client`
+**Cardinality**: `one_to_many`
+**Brownfield-resolver-sourced**: yes
+**Member-only** (hints): yes
+
+**Purpose**: method declares an outbound HTTP client call site
+
+**Attributes**:
+
+- `confidence` (`DOUBLE`) — client declaration confidence in [0.0, 1.0]
+- `strategy` (`STRING`) — client resolution strategy literal
+
+**Typical traversals**:
+
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'{direction}',['DECLARES_CLIENT'])
+- `member_subject`: neighbors(['{id}'],'out',['DECLARES_CLIENT'])
+- `alien_subject`: DECLARES_CLIENT connects method Symbol → Client
+
+## HTTP_CALLS
+
+**Endpoints**: `Symbol → Route`
+**Cardinality**: `many_to_many`
+**Brownfield-resolver-sourced**: yes
+**Member-only** (hints): no
+
+**Purpose**: resolved HTTP call from declaring method to target route (pre-flip: Symbol→Route; PR-B: Client→Route)
+
+**Attributes**:
+
+- `confidence` (`DOUBLE`) — pass6 match confidence in [0.0, 1.0]
+- `strategy` (`STRING`) — HTTP call resolution strategy literal
+- `method_call` (`STRING`) — HTTP method of the call site
+- `raw_uri` (`STRING`) — uninterpolated URI template from the call site
+- `match` (`STRING`) — cross_service|intra_service|ambiguous|phantom|unresolved
+
+**Typical traversals**:
+
+- `type_subject_current`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'out',['HTTP_CALLS'])
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'out',['DECLARES_CLIENT']) then neighbors(client_ids,'out',['HTTP_CALLS'])
+- `member_subject_current`: neighbors(['{id}'],'out',['HTTP_CALLS'])
+- `member_subject`: neighbors(['{id}'],'out',['DECLARES_CLIENT']) then neighbors(client_ids,'out',['HTTP_CALLS'])
+- `alien_subject`: HTTP_CALLS is Symbol→Route until PR-B; use member_subject_current. After PR-B (Client→Route), use member_subject via DECLARES_CLIENT
+
+## ASYNC_CALLS
+
+**Endpoints**: `Symbol → Route`
+**Cardinality**: `many_to_many`
+**Brownfield-resolver-sourced**: yes
+**Member-only** (hints): no
+
+**Purpose**: resolved async call from declaring method to topic route (pre-flip: Symbol→Route; PR-C: Producer→Route)
+
+**Attributes**:
+
+- `confidence` (`DOUBLE`) — pass6 match confidence in [0.0, 1.0]
+- `strategy` (`STRING`) — async call resolution strategy literal
+- `direction` (`STRING`) — produce|consume async direction literal
+- `raw_topic` (`STRING`) — uninterpolated topic template from the call site
+- `match` (`STRING`) — cross_service|intra_service|ambiguous|phantom|unresolved
+
+**Typical traversals**:
+
+- `type_subject_current`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'out',['ASYNC_CALLS'])
+- `type_subject`: neighbors(['{id}'],'out',['DECLARES']) then neighbors(member_ids,'out',['DECLARES_PRODUCER']) then neighbors(producer_ids,'out',['ASYNC_CALLS'])
+- `member_subject_current`: neighbors(['{id}'],'out',['ASYNC_CALLS'])
+- `member_subject`: neighbors(['{id}'],'out',['DECLARES_PRODUCER']) then neighbors(producer_ids,'out',['ASYNC_CALLS'])
+- `alien_subject`: ASYNC_CALLS is Symbol→Route until PR-C; use member_subject_current. After PR-C (Producer→Route), use member_subject via DECLARES_PRODUCER
diff --git a/docs/skills/java-codebase-explore.md b/docs/skills/java-codebase-explore.md
index f55eaec..5df33d5 100644
--- a/docs/skills/java-codebase-explore.md
+++ b/docs/skills/java-codebase-explore.md
@@ -205,7 +205,7 @@ disagreement as evidence of staleness, not as a contradiction.
 - **Wire fields:** Cross-service and resolver-heavy edges carry **`edge.attrs`** (same map surfaced as `attrs` in payloads). Treat **`attrs.confidence`**, **`attrs.strategy`**, and **`attrs.match`** as structured hints: low confidence means “resolver could not pin this cleanly,” not “definitely false.”
 - **MCP vs editor:** If the open buffer contradicts graph edges (deleted method, renamed class), **trust the file** and treat MCP as stale until **`reprocess`** (or at least acknowledge incremental lag after **`increment`**).
-- **Operational check:** Use **`java-codebase-rag meta`** to compare index health, ontology version (currently **13** in this repo’s README), and recency signals—then decide whether to re-run **`reprocess`** before continuing a mission.
+- **Operational check:** Use **`java-codebase-rag meta`** to compare index health, ontology version (currently **14** in this repo’s README), and recency signals—then decide whether to re-run **`reprocess`** before continuing a mission.
 ## Anti-patterns
diff --git a/java_ontology.py b/java_ontology.py
index f3ed198..9b55b54 100644
--- a/java_ontology.py
+++ b/java_ontology.py
@@ -4,6 +4,7 @@ and resolver steps for `@CodebaseRole` / `@CodebaseCapability`."""
 from __future__ import annotations
+from dataclasses import dataclass
 from typing import Literal
 from ast_java import (
@@ -95,6 +96,283 @@
     "implicit_super",
 })
+# Union of fuzzy + non-fuzzy resolver strategies that may appear on graph edges
+# carrying a `strategy` column (brownfield layers, codebase stubs, call-graph tiers,
+# HTTP/async dispatch literals). Used by `EdgeSpec.brownfield_resolver_sourced`.
+BROWNFIELD_RESOLVER_STRATEGY_SET: frozenset[str] = frozenset({
+    *FUZZY_STRATEGY_SET,
+    "layer_b_ann",
+    "layer_a_meta",
+    "codebase_route",
+    "codebase_client",
+    "codebase_producer",
+    "annotation",
+    "spel",
+    "constant_ref",
+    *VALID_HTTP_CALL_STRATEGIES,
+    *VALID_ASYNC_CALL_STRATEGIES,
+    *VALID_CLIENT_KINDS,
+    *VALID_PRODUCER_KINDS,
+    "import_map",
+    "static_import",
+    "static_import_wildcard",
+    "constructor",
+    "method_reference",
+    "this_super",
+    "unique_type_name",
+    "suffix",
+    "same_module",
+})
+
+NodeKind = Literal["Symbol", "Route", "Client", "Producer"]
+Cardinality = Literal["many_to_many", "many_to_one", "one_to_many", "one_to_one"]
+
+
+@dataclass(frozen=True)
+class EdgeAttr:
+    name: str
+    kuzu_type: str
+    purpose: str
+
+
+@dataclass(frozen=True)
+class EdgeSpec:
+    name: str
+    src: NodeKind
+    dst: NodeKind
+    cardinality: Cardinality
+    brownfield_resolver_sourced: bool
+    attrs: tuple[EdgeAttr, ...]
+    purpose: str
+    typical_traversals: dict[str, str]
+    member_only: bool = False
+
+
+_SYMBOL_TYPE_TRAVERSAL = (
+    "neighbors(['{id}'],'out',['DECLARES']) "
+    "then neighbors(member_ids,'{direction}',['{edge}'])"
+)
+
+EDGE_SCHEMA: dict[str, EdgeSpec] = {
+    "EXTENDS": EdgeSpec(
+        name="EXTENDS",
+        src="Symbol",
+        dst="Symbol",
+        cardinality="many_to_one",
+        brownfield_resolver_sourced=False,
+        attrs=(
+            EdgeAttr("dst_name", "STRING", "raw supertype name as written in source"),
+            EdgeAttr("dst_fqn", "STRING", "best-effort resolved FQN of the supertype"),
+            EdgeAttr("resolved", "BOOLEAN", "True iff dst_fqn was resolved to an in-graph Symbol"),
+        ),
+        purpose="class or interface direct supertype relation",
+        typical_traversals={
+            "type_subject": _SYMBOL_TYPE_TRAVERSAL.format(id="{id}", direction="{direction}", edge="EXTENDS"),
+            "member_subject": "neighbors(['{id}'],'out',['EXTENDS'])",
+            "alien_subject": "EXTENDS connects Symbol → Symbol; use a type or member Symbol id",
+        },
+    ),
+    "IMPLEMENTS": EdgeSpec(
+        name="IMPLEMENTS",
+        src="Symbol",
+        dst="Symbol",
+        cardinality="many_to_many",
+        brownfield_resolver_sourced=False,
+        attrs=(
+            EdgeAttr("dst_name", "STRING", "raw interface name as written in source"),
+            EdgeAttr("dst_fqn", "STRING", "best-effort resolved FQN of the interface"),
+            EdgeAttr("resolved", "BOOLEAN", "True iff dst_fqn was resolved to an in-graph Symbol"),
+        ),
+        purpose="class implements interface relation",
+        typical_traversals={
+            "type_subject": _SYMBOL_TYPE_TRAVERSAL.format(id="{id}", direction="{direction}", edge="IMPLEMENTS"),
+            "member_subject": "neighbors(['{id}'],'out',['IMPLEMENTS'])",
+            "alien_subject": "IMPLEMENTS connects Symbol → Symbol; use a type or member Symbol id",
+        },
+    ),
+    "INJECTS": EdgeSpec(
+        name="INJECTS",
+        src="Symbol",
+        dst="Symbol",
+        cardinality="many_to_many",
+        brownfield_resolver_sourced=False,
+        attrs=(
+            EdgeAttr("dst_name", "STRING", "raw injected type name as written in source"),
+            EdgeAttr("dst_fqn", "STRING", "best-effort resolved FQN of the injected type"),
+            EdgeAttr("resolved", "BOOLEAN", "True iff dst_fqn was resolved to an in-graph Symbol"),
+            EdgeAttr("mechanism", "STRING", "injection mechanism literal (constructor, field, setter, …)"),
+            EdgeAttr("annotation", "STRING", "injection annotation simple name when present"),
+            EdgeAttr("field_or_param", "STRING", "field or parameter name for the injection site"),
+        ),
+        purpose="dependency injection edge from declaring type to injected type",
+        typical_traversals={
+            "type_subject": _SYMBOL_TYPE_TRAVERSAL.format(id="{id}", direction="{direction}", edge="INJECTS"),
+            "member_subject": "neighbors(['{id}'],'in',['INJECTS'])",
+            "alien_subject": "INJECTS connects Symbol → Symbol; use a type Symbol id",
+        },
+    ),
+    "DECLARES": EdgeSpec(
+        name="DECLARES",
+        src="Symbol",
+        dst="Symbol",
+        cardinality="one_to_many",
+        brownfield_resolver_sourced=False,
+        attrs=(),
+        purpose="type declares member Symbol (method, constructor, nested type)",
+        typical_traversals={
+            "type_subject": "neighbors(['{id}'],'out',['DECLARES'])",
+            "member_subject": "neighbors(['{id}'],'in',['DECLARES'])",
+            "alien_subject": "DECLARES connects Symbol → Symbol; use a type Symbol id for outbound members",
+        },
+    ),
+    "OVERRIDES": EdgeSpec(
+        name="OVERRIDES",
+        src="Symbol",
+        dst="Symbol",
+        cardinality="many_to_one",
+        brownfield_resolver_sourced=False,
+        attrs=(),
+        purpose="subtype method overrides supertype declared method with matching signature",
+        member_only=True,
+        typical_traversals={
+            "type_subject": _SYMBOL_TYPE_TRAVERSAL.format(id="{id}", direction="{direction}", edge="OVERRIDES"),
+            "member_subject": "neighbors(['{id}'],'out',['OVERRIDES'])",
+            "alien_subject": "OVERRIDES connects method Symbol → method Symbol",
+        },
+    ),
+    "CALLS": EdgeSpec(
+        name="CALLS",
+        src="Symbol",
+        dst="Symbol",
+        cardinality="many_to_many",
+        brownfield_resolver_sourced=True,
+        attrs=(
+            EdgeAttr("call_site_line", "INT64", "source line of the call site"),
+            EdgeAttr("call_site_byte", "INT64", "source byte offset of the call site"),
+            EdgeAttr("arg_count", "INT64", "argument count at the call site (-1 for method references)"),
+            EdgeAttr("confidence", "DOUBLE", "resolver confidence in [0.0, 1.0]"),
+            EdgeAttr("strategy", "STRING", "call-graph resolution strategy literal"),
+            EdgeAttr("source", "STRING", "call-graph source tag"),
+            EdgeAttr("resolved", "BOOLEAN", "True iff callee Symbol was resolved in-graph"),
+        ),
+        purpose="intra-codebase method call from caller method to callee method",
+        member_only=True,
+        typical_traversals={
+            "type_subject": _SYMBOL_TYPE_TRAVERSAL.format(id="{id}", direction="{direction}", edge="CALLS"),
+            "member_subject": "neighbors(['{id}'],'out',['CALLS'])",
+            "alien_subject": "CALLS connects method Symbol → method Symbol",
+        },
+    ),
+    "EXPOSES": EdgeSpec(
+        name="EXPOSES",
+        src="Symbol",
+        dst="Route",
+        cardinality="one_to_one",
+        brownfield_resolver_sourced=True,
+        attrs=(
+            EdgeAttr("confidence", "DOUBLE", "route extraction confidence in [0.0, 1.0]"),
+            EdgeAttr("strategy", "STRING", "route resolution strategy literal"),
+        ),
+        purpose="declaring method exposes an inbound HTTP or messaging Route",
+        member_only=True,
+        typical_traversals={
+            "type_subject": _SYMBOL_TYPE_TRAVERSAL.format(id="{id}", direction="{direction}", edge="EXPOSES"),
+            "member_subject": "neighbors(['{id}'],'out',['EXPOSES'])",
+            "alien_subject": "EXPOSES connects method Symbol → Route; use a method Symbol id",
+        },
+    ),
+    "DECLARES_CLIENT": EdgeSpec(
+        name="DECLARES_CLIENT",
+        src="Symbol",
+        dst="Client",
+        cardinality="one_to_many",
+        brownfield_resolver_sourced=True,
+        attrs=(
+            EdgeAttr("confidence", "DOUBLE", "client declaration confidence in [0.0, 1.0]"),
+            EdgeAttr("strategy", "STRING", "client resolution strategy literal"),
+        ),
+        purpose="method declares an outbound HTTP client call site",
+        member_only=True,
+        typical_traversals={
+            "type_subject": _SYMBOL_TYPE_TRAVERSAL.format(
+                id="{id}", direction="{direction}", edge="DECLARES_CLIENT",
+            ),
+            "member_subject": "neighbors(['{id}'],'out',['DECLARES_CLIENT'])",
+            "alien_subject": "DECLARES_CLIENT connects method Symbol → Client",
+        },
+    ),
+    "HTTP_CALLS": EdgeSpec(
+        name="HTTP_CALLS",
+        src="Symbol",
+        dst="Route",
+        cardinality="many_to_many",
+        brownfield_resolver_sourced=True,
+        attrs=(
+            EdgeAttr("confidence", "DOUBLE", "pass6 match confidence in [0.0, 1.0]"),
+            EdgeAttr("strategy", "STRING", "HTTP call resolution strategy literal"),
+            EdgeAttr("method_call", "STRING", "HTTP method of the call site"),
+            EdgeAttr("raw_uri", "STRING", "uninterpolated URI template from the call site"),
+            EdgeAttr("match", "STRING", "cross_service|intra_service|ambiguous|phantom|unresolved"),
+        ),
+        purpose="resolved HTTP call from declaring method to target route (pre-flip: Symbol→Route; PR-B: Client→Route)",
+        typical_traversals={
+            "type_subject_current": (
+                "neighbors(['{id}'],'out',['DECLARES']) "
+                "then neighbors(member_ids,'out',['HTTP_CALLS'])"
+            ),
+            "type_subject": (
+                "neighbors(['{id}'],'out',['DECLARES']) "
+                "then neighbors(member_ids,'out',['DECLARES_CLIENT']) "
+                "then neighbors(client_ids,'out',['HTTP_CALLS'])"
+            ),
+            "member_subject_current": "neighbors(['{id}'],'out',['HTTP_CALLS'])",
+            "member_subject": (
+                "neighbors(['{id}'],'out',['DECLARES_CLIENT']) "
+                "then neighbors(client_ids,'out',['HTTP_CALLS'])"
+            ),
+            "alien_subject": (
+                "HTTP_CALLS is Symbol→Route until PR-B; use member_subject_current. "
+                "After PR-B (Client→Route), use member_subject via DECLARES_CLIENT"
+            ),
+        },
+    ),
+    "ASYNC_CALLS": EdgeSpec(
+        name="ASYNC_CALLS",
+        src="Symbol",
+        dst="Route",
+        cardinality="many_to_many",
+        brownfield_resolver_sourced=True,
+        attrs=(
+            EdgeAttr("confidence", "DOUBLE", "pass6 match confidence in [0.0, 1.0]"),
+            EdgeAttr("strategy", "STRING", "async call resolution strategy literal"),
+            EdgeAttr("direction", "STRING", "produce|consume async direction literal"),
+            EdgeAttr("raw_topic", "STRING", "uninterpolated topic template from the call site"),
+            EdgeAttr("match", "STRING", "cross_service|intra_service|ambiguous|phantom|unresolved"),
+        ),
+        purpose="resolved async call from declaring method to topic route (pre-flip: Symbol→Route; PR-C: Producer→Route)",
+        typical_traversals={
+            "type_subject_current": (
+                "neighbors(['{id}'],'out',['DECLARES']) "
+                "then neighbors(member_ids,'out',['ASYNC_CALLS'])"
+            ),
+            "type_subject": (
+                "neighbors(['{id}'],'out',['DECLARES']) "
+                "then neighbors(member_ids,'out',['DECLARES_PRODUCER']) "
+                "then neighbors(producer_ids,'out',['ASYNC_CALLS'])"
+            ),
+            "member_subject_current": "neighbors(['{id}'],'out',['ASYNC_CALLS'])",
+            "member_subject": (
+                "neighbors(['{id}'],'out',['DECLARES_PRODUCER']) "
+                "then neighbors(producer_ids,'out',['ASYNC_CALLS'])"
+            ),
+            "alien_subject": (
+                "ASYNC_CALLS is Symbol→Route until PR-C; use member_subject_current. "
+                "After PR-C (Producer→Route), use member_subject via DECLARES_PRODUCER"
+            ),
+        },
+    ),
+}
+
 ResolveReason = Literal[
     "exact_id",
     "exact_fqn",
@@ -118,5 +396,11 @@
     "VALID_HTTP_CALL_MATCHES",
     "VALID_RESOLVE_REASONS",
     "FUZZY_STRATEGY_SET",
+    "BROWNFIELD_RESOLVER_STRATEGY_SET",
+    "NodeKind",
+    "Cardinality",
+    "EdgeAttr",
+    "EdgeSpec",
+    "EDGE_SCHEMA",
     "ResolveReason",
 ]
diff --git a/scripts/generate_edge_navigation.py b/scripts/generate_edge_navigation.py
new file mode 100644
index 0000000..a9f3965
--- /dev/null
+++ b/scripts/generate_edge_navigation.py
@@ -0,0 +1,100 @@
+#!/usr/bin/env python3
+"""Generate docs/EDGE-NAVIGATION.md from java_ontology.EDGE_SCHEMA."""
+from __future__ import annotations
+
+import argparse
+import sys
+from pathlib import Path
+
+_REPO_ROOT = Path(__file__).resolve().parent.parent
+if str(_REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(_REPO_ROOT))
+
+from java_ontology import EDGE_SCHEMA, EdgeSpec  # noqa: E402
+
+_DEFAULT_OUT = _REPO_ROOT / "docs" / "EDGE-NAVIGATION.md"
+_BANNER = (
+    "# Edge Navigation Schema\n\n"
+    "> **Generated from `java_ontology.EDGE_SCHEMA` — do not edit by hand.**\n"
+    "> Regenerate: `.venv/bin/python scripts/generate_edge_navigation.py`\n"
+)
+
+
+def _yes_no(flag: bool) -> str:
+    return "yes" if flag else "no"
+
+
+def _render_edge(spec: EdgeSpec) -> list[str]:
+    lines = [
+        f"## {spec.name}",
+        "",
+        f"**Endpoints**: `{spec.src} → {spec.dst}`",
+        f"**Cardinality**: `{spec.cardinality}`",
+        f"**Brownfield-resolver-sourced**: {_yes_no(spec.brownfield_resolver_sourced)}",
+        f"**Member-only** (hints): {_yes_no(spec.member_only)}",
+        "",
+        f"**Purpose**: {spec.purpose}",
+        "",
+    ]
+    if spec.attrs:
+        lines.append("**Attributes**:")
+        lines.append("")
+        for attr in spec.attrs:
+            lines.append(f"- `{attr.name}` (`{attr.kuzu_type}`) — {attr.purpose}")
+        lines.append("")
+    else:
+        lines.append("**Attributes**: _(none)_")
+        lines.append("")
+    if spec.typical_traversals:
+        lines.append("**Typical traversals**:")
+ lines.append("") + for role, traversal in spec.typical_traversals.items(): + lines.append(f"- `{role}`: {traversal}") + lines.append("") + return lines + + +def generate_markdown() -> str: + parts = [_BANNER, "## Summary", "", "| Edge | From | To | Cardinality | Brownfield-resolver-sourced | Member-only |", "| --- | --- | --- | --- | --- | --- |"] + for spec in EDGE_SCHEMA.values(): + parts.append( + f"| {spec.name} | {spec.src} | {spec.dst} | {spec.cardinality} | " + f"{_yes_no(spec.brownfield_resolver_sourced)} | {_yes_no(spec.member_only)} |" + ) + parts.append("") + for spec in EDGE_SCHEMA.values(): + parts.extend(_render_edge(spec)) + return "\n".join(parts).rstrip() + "\n" + + +def main() -> int: + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + "--out", + type=Path, + default=_DEFAULT_OUT, + help=f"Output path (default: {_DEFAULT_OUT.relative_to(_REPO_ROOT)})", + ) + parser.add_argument( + "--check", + action="store_true", + help="Exit 1 if committed doc differs from generator output", + ) + args = parser.parse_args() + content = generate_markdown() + if args.check: + if not args.out.is_file(): + print(f"missing {args.out}", file=sys.stderr) + return 1 + committed = args.out.read_text(encoding="utf-8") + if committed != content: + print(f"stale: {args.out} (run scripts/generate_edge_navigation.py)", file=sys.stderr) + return 1 + return 0 + args.out.parent.mkdir(parents=True, exist_ok=True) + args.out.write_text(content, encoding="utf-8") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/tests/test_edge_navigation_doc.py b/tests/test_edge_navigation_doc.py new file mode 100644 index 0000000..0e54e9a --- /dev/null +++ b/tests/test_edge_navigation_doc.py @@ -0,0 +1,34 @@ +"""Generated EDGE-NAVIGATION.md stability (SCHEMA-V2 PR-A).""" +from __future__ import annotations + +import subprocess +import sys +from pathlib import Path + +_REPO_ROOT = Path(__file__).resolve().parent.parent +_GENERATOR = 
_REPO_ROOT / "scripts" / "generate_edge_navigation.py" +_COMMITTED = _REPO_ROOT / "docs" / "EDGE-NAVIGATION.md" + + +def test_edge_navigation_doc_matches_generator_output() -> None: + result = subprocess.run( + [sys.executable, str(_GENERATOR), "--check"], + cwd=_REPO_ROOT, + capture_output=True, + text=True, + ) + assert result.returncode == 0, result.stderr or result.stdout + assert _COMMITTED.is_file() + + +def test_edge_navigation_doc_check_mode_detects_drift(tmp_path: Path) -> None: + stale = tmp_path / "EDGE-NAVIGATION.md" + stale.write_text("# stale\n", encoding="utf-8") + result = subprocess.run( + [sys.executable, str(_GENERATOR), "--check", "--out", str(stale)], + cwd=_REPO_ROOT, + capture_output=True, + text=True, + ) + assert result.returncode == 1 + assert "stale" in (result.stderr or result.stdout).lower() diff --git a/tests/test_kuzu_queries.py b/tests/test_kuzu_queries.py index f0d59e6..ff5d974 100644 --- a/tests/test_kuzu_queries.py +++ b/tests/test_kuzu_queries.py @@ -367,9 +367,8 @@ def test_trace_flow_empty_seeds_returns_empty(kuzu_graph) -> None: assert kuzu_graph.trace_flow([], depth=1) == [] -def test_kuzu_graph_get_raises_when_graph_ontology_too_old(tmp_path: Path) -> None: - """N4 / proposal §5.3: stale graphs must fail loudly on open.""" - db_path = tmp_path / "stale_ontology.kuzu" +def _open_stale_ontology_graph(tmp_path: Path, ontology_version: int) -> Path: + db_path = tmp_path / f"stale_ontology_{ontology_version}.kuzu" conn = kuzu.Connection(kuzu.Database(str(db_path))) conn.execute( "CREATE NODE TABLE GraphMeta(" @@ -377,12 +376,39 @@ def test_kuzu_graph_get_raises_when_graph_ontology_too_old(tmp_path: Path) -> No "ontology_version INT64, built_at INT64, source_root STRING, " "counts_json STRING, parse_errors INT64)" ) - stale = max(0, ONTOLOGY_VERSION - 1) conn.execute( "CREATE (:GraphMeta {key: $k, ontology_version: $ov, built_at: 0, " "source_root: '', counts_json: '{}', parse_errors: 0})", - {"k": "graph", "ov": stale}, + {"k": 
"graph", "ov": ontology_version}, ) + return db_path + + +def test_kuzu_graph_refuses_ontology_version_below_required(tmp_path: Path) -> None: + """v13 graphs refuse to open when ONTOLOGY_VERSION is 14 (SCHEMA-V2 PR-A). + + Overlaps ``test_kuzu_graph_get_raises_when_graph_ontology_too_old`` when + ``ONTOLOGY_VERSION - 1 == 13``; kept as an explicit v13 regression anchor. + """ + assert ONTOLOGY_VERSION >= 14 + db_path = _open_stale_ontology_graph(tmp_path, 13) + + prev_inst = KuzuGraph._instance + prev_path = KuzuGraph._instance_path + try: + KuzuGraph._instance = None + KuzuGraph._instance_path = None + with pytest.raises(RuntimeError, match="(?i)ontology.*14|required version 14"): + KuzuGraph.get(str(db_path)) + finally: + KuzuGraph._instance = prev_inst + KuzuGraph._instance_path = prev_path + + +def test_kuzu_graph_get_raises_when_graph_ontology_too_old(tmp_path: Path) -> None: + """N4 / proposal §5.3: stale graphs must fail loudly on open.""" + stale = max(0, ONTOLOGY_VERSION - 1) + db_path = _open_stale_ontology_graph(tmp_path, stale) prev_inst = KuzuGraph._instance prev_path = KuzuGraph._instance_path diff --git a/tests/test_schema_consistency.py b/tests/test_schema_consistency.py new file mode 100644 index 0000000..2f161cc --- /dev/null +++ b/tests/test_schema_consistency.py @@ -0,0 +1,94 @@ +"""DDL ↔ EDGE_SCHEMA consistency (SCHEMA-V2 PR-A). + +Endpoint (src/dst) parity only in PR-A; ``EDGE_SCHEMA.attrs`` vs DDL column lists +is a follow-up (column parity test or codegen). 
+""" +from __future__ import annotations + +import re +from pathlib import Path + +from java_ontology import BROWNFIELD_RESOLVER_STRATEGY_SET, EDGE_SCHEMA + +_REPO_ROOT = Path(__file__).resolve().parent.parent +_BUILD_AST_GRAPH = _REPO_ROOT / "build_ast_graph.py" + +_REL_DDL_RE = re.compile( + r'CREATE REL TABLE (\w+)\(FROM (\w+) TO (\w+)', +) +_STRATEGY_LITERAL_RE = re.compile( + r"""(?:strategy|resolution_strategy|edge_strat)\s*=\s*["']([a-z_]+)["']""", +) +_EMITTER_FILES = ( + "build_ast_graph.py", + "graph_enrich.py", + "ast_java.py", +) + + +def _ddl_endpoints() -> dict[str, tuple[str, str]]: + text = _BUILD_AST_GRAPH.read_text(encoding="utf-8") + out: dict[str, tuple[str, str]] = {} + for match in _REL_DDL_RE.finditer(text): + name, src, dst = match.group(1), match.group(2), match.group(3) + out[name] = (src, dst) + return out + + +def _strategy_literals_in_emitters() -> set[str]: + found: set[str] = set() + for rel in _EMITTER_FILES: + text = (_REPO_ROOT / rel).read_text(encoding="utf-8") + found.update(_STRATEGY_LITERAL_RE.findall(text)) + return found + + +def test_schema_consistency_all_ddl_endpoints_match_edge_schema() -> None: + ddl = _ddl_endpoints() + schema_names = set(EDGE_SCHEMA) + ddl_names = set(ddl) + assert schema_names == ddl_names, ( + f"EDGE_SCHEMA keys {sorted(schema_names)} != DDL edges {sorted(ddl_names)}" + ) + for name, spec in EDGE_SCHEMA.items(): + src, dst = ddl[name] + assert spec.src == src, f"{name}: schema src {spec.src!r} != DDL {src!r}" + assert spec.dst == dst, f"{name}: schema dst {spec.dst!r} != DDL {dst!r}" + + +def test_schema_consistency_http_calls_pre_flip_symbol_to_route() -> None: + spec = EDGE_SCHEMA["HTTP_CALLS"] + assert spec.src == "Symbol" + assert spec.dst == "Route" + + +def test_schema_consistency_async_calls_pre_flip_symbol_to_route() -> None: + spec = EDGE_SCHEMA["ASYNC_CALLS"] + assert spec.src == "Symbol" + assert spec.dst == "Route" + + +def test_edge_schema_member_only_flags_on_method_level_edges() -> 
None: + assert EDGE_SCHEMA["DECLARES_CLIENT"].member_only is True + assert EDGE_SCHEMA["EXPOSES"].member_only is True + assert EDGE_SCHEMA["OVERRIDES"].member_only is True + assert EDGE_SCHEMA["CALLS"].member_only is True + assert "DECLARES_PRODUCER" not in EDGE_SCHEMA + assert EDGE_SCHEMA["HTTP_CALLS"].member_only is False + assert EDGE_SCHEMA["ASYNC_CALLS"].member_only is False + + +def test_http_async_typical_traversals_include_pre_flip_current_keys() -> None: + for edge in ("HTTP_CALLS", "ASYNC_CALLS"): + trav = EDGE_SCHEMA[edge].typical_traversals + assert "member_subject_current" in trav + assert edge in trav["member_subject_current"] + assert "member_subject" in trav + assert "DECLARES_CLIENT" in trav["member_subject"] or "DECLARES_PRODUCER" in trav["member_subject"] + + +def test_brownfield_resolver_strategy_literals_emitted_in_builder_subset() -> None: + literals = _strategy_literals_in_emitters() + assert literals, "expected strategy literals from emitter modules" + unknown = literals - BROWNFIELD_RESOLVER_STRATEGY_SET + assert not unknown, f"strategy literals not in BROWNFIELD_RESOLVER_STRATEGY_SET: {sorted(unknown)}"
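
Review note: the endpoint-parity test in `tests/test_schema_consistency.py` scrapes `CREATE REL TABLE` statements with a regex and compares the captured `FROM`/`TO` tables against `EDGE_SCHEMA`. A minimal sketch of that idea follows, run against an inline DDL string rather than `build_ast_graph.py`. The sample DDL, the `SAMPLE_SCHEMA` dict, and the `endpoint_mismatches` helper are hypothetical stand-ins for illustration, not code from this PR; only the regex pattern is taken from the test module.

```python
import re

# Same pattern as _REL_DDL_RE in tests/test_schema_consistency.py.
REL_DDL_RE = re.compile(r"CREATE REL TABLE (\w+)\(FROM (\w+) TO (\w+)")

# Hypothetical DDL text; the real statements live in build_ast_graph.py.
SAMPLE_DDL = """
CREATE REL TABLE EXPOSES(FROM Symbol TO Route, confidence DOUBLE, strategy STRING)
CREATE REL TABLE HTTP_CALLS(FROM Symbol TO Route, confidence DOUBLE)
"""

# Hypothetical stand-in for EDGE_SCHEMA: edge name -> (src, dst).
SAMPLE_SCHEMA = {
    "EXPOSES": ("Symbol", "Route"),
    "HTTP_CALLS": ("Symbol", "Route"),
}


def ddl_endpoints(text: str) -> dict[str, tuple[str, str]]:
    """Map each relationship table name to its (FROM, TO) node tables."""
    return {m.group(1): (m.group(2), m.group(3)) for m in REL_DDL_RE.finditer(text)}


def endpoint_mismatches(
    ddl: dict[str, tuple[str, str]],
    schema: dict[str, tuple[str, str]],
) -> list[str]:
    """Describe every edge whose endpoints differ or that exists on one side only."""
    problems: list[str] = []
    for name in sorted(set(ddl) | set(schema)):
        if name not in ddl:
            problems.append(f"{name}: missing from DDL")
        elif name not in schema:
            problems.append(f"{name}: missing from schema")
        elif ddl[name] != schema[name]:
            problems.append(f"{name}: DDL {ddl[name]} != schema {schema[name]}")
    return problems


if __name__ == "__main__":
    print(endpoint_mismatches(ddl_endpoints(SAMPLE_DDL), SAMPLE_SCHEMA))
```

One property worth keeping in mind: the pattern requires `(FROM` to follow the table name with no whitespace, so a statement written as `CREATE REL TABLE X (FROM ...` would not be captured. In the test as written, such an edge would surface through the name-set equality assertion rather than pass silently.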