
propose: schema-v2 — edges connect the nodes the edge is about (HTTP_CALLS, ASYNC_CALLS, Producer node, Edge Navigation Schema)#151

Merged
HumanBean17 merged 4 commits into master from propose/schema-v2
May 16, 2026

Conversation

@HumanBean17
Owner

Draft propose. Locks the schema fix for the May 16 cross-service trace pain and applies the same fix to async — the half-modeling bug exists in both channels (HTTP has a Client node the edge bypasses; async has no producer node at all). Both ship in v2 so the canonical Edge Navigation Schema isn't published with a known asymmetry.

What this propose decides

  • Principle: edges connect the nodes whose data the edge is about. HTTP and async are two symptoms of one bug.
  • HTTP_CALLS moves from Symbol → Route to Client → Route.
  • ASYNC_CALLS moves from Symbol → Route to Producer → Route. New Producer node mirrors Client.
  • New DECLARES_PRODUCER(Symbol → Producer) parallels DECLARES_CLIENT.
  • EDGE_SCHEMA: dict[str, EdgeSpec] lands in java_ontology.py (same ontology home pattern as FUZZY_STRATEGY_SET).
  • docs/EDGE-NAVIGATION.md is generated from EDGE_SCHEMA. CI fails on hand-edit or DDL/ontology disagreement.
  • Hints v3 is teed up to consume EDGE_SCHEMA in PR-D; the v3 design itself lives in a separate HINTS-V3-PROPOSE.md.
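A minimal sketch of what the EDGE_SCHEMA registry could look like, assuming a frozen dataclass for EdgeSpec. Only the names EDGE_SCHEMA, EdgeSpec, cardinality, and brownfield_sourced come from this propose; the `src`/`dst` field names, the cardinality strings, and the `out_edges` helper are hypothetical. Endpoints shown are the post-v2 ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeSpec:
    """Hypothetical shape; the propose only names EdgeSpec itself."""
    src: str                          # node kind at the tail of the edge
    dst: str                          # node kind at the head of the edge
    cardinality: str                  # informational only, values here are assumed
    brownfield_sourced: bool = False  # flag name as used at this point in the thread


# Post-v2 endpoints for a subset of the edges discussed in this propose.
EDGE_SCHEMA: dict[str, EdgeSpec] = {
    "DECLARES_CLIENT": EdgeSpec("Symbol", "Client", "1:N"),
    "DECLARES_PRODUCER": EdgeSpec("Symbol", "Producer", "1:N"),
    "HTTP_CALLS": EdgeSpec("Client", "Route", "N:M", brownfield_sourced=True),
    "ASYNC_CALLS": EdgeSpec("Producer", "Route", "N:M", brownfield_sourced=True),
    "EXPOSES": EdgeSpec("Symbol", "Route", "1:N"),
}


def out_edges(node_kind: str) -> list[str]:
    """Edge names an agent can follow outward from a node of this kind."""
    return sorted(name for name, spec in EDGE_SCHEMA.items() if spec.src == node_kind)
```

With this shape, asking for the out-edges of a Client returns HTTP_CALLS directly, which is the "no empty neighbors from the node you're holding" property the principle is after.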

Migration — 4 PRs

  • PR-A: ship EDGE_SCHEMA + doc generator + CI invariants. No DDL flips yet — schema-as-source-of-truth lands first.
  • PR-B: flip HTTP_CALLS endpoints in DDL + pass6 + all callers + tests.
  • PR-C: add Producer node + DECLARES_PRODUCER + flip ASYNC_CALLS endpoints. Includes producer-side parallels of brownfield tests.
  • PR-D: hints v3, gated on PR-A. Replaces the generic empty-result hint with kind/direction-aware templates driven by EDGE_SCHEMA.
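The PR-A invariants (generated doc, no hand edits, DDL agrees with the ontology) could be checked along these lines. This is a sketch under assumptions: the function names, the table layout of the generated doc, and the `(src, dst)` tuple representation are all hypothetical, not the actual generator.

```python
def render_edge_doc(schema: dict[str, tuple[str, str]]) -> str:
    """Render a deterministic table from the schema (layout is assumed)."""
    lines = ["| Edge | From | To |", "| --- | --- | --- |"]
    for name in sorted(schema):
        src, dst = schema[name]
        lines.append(f"| {name} | {src} | {dst} |")
    return "\n".join(lines) + "\n"


def check_doc_in_sync(schema: dict[str, tuple[str, str]], committed_doc: str) -> list[str]:
    """Fail if the committed doc differs from what the generator would emit."""
    if render_edge_doc(schema) != committed_doc:
        return ["docs/EDGE-NAVIGATION.md differs from generated output"]
    return []


def check_ddl_agrees(
    schema: dict[str, tuple[str, str]],
    ddl_endpoints: dict[str, tuple[str, str]],
) -> list[str]:
    """Fail on any edge whose endpoints differ between ontology and DDL."""
    errors = []
    for name in sorted(set(schema) | set(ddl_endpoints)):
        if schema.get(name) != ddl_endpoints.get(name):
            errors.append(f"{name}: ontology {schema.get(name)} != DDL {ddl_endpoints.get(name)}")
    return errors
```

Running both checks against the current schema in PR-A means the later endpoint flips in PR-B and PR-C have to touch the ontology, the DDL, and the generated doc together, or CI fails.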

Out of scope (explicitly)

  • AsyncConsumer node — no @CodebaseConsumer annotation exists; open follow-up if one is added.
  • Soft-migration aliases (no active users per repo rules).
  • Convenience composite-edge views / NODE_SCHEMA / multi-target Client/Producer modeling — all separate proposes.

Cardinals

  • 4 PRs (TL;DR · §6 · Decision §7.20 — all match).
  • 23 use cases (UC1–UC11 HTTP, UC12–UC18 async, UC19–UC23 schema infrastructure).
  • 20 locked decisions (§7).
  • 11 edges total (10 existing + DECLARES_PRODUCER).

Revision history

First draft was HTTP-only and scoped out async on the grounds that "no AsyncClient analog node exists." That was wrong — @CodebaseProducer already exists in the annotation set (ast_java.py:180) and produces metadata that is currently lost on the edge. Same half-modeling bug as HTTP, one stage earlier. Async fix folded in; framing lifted to the principle. Full traceability in Appendix B.

Framing questions for reviewers

  1. UC8 / UC16 are now 4-hop instead of 3 for cross-service trace assembly. Acceptable trade for first-class caller-side traversal, or worth a composite-traversal hint / materialized view in v2?
  2. cardinality is informational only (kuzu doesn't enforce). Worth keeping, or trim to keep EdgeSpec minimal?
  3. Producer field shape mirrors Client 1:1. Should it be reviewed in PR-C against actual AsyncProducerHint data, or are the obvious analogs (target_topic ↔ target_service, topic ↔ path, broker ↔ method) good enough to lock here?
  4. PR-D (hints v3) is teed up here but the design lives in a separate propose. Bundle into v2 or stay split?

cc @HumanBean17

…ion 21)

The callee side is already symmetric for HTTP and async: @KafkaListener
methods are the Route's method_fqn, reachable via EXPOSES(Symbol→Route),
exactly mirroring @GetMapping. Producer node exists only because
kafkaTemplate.send(...) is a call expression with no first-class method
identity — listener methods don't have that problem.

- §5 out-of-scope row rewritten: 'no annotation yet' → 'no navigation gap'
- Decision 21 added explicitly locking the asymmetry
- §8 risk row updated to reference Decision 21
- Appendix B records second-round grilling outcome

No change to PR count (4), UC count (23), or edge count (11).
@HumanBean17
Owner Author

Update — Decision 21: no Consumer node

Grilled during the propose-review pass: should the callee-side mirror the caller-side Client/Producer split with a Consumer node?

Conclusion: no. The callee side is already symmetric. @KafkaListener annotates a method, and that method is the Route's method_fqn — same shape as @GetMapping exposing an http_endpoint Route via EXPOSES(Symbol → Route) (see ast_java.py:2169). The Producer node exists only because kafkaTemplate.send(...) is a call expression inside an arbitrary method body and has no first-class identity to hang topic/target metadata on. Listener methods don't have that problem.

Changes (one commit, force-push not needed — fast-forward):

  • §5 out-of-scope row rewritten from "no @CodebaseConsumer annotation" to "no navigation gap to fill, callee-side already symmetric"
  • Decision 21 added explicitly locking the asymmetry as deliberate
  • §8 risk row references Decision 21
  • Appendix B records the second-round grilling outcome

No change to PR count (4), UC count (23), or edge count (11). Ready for review-1 when you have cycles.

HumanBean17 marked this pull request as ready for review May 16, 2026 13:37

HumanBean17 (Owner, Author) left a comment

Verdict

Approve the direction; hold implementation until a few contract gaps are closed.

The core diagnosis matches the codebase: HTTP_CALLS / ASYNC_CALLS are Symbol → Route while Client already exists and is only reachable via DECLARES_CLIENT — exactly the “empty neighbors from the node you’re holding” failure mode. Folding async in (not HTTP-only) and Decision 21 (no Consumer node) are well argued.


What works well

  • Principle-first framing (“edges connect the nodes whose data the edge is about”).
  • PR-A before DDL flips (EDGE_SCHEMA + generated doc against current schema).
  • No dual-edge / alias policy (matches repo rules).
  • UC tables are concrete and testable (UC1, UC5–7, UC14–15, UC18; UC6/15 “declared but unresolved” on caller-side nodes).
  • Appendix B traceability for HTTP-only → HTTP+async revision.

Must fix before implementation

1. Re-index / ontology_version callout missing

Per repo rules, graph-shape changes need an explicit subsection: bump ontology_version (currently 13), README “Re-index required”, full rebuild. Don’t leave this implied by “4 PRs”.

2. brownfield_sourced vs FUZZY_STRATEGY_SET (internal contradiction)

Decision §7.9 ties brownfield_sourced to FUZZY_STRATEGY_SET, but hints-v2 treats codebase_client, layer_b_ann, etc. as non-fuzzy primary paths — and HTTP/async edges routinely use those strategies. Appendix A marks HTTP_CALLS / ASYNC_CALLS as brownfield_sourced=True. Lock semantics before PR-A (rename flag, widen set, or split fuzzy_strategy_capable vs annotation-sourced).

3. find_route_callers / trace_request_flow / impact-analysis not in scope tables

kuzu_queries.py still has Symbol -[:HTTP_CALLS|ASYNC_CALLS]-> Route in find_route_callers, trace_request_flow, and impact-analysis expansion (~L1335). PR-B/C list pr_analysis, mcp_v2, server.py but not an explicit decision for:

  • Does find_route_callers return Client/Producer ids, or join via DECLARES_* and keep returning declaring Symbol?
  • What happens to CallerInfo.caller_symbol_id?

Without this, PR-B risks a half-migrated query layer.
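To make the two options concrete, here is a toy in-memory version of the post-flip lookup. The edge lists and ids are made up for illustration; the real queries are Cypher in kuzu_queries.py and are not reproduced here.

```python
# Toy edge lists standing in for graph tables; every id below is invented.
DECLARES_CLIENT = [
    ("sym:OrderService.place", "client:inventory"),
    ("sym:OrderService.place", "client:billing"),
]
HTTP_CALLS = [  # post-flip shape: Client -> Route
    ("client:inventory", "route:GET /stock"),
    ("client:billing", "route:POST /charge"),
]


def callers_as_nodes(route_id: str) -> list[str]:
    """Option 1: one hop, return the caller-side Client ids."""
    return [client for client, route in HTTP_CALLS if route == route_id]


def callers_as_symbols(route_id: str) -> list[str]:
    """Option 2: join back via DECLARES_CLIENT and return declaring Symbols."""
    clients = set(callers_as_nodes(route_id))
    return sorted({sym for sym, client in DECLARES_CLIENT if client in clients})
```

In this toy data both routes resolve to the same declaring Symbol under option 2, so the answer no longer says which of the method's clients made the call; option 1 keeps that distinction.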

4. Type-level describe rollups unspecified

Today only DECLARES.DECLARES_CLIENT and DECLARES.EXPOSES are composed for type symbols. After the flip, methods lose direct HTTP_CALLS out-edges; types never had them. UC10 fixes describe(client) but class-level describe loses cross-service signal unless you add e.g. DECLARES.DECLARES_PRODUCER (and optionally composed HTTP/async). Specify in PR-B/C or accept the regression explicitly.

5. HINTS-V3-PROPOSE.md referenced but missing

PR-D / UC2, UC3, UC17, UC22 depend on it. Add a stub in this PR, fold a minimal v3 section here, or mark PR-D blocked until v3 propose exists.

6. plans/PLAN-SCHEMA-V2.md not in this PR

Fine for propose-only; block merge on plan + CURSOR-PROMPTS-* before PR-A (grep-enumeration contract in §8).

7. Producer DDL vs AsyncProducerHint

Proposed target_topic + topic vs hint’s topic only; Client has path / path_template / path_regex. Lock field names against AsyncProducerHint + pass6 join keys in §3.2 now; validate templates in PR-C.

8. §3.4 pass label

Clients/producers are materialized in pass5_imperative_edges, not pass4 (pass4 = routes/EXPOSES).


Medium priority

| Gap | Note |
| --- | --- |
| find(kind="producer") / resolve(hint_kind="producer") | VALID_PRODUCER_KINDS exists; without MCP parity agents only reach producers via known method + DECLARES_PRODUCER. |
| Docs sweep | README, docs/AGENT-GUIDE.md, exploration skill still teach Symbol → Route. Add PR row in B+C. |
| GraphMeta | Mention producers_total / declares_producer_total alongside clients_total. |
| UC9 | pr_analysis uses HTTP_CALLS\|ASYNC_CALLS; mention both in the UC row. |
| §7 count | Appendix B says “16 → 20” decisions; §7 has 21 items. |

Framing questions (my answers)

  1. UC8/UC16 extra hop (3→4) — Acceptable for v2 if PR-D ships composite-traversal hints; materialized convenience edges are premature.
  2. cardinality — Keep (informational, cheap, useful for docs/future invariants).
  3. Producer 1:1 with Client — Don’t lock field parity; lock names against AsyncProducerHint now, validate templates in PR-C.
  4. Bundle HINTS-V3? — Split is fine (like hints-v2), but don’t schedule PR-D until HINTS-V3-PROPOSE.md exists — v2 graph without v3 hints is a footgun (UC3).

Pre-merge checklist for this propose

  • Schema / re-index subsection
  • brownfield_sourced semantics vs hints v2
  • Downstream API table (find_route_callers, trace_request_flow, impact analysis, CallerInfo)
  • Type-level describe rollups decision
  • Fix pass5 reference in §3.4
  • HINTS-V3 stub or PR-D blocked
  • Producer field names aligned with AsyncProducerHint

Happy to iterate on the propose text if you want these folded in before PLAN-SCHEMA-V2.

Major items (must-fix):
- §3.6: re-index requirement + ONTOLOGY_VERSION 13→14 bump (PR-A) +
  GraphMeta counters (PR-C) + README/AGENT-GUIDE references
- §3.7: downstream API contract decisions for find_route_callers,
  trace_request_flow, impact-analysis expansion. External API stable,
  internal Cypher grows one hop. (Decisions 22, 23, 24.)
- §3.8: type-level describe rollups — DECLARES.DECLARES_PRODUCER added;
  composed DECLARES.HTTP_CALLS / DECLARES.ASYNC_CALLS deliberately not
  added. (Decision 25.)
- §3.9: MCP find(kind=producer) / resolve(hint_kind=producer) parity.
  (Decision 26.)
- §3.10: docs sweep per-PR with grep enumeration. (Decision 27.)
- §3.11: brownfield_sourced → brownfield_resolver_sourced with new
  BROWNFIELD_RESOLVER_STRATEGY_SET (union of FUZZY_STRATEGY_SET and
  annotation/primary-path strategies). Closes hints-v2 contradiction.
  Decision 9 updated, Appendix A flags renamed. (Decision 28.)
- §3.4 corrected: pass4 → pass5_imperative_edges.
- §3.2 Producer fields re-grounded in AsyncProducerHint + AsyncCallRow.
  target_topic + HTTP-specific fields removed; direction added.
- PLAN/CURSOR-PROMPTS as merge gate for PR-A. (Decision 29.)
- PR-D explicitly gated on HINTS-V3-PROPOSE.md existing. (Decision 30.)

Medium:
- UC9 row mentions both HTTP_CALLS and ASYNC_CALLS in pr_analysis query.
- TL;DR migration bullet expanded with the new PR scope.
- Appendix B Review-1 application section added.

Cardinals (still consistent):
- 4 PRs · 23 UCs · 11 edges · 32 decisions (was 21).
@HumanBean17
Owner Author

Review-1 applied — commit 95bb67a. Per-item response below.

Must-fix items

1. Re-index / ontology_version callout

Applied. New §3.6 (Re-index requirement) spells out the version bump, README + AGENT-GUIDE updates, full-rebuild requirement, and GraphMeta counter additions. Decision 31 locks the 13 → 14 bump in PR-A; Decision 32 locks the GraphMeta counters in PR-C.

2. brownfield_sourced vs FUZZY_STRATEGY_SET

Applied (with rename). You're right: the original Decision 9 was wrong. The flag's intent ("brownfield-resolver-emitted, carries a strategy attribute") doesn't match what FUZZY_STRATEGY_SET actually contains. Renamed brownfield_sourced → brownfield_resolver_sourced (§3.11, Decision 28), anchored to a new BROWNFIELD_RESOLVER_STRATEGY_SET constant in java_ontology.py that is the union of FUZZY_STRATEGY_SET and the annotation/primary-path strategies (codebase_client, layer_b_ann, client_target, client_target_path). Hints v3 reads both sets: the fuzzy set fires the fuzzy hint, and the brownfield-resolver set fires the "absence may mean unresolved" hint. Decision 9 updated, Appendix A flag references renamed.
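As a rough illustration of the two-set arrangement: the four annotation/primary-path strategy names are the ones listed above, but the FUZZY_STRATEGY_SET members are placeholders (the real contents are not listed in this thread), and `hints_for` plus the hint labels are hypothetical stand-ins for whatever hints v3 ends up doing.

```python
# Placeholder members: the real FUZZY_STRATEGY_SET contents are not shown in this thread.
FUZZY_STRATEGY_SET = frozenset({"fuzzy_example_a", "fuzzy_example_b"})

# Annotation / primary-path strategies named in this reply.
ANNOTATION_PRIMARY_STRATEGIES = frozenset(
    {"codebase_client", "layer_b_ann", "client_target", "client_target_path"}
)

# The new constant is the union of the two.
BROWNFIELD_RESOLVER_STRATEGY_SET = FUZZY_STRATEGY_SET | ANNOTATION_PRIMARY_STRATEGIES


def hints_for(strategy: str) -> list[str]:
    """Hypothetical hint selection: each set fires its own hint independently."""
    hints = []
    if strategy in FUZZY_STRATEGY_SET:
        hints.append("fuzzy")
    if strategy in BROWNFIELD_RESOLVER_STRATEGY_SET:
        hints.append("absence_may_mean_unresolved")
    return hints
```

Because the resolver set is a superset, a fuzzy strategy gets both hints while an annotation-sourced one such as codebase_client only gets the unresolved-absence hint, which is the distinction the old flag could not express.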

3. find_route_callers / trace_request_flow / impact-analysis

Applied. New §3.7 enumerates all three call sites at kuzu_queries.py:1335, :1463, :1508 with explicit contract decisions:

  • find_route_callers returns declaring Symbol id (not Client/Producer). CallerInfo.caller_symbol_id stays. Internal query becomes two-hop. (Decision 22.)
  • trace_request_flow external shape preserved; one extra internal hop. (Decision 23.)
  • Impact-analysis expansion goes three-hop. (Decision 24.)

Why I kept this at the contract level, not plan-level: deciding "external API stays stable, internal Cypher grows a hop" is a design call that affects every downstream consumer. The exact rewritten Cypher strings belong in PLAN-SCHEMA-V2.md.

4. Type-level describe rollups

Applied. New §3.8. PR-C adds DECLARES.DECLARES_PRODUCER + OVERRIDDEN_BY.DECLARES_PRODUCER to the rollup set (parallel to existing DECLARES.DECLARES_CLIENT). Pushback on composed HTTP/async rollups: I deliberately did not add DECLARES.HTTP_CALLS / DECLARES.ASYNC_CALLS because those would be three-hop composed rollups (DECLARES → DECLARES_CLIENT → HTTP_CALLS), which is expensive and the existing producer/client rollups already give agents the "this type has cross-service surface; navigate down" signal. Decision 25 captures this explicitly. If you disagree, flag — that's the right place to push back.

5. HINTS-V3-PROPOSE.md referenced but missing

Applied as a hard gate. Decision 30: PR-D is blocked until HINTS-V3-PROPOSE.md exists as a draft PR in the same review cycle. The "v2 graph without v3 hints is a UC3 footgun" framing is right — I chose not to inline a v3 stub here because it would muddy the propose, but I'll draft HINTS-V3-PROPOSE.md as the next deliverable (separate PR) before PR-A merges.

6. plans/PLAN-SCHEMA-V2.md not in this PR

Applied as a merge gate. Decision 29 locks PLAN-SCHEMA-V2 + CURSOR-PROMPTS-SCHEMA-V2 as merge prerequisites for PR-A. Both will be separate PRs.

7. Producer DDL vs AsyncProducerHint

Applied — substantial rewrite. You caught a real problem: the original §3.2 had target_topic (doesn't exist on AsyncProducerHint), path / path_template / path_regex / method (HTTP-specific, no async analog), and target_service (Client-specific). Rewrote §3.2 with a field-by-field grounding table sourced from AsyncProducerHint (graph_enrich.py:220 — only client_kind, topic, broker) + AsyncCallRow dispatch-site metadata (build_ast_graph.py:257). Added direction (preserved at node level so producer-side filters don't have to walk to the edge). Producer is not a 1:1 copy of Client — locked that explicitly.
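A sketch of the re-grounded Producer shape and a guard against the removed fields creeping back in. The four data fields (client_kind, topic, broker, direction) are the ones named above; the `id` field, the class name ProducerNode, and the `leaked_fields` helper are assumptions, and the AsyncCallRow dispatch-site fields are omitted because they are not enumerated in this thread.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProducerNode:
    """Hypothetical node record; not the actual DDL."""
    id: str           # assumed primary key
    client_kind: str  # from AsyncProducerHint
    topic: str        # from AsyncProducerHint
    broker: str       # from AsyncProducerHint
    direction: str    # kept at node level so filters need not walk to the edge


# Fields the original section 3.2 carried over from Client or invented.
REMOVED_FIELDS = frozenset(
    {"target_topic", "path", "path_template", "path_regex", "method", "target_service"}
)


def leaked_fields(node_cls: type) -> set[str]:
    """Names on the dataclass that should not exist on a Producer."""
    return {f.name for f in fields(node_cls)} & REMOVED_FIELDS
```

A check like `leaked_fields(ProducerNode) == set()` in the PR-C tests would keep the "not a 1:1 copy of Client" decision from eroding later.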

8. §3.4 pass label

Fixed. pass4 → pass5_imperative_edges (build_ast_graph.py:1545), with the exact materialization line cited (tables.client_rows.append at build_ast_graph.py:1632).

Medium-priority items

| Item | Status |
| --- | --- |
| find(kind="producer") / resolve(hint_kind="producer") | Applied: §3.9, Decision 26. PR-C scope. |
| Docs sweep (README, AGENT-GUIDE, exploration skill) | Applied: §3.10, Decision 27. PR-B handles HTTP, PR-C handles async, each via grep enumeration. |
| GraphMeta producers_total / declares_producer_total | Applied: §3.6 item 4, Decision 32. PR-C scope. |
| UC9 mentions both edges | Applied: UC9 row now reads HTTP_CALLS\|ASYNC_CALLS on both pre-v2 and post-v2 sides. |
| §7 count drift (16 → 20 → now 32) | Appendix B updated; Decisions count line added at end of §7. |

Framing answers — agree

  1. UC8/UC16 extra hop: agreed, acceptable for v2 if PR-D ships composite hints; materialized views remain out of scope (§5 row unchanged).
  2. cardinality: kept (informational, cheap, useful for future invariants and the generated doc).
  3. Producer 1:1 with Client: agreed — §3.2 now explicitly states Producer is not field-for-field with Client; grounded in real data sources.
  4. Bundle HINTS-V3?: agreed split, agreed not to schedule PR-D until v3 propose exists (Decision 30 locks this as a hard gate).

Cardinals after this revision

  • 4 PRs · 23 UCs · 11 edges · 32 decisions (was 21). All match across TL;DR, §6, §7.
  • Decisions 22–32 added.
  • 6 new §3 subsections (§3.6–§3.11), §3.12 is the renumbered hints-v3 preview.

Pre-merge checklist (re-checked)

  • Schema / re-index subsection — §3.6
  • brownfield_sourced semantics — §3.11, Decision 28, renamed
  • Downstream API table — §3.7
  • Type-level describe rollups decision — §3.8, Decision 25
  • Fix pass5 reference in §3.4
  • HINTS-V3 stub or PR-D blocked — PR-D blocked (Decision 30); v3 propose to be drafted next
  • Producer field names aligned with AsyncProducerHint — §3.2 rewritten

Ready for review-2 when you have cycles. If you want me to draft HINTS-V3-PROPOSE.md before the next review pass, say the word.

Breaking changes allowed (no active users) reconfirmed. Review-1's
'external API stable across v2' framing on §3.7 / Decisions 22–24 was
self-imposed compatibility ceremony: not principled, and actively
working against the v2 principle (edges connect the nodes whose data
the edge is about).

Applied:
- §3.7: APIs return the caller-side node (Client or Producer), not the
  declaring Symbol. CallerInfo is renamed and reshaped to RouteCaller
  with caller_node_id, caller_node_kind, declaring_symbol_id (back-ref),
  plus caller-side metadata pulled from the node.
- Decision 22 rewritten: RouteCaller replaces CallerInfo; old shape
  removed; no back-compat alias.
- Decision 23 rewritten: trace_request_flow output surfaces the
  Client/Producer hop directly. UC5 visibility is now first-class
  in API output, not hidden behind a join.
- Decision 24 rewritten: impact-analysis output gains caller-side
  node id alongside each route.
- PR-B test summary updated to reference the reshaped RouteCaller.
- Appendix B note added.

Decisions count unchanged (32). PR/UC/edge counts unchanged.
@HumanBean17
Owner Author

Follow-up — breaking-changes-allowed correction (commit 50c6cc5)

Reconfirmed: no active users, no real users in the wild. Review-1's §3.7 framing ("external API stable across v2") was self-imposed compatibility ceremony I added under a soft reading of "breaking changes allowed." That framing actively worked against the v2 principle ("edges connect the nodes whose data the edge is about"), because the data answering "who calls this route?" lives on the Client/Producer, not on the Symbol.

Reverted to the principled shape:

  • find_route_callers returns RouteCaller (new), not CallerInfo (removed). Fields: caller_node_id, caller_node_kind (client|producer), caller_microservice, declaring_symbol_id (back-reference), confidence, match, plus caller-side metadata from the node (target_service for Client, topic+broker for Producer).
  • trace_request_flow output surfaces the Client/Producer hop in each caller record — caller_node_id + caller_node_kind alongside declaring_symbol_id + declaring_symbol_fqn. UC5 visibility (which of a method's multiple clients made a given call) is now first-class in the output, not hidden behind a join.
  • Impact-analysis output gains the caller-side node id alongside each impacted route.
  • Decisions 22, 23, 24 rewritten accordingly. No back-compat aliases.
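For reference, the RouteCaller record could be sketched as a dataclass like the one below. The field names are the ones listed above; the Python types, the optional defaults, and the rule that Client-only and Producer-only metadata are mutually exclusive are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteCaller:
    caller_node_id: str
    caller_node_kind: str        # "client" or "producer"
    caller_microservice: str
    declaring_symbol_id: str     # back-reference to the declaring Symbol
    confidence: float            # type assumed
    match: str                   # type assumed
    target_service: Optional[str] = None  # Client-side metadata
    topic: Optional[str] = None           # Producer-side metadata
    broker: Optional[str] = None          # Producer-side metadata

    def __post_init__(self) -> None:
        # Assumed validation: metadata must match the caller node kind.
        if self.caller_node_kind not in ("client", "producer"):
            raise ValueError(f"unknown caller_node_kind: {self.caller_node_kind!r}")
        if self.caller_node_kind == "client" and (
            self.topic is not None or self.broker is not None
        ):
            raise ValueError("topic/broker belong to producer callers")
        if self.caller_node_kind == "producer" and self.target_service is not None:
            raise ValueError("target_service belongs to client callers")
```

One flat record with a kind discriminator keeps find_route_callers returning a single list for both channels; splitting into two record types would be an equally valid choice if the metadata diverges further.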

No change to PR count (4), UC count (23), edge count (11), decision count (32). The change is one principle correction propagated through three locked decisions and the test summary.

HumanBean17 merged commit 1e1167b into master May 16, 2026
1 check passed
HumanBean17 deleted the propose/schema-v2 branch May 23, 2026 16:21