From f232aeb42c51ba5ceb6e0db9c83e9e59eaf2a21d Mon Sep 17 00:00:00 2001
From: dmitry
Date: Thu, 21 May 2026 16:57:25 +0300
Subject: [PATCH 1/2] add propose for hints JSON ids shape (#195)

Document combo 1+2+6: JSON-shaped hint emissions, agent guide and server ids docs, and alignment with 40-char hex symbol ids.

Co-authored-by: Cursor
---
 propose/HINTS-MCP-JSON-IDS-PROPOSE.md | 186 ++++++++++++++++++++++++++
 1 file changed, 186 insertions(+)
 create mode 100644 propose/HINTS-MCP-JSON-IDS-PROPOSE.md

diff --git a/propose/HINTS-MCP-JSON-IDS-PROPOSE.md b/propose/HINTS-MCP-JSON-IDS-PROPOSE.md
new file mode 100644
index 0000000..3e10cb8
--- /dev/null
+++ b/propose/HINTS-MCP-JSON-IDS-PROPOSE.md
@@ -0,0 +1,186 @@
+# HINTS-MCP-JSON-IDS — copy-safe hint emissions and id-shape docs

## Status

Proposal — not yet implemented.

**Tracks:** [#195](https://github.com/HumanBean17/java-codebase-rag/issues/195) (battle-test: agents copy Python-style `neighbors([''],…)` from hints → ``Unknown id prefix for `['']` ``).

**Chosen fix combo (issue table):** **1** (hint templates) + **2** (agent guide + tool descriptions) + **6** (align docs with live graph id shape). **Explicitly not** runtime coercion (issue options 3–5, 9) or structured hints (7) in this effort.

**Amends (when implemented):** locked hint catalogs in `propose/completed/HINTS-ROAD-SIGNS-PROPOSE.md` Appendix A and downstream v2/v3/v4 appendices — emission strings only, not trigger logic.

**Blocks or lands with:** in-flight [`DESCRIBE-HINTS-STRUCTURAL-PROPOSE.md`](./DESCRIBE-HINTS-STRUCTURAL-PROPOSE.md) tier-1/2 templates (still drafted with `neighbors(['{id}'],…)`); implementation PR must not merge structural hints on the old shape.
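The failure tracked in #195 comes down to whether the `ids` argument survives a JSON parse. The sketch below is a minimal illustration under stated assumptions, not project code: `approx_pre_parse` is a rough stand-in for FastMCP's `pre_parse_json` (it decodes strings that are valid JSON lists or objects and leaves invalid JSON and JSON scalars untouched), `origin_ids` assumes a bare string id is wrapped into a one-element list, and `SAMPLE_ID`, `looks_like_symbol_id` and `json_ids_fragment` are hypothetical names. `SAMPLE_ID` only has the 40-char hex shape of a SHA1 digest; it is not claimed to match what `graph_enrich.symbol_id` would produce.

```python
import hashlib
import json
import re

# Hypothetical sample id: 40-char lowercase hex, same shape as a SHA1 digest.
SAMPLE_ID = hashlib.sha1(b"com.example.OrderController").hexdigest()

_HEX40 = re.compile(r"[0-9a-f]{40}")


def looks_like_symbol_id(value: str) -> bool:
    """True only for a bare 40-char hex id (no `sym:` or other prefix)."""
    return _HEX40.fullmatch(value) is not None


def approx_pre_parse(value):
    """Rough approximation of a JSON pre-parse step for a non-str field.

    Strings that decode to a JSON list or object are replaced by the decoded
    value; invalid JSON and JSON scalars are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return value  # e.g. "['abc']": single quotes are not valid JSON
    return parsed if isinstance(parsed, (list, dict)) else value


def origin_ids(ids_arg):
    """Hypothetical normalisation: a bare string becomes a one-element list."""
    parsed = approx_pre_parse(ids_arg)
    return parsed if isinstance(parsed, list) else [parsed]


def json_ids_fragment(node_id: str) -> str:
    """JSON-shaped `ids` value for a hint; json.dumps emits double-quoted strings."""
    return json.dumps([node_id])


if __name__ == "__main__":
    pseudo_python = f"['{SAMPLE_ID}']"  # the shape old hints teach agents to send
    for label, arg in [
        ("bare id", SAMPLE_ID),
        ("JSON list", [SAMPLE_ID]),
        ("JSON string", json_ids_fragment(SAMPLE_ID)),
        ("pseudo-Python", pseudo_python),
    ]:
        ids = origin_ids(arg)
        ok = all(looks_like_symbol_id(i) for i in ids)
        print(f"{label:14} -> valid={ok} ids={ids}")
```

Running it, the first three shapes each yield the bare hex id as the single origin, while the pseudo-Python shape falls through unchanged and becomes one bogus origin id of the form `['…']`, which is the string the server then rejects as an unknown id prefix. Emitting hints through `json.dumps` rather than hand-written quotes is one way to guarantee the double-quoted form.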
+

## Problem Statement

Road-sign hints in `mcp_hints.py` (and `EDGE_SCHEMA.typical_traversals` in `java_ontology.py`, which feeds the teaching strings returned when `neighbors` comes back empty) use **pseudo-Python** call syntax:

```text
routes via members: neighbors(['{id}'],'out',['DECLARES.EXPOSES'])
```

Agents treat these as literal MCP argument values. A common failure mode:

1. `resolve` / `describe` returns a **40-character hex** symbol id (no prefix).
2. `describe` hint embeds `neighbors([''],…)`.
3. Agent calls `neighbors` with `"ids": "['']"` (a single-quoted Python “list” sent as a string).
4. FastMCP `pre_parse_json` does **not** parse that string (invalid JSON) → one bogus origin id → `_resolve_node_kind` → `success=false`, ``Unknown id prefix for `['']` ``.

The same flow succeeds when the agent sends valid JSON: a bare id `"ids": ""`, a list `"ids": [""]`, or a JSON-encoded string `"ids": "[\"\"]"`, which FastMCP pre-parses into a list.

This is **hint-format / documentation drift**, not broken graph data or wrong ids from upstream tools.

Secondary failure mode (**#6**): README and `docs/AGENT-GUIDE.md` still show `sym:…FQN…` as the canonical `describe` / `neighbors` example id. **Stored** symbol ids are SHA1 hex from `graph_enrich.symbol_id` (no `sym:` prefix). `sym:` is recognized in `_node_kind_from_id` for kind detection only; passing a `sym:`-prefixed id to `describe(id=…)` matches no node, because no stored id carries that prefix. Agents that learn id shape from docs copy the wrong form.

## Proposed Solution

### 1 — Hint emission contract (baseline)

Replace pseudo-Python `neighbors(['{id}'], 'out', ['EDGE'])` across the **entire** hint surface with **JSON-shaped** next-step fragments that agents can paste directly into MCP tool calls.

**Canonical single-origin shape (locked):**

```text