Skip to content

feat(schema, mcp, hints): phantom-receiver/chained sites move to UnresolvedCallSite; include_unresolved; CALLS dedup#185

Merged
HumanBean17 merged 3 commits into
masterfrom
feat/calls-noise-unresolved-facet
May 20, 2026
Merged

feat(schema, mcp, hints): phantom-receiver/chained sites move to UnresolvedCallSite; include_unresolved; CALLS dedup#185
HumanBean17 merged 3 commits into
masterfrom
feat/calls-noise-unresolved-facet

Conversation

@HumanBean17
Copy link
Copy Markdown
Owner

Summary

  • Receiver-failure call sites (chained_receiver, unresolved-receiver phantom) move from phantom-dst CALLS rows to UnresolvedCallSite + UNRESOLVED_AT. Known-receiver-external JDK rows stay on CALLS with resolved=false.
  • neighbors gains include_unresolved (source-ordered interleave with row_kind), dedup_calls (collapse identical callees), and mutual exclusivity with edge_filter.
  • New hints (TPL_NEIGHBORS_CALLS_HIGH_FANOUT, TPL_NEIGHBORS_CALLS_HAS_UNRESOLVED), describe unresolved rollup (cap 5), and java-codebase-rag unresolved-calls list|stats CLI.
  • Ontology stays 15; re-index required for the new graph tables and CALLS shape change.

Implements PR-3 from plans/PLAN-CALLS-NOISE.md. Propose moved to propose/completed/CALLS-NOISE-AND-RESOLUTION-PROPOSE.md.

Test plan

  • .venv/bin/ruff check .
  • .venv/bin/python -m pytest tests -v (668 passed; use .venv/bin on PATH for CLI tests)
  • PR-3 named tests: pass3 UCS emission, no phantom/chained CALLS, interleave/dedup/mutex, hints H1–H8, find_callers
  • Post-build invariant: zero CALLS with strategy in ('phantom','chained_receiver') on fresh bank index
  • Manual: java-codebase-rag unresolved-calls stats returns reason buckets (205 sites on bank fixture)

Made with Cursor

…llSite facet

Receiver-failure chained/phantom sites emit UnresolvedCallSite + UNRESOLVED_AT
instead of phantom-dst CALLS. neighbors gains include_unresolved interleave,
dedup_calls, row_kind, and new hints; describe and unresolved-calls CLI wire up.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Owner Author

@HumanBean17 HumanBean17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical review (PR-3 / CALLS-NOISE)

Verdict: Direction is right and the core invariant holds, but I would not merge as-is without fixing the sym: ID footgun and adding the plan-mandated describe/CLI tests.


What works well

  • Graph split matches the propose: receiver-failure (chained_receiver, unresolved-receiver phantom) → UnresolvedCallSite + UNRESOLVED_AT; known-external JDK rows stay on CALLS with preserved receiver-tier strategy. Pass3 + smoke tests lock this in.
  • MCP behavior is coherent: include_unresolvededge_filter mutual exclusivity is fail-loud; interleave sort (line, byte, resolved-before-unresolved) matches the plan; dedup_calls only collapses resolved rows.
  • Ontology cleanup is careful: phantom / chained_receiver leave FUZZY_STRATEGY_SET but stay in BROWNFIELD_RESOLVER_STRATEGY_SET for HTTP/ASYNC; schema consistency correctly excludes UNRESOLVED_AT from EDGE_SCHEMA parity.

Blockers (fix before merge)

1. UnresolvedCallSite IDs look like Symbols — describe will mislead agents

UCS ids are {caller_id}:{line}:{byte} where caller_id is sym:…, so UCS ids start with sym:. In interleaved neighbors output they are exposed as NodeRef(id=ucs_id, kind="symbol", …).

_node_kind_from_id treats any sym: prefix as a Symbol. The completed propose says UCS is intentionally not a Symbol node. An agent that follows edge.other.id into describe gets a bogus symbol (usually “no node found”) — wrong affordance on a CALLS-shaped edge.

Suggested fix: dedicated id prefix (ucs: / unresolved:) + new NodeRef.kind, or keep other.id as the caller and put the site id only in attrs (you already have unresolved_call_site_id).

2. Plan-mandated tests are missing

Planned coverage Present?
describe rollup: cap 5 + footer + total Nounresolved_call_sites_* never asserted in tests
java-codebase-rag unresolved-calls list|stats No CLI test (manual evidence only)
VALID_UNRESOLVED_CALL_REASONS for CLI --reason No — constant defined but unused

These guard the operator/debug path (HV8/HV9) that replaces grepping CALLS for phantom/chained_receiver.


Important non-blockers

  • GraphMeta pass3 percentages: phantom_chained / phantom_other are UCS counts but still divided by stats.total (CALLS rows only) — verbose log line can mislead. Use a denominator that includes UCS or drop percentages.
  • README ontology section still says “Unresolved targets become phantom nodes” and the edge-types table omits UNRESOLVED_AT / UnresolvedCallSite — contradicts PR-3 after the re-index callout was updated.
  • High-fanout hint uses len(results); with default limit=25, a 57-CALLS method shows “25 CALLS on this method…”. Tests use limit=500. Consider passing true fan-out or documenting page-local counts.
  • docs/EDGE-NAVIGATION.md untouched — worth noting UCS is not in EDGE_SCHEMA.

Minor / follow-up

  • Duplicate test_find_callers_no_phantom_chained_strategy in test_kuzu_queries.py and test_mcp_v2.py.
  • plans/PLAN-CALLS-NOISE.md still “active” while propose is in completed/.
  • unresolved_count hint only when len(origins)==1.

Recommendation

Minimal merge bar: (1) id/kind fix + (2) one describe rollup test + one CLI smoke test. Everything else can be a fast follow-up.

…e CALLS count

UnresolvedCallSite ids use ucs: prefix with NodeRef.kind unresolved_call_site so
describe is not mis-routed. Add rollup and CLI smoke tests; wire --reason choices;
fix pass3 UCS percentage denominator and README graph section.

Co-authored-by: Cursor <cursoragent@cursor.com>
@HumanBean17
Copy link
Copy Markdown
Owner Author

Thanks for the thorough review — agreed on both blockers and most of the non-blockers. Pushed d0d6ae4.

Blockers (fixed)

1. UCS ids looked like Symbols — Agreed this was a real footgun. UnresolvedCallSite ids are now ucs:{caller_id}:{line}:{byte}; interleaved neighbors rows use other.kind=unresolved_call_site (not symbol). Agents should use attrs (unresolved_call_site_id, reason, …) or describe(caller_id) / CLI — not describe(ucs:…).

2. Missing tests — Added:

  • test_describe_unresolved_call_sites_rollup_cap_footer_and_total (cap ≤5, footer, total on pinned ClientMessageProcessor#process)
  • test_cli_unresolved_calls_list_and_stats (stats + list --reason chained_receiver, rejects invalid --reason)
  • VALID_UNRESOLVED_CALL_REASONS wired to CLI --reason choices

Non-blockers (fixed)

  • Pass3 verbose percentages: UCS counts now use denominator CALLS + UCS for chained/phantom-receiver lines; callee-unresolved still / CALLS only.
  • README §6: Ontology 15, UnresolvedCallSite / UNRESOLVED_AT, corrected phantom wording.
  • High-fanout hint: Uses calls_row_count (true outbound CALLS) — test now uses limit=25 to prove page size ≠ fan-out.
  • docs/AGENT-GUIDE.md: Notes ucs: prefix, non-describable endpoint, UNRESOLVED_AT not in EDGE_SCHEMA.

Also: removed duplicate test_find_callers_no_phantom_chained_strategy from test_mcp_v2.py (kept in test_kuzu_queries.py); _run_cli prefers .venv/bin/java-codebase-rag so subprocess CLI tests hit the editable install.

Deferred (agree, follow-up)

  • docs/EDGE-NAVIGATION.md: Generated from EDGE_SCHEMA — left unchanged; AGENT-GUIDE carries the UCS note until we add a generator hook or manual appendix.
  • plans/PLAN-CALLS-NOISE.md still active: Will move to plans/completed/ when the full CALLS-NOISE effort lands (PR-4+ if any).
  • unresolved_count hint only for single origin: Intentional for now — multi-origin CALLS concat is already called out as awkward in AGENT-GUIDE.

667 passed on full pytest tests locally (3 unrelated CLI quiet/stdout tests fail when a broken system java-codebase-rag shadows .venv on PATH — pre-existing on this machine; PR-3 suite green).

Copy link
Copy Markdown
Owner Author

@HumanBean17 HumanBean17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical review (PR-3 / CALLS-NOISE)

Verdict: approve with minor follow-ups. Scope matches plans/PLAN-CALLS-NOISE.md PR-3, CI is green, and the breaking change is documented with solid test coverage. The second commit (ucs: ids, fan-out hint denominator, CLI smoke) addresses real review issues.


What works well

  • Graph split — Receiver-failure (chained_receiver, unresolved-receiver phantom) off phantom-dst CALLS into UnresolvedCallSite + UNRESOLVED_AT; known-external JDK rows stay on CALLS with resolved=false. Matches the propose framing.
  • ucs: ids — Avoids mis-parsing embedded sym: prefixes from caller_id:line:byte.
  • MCP contract — Interleave order (line, byte, resolved before unresolved), include_unresolvededge_filter, dedup_calls preserving unresolved rows; fan-out hints use calls_row_count (true graph count), not page size.
  • OntologyFUZZY_STRATEGY_SET trimmed for CALLS; phantom/chained remain in BROWNFIELD_RESOLVER_STRATEGY_SET for HTTP/ASYNC. UNRESOLVED_AT excluded from EDGE_SCHEMA in test_schema_consistency.
  • Tests — Plan table scenarios covered; post-build invariant (zero phantom/chained_receiver on CALLS) is strong.

Suggested follow-ups (non-blocking)

1. describe(id=ucs:…) — fail-safe but not fail-loud (medium)

_resolve_node_kind returns unresolved_call_site for ucs: prefixes, but _load_node_record only handles symbol/route/client/producer; any other kind falls through to the Producer branch, so describe(ucs:…) becomes MATCH (n:Producer) … → generic "No node found".

That avoids mis-routing to Symbol, but agents won’t get the teaching message AGENT-GUIDE promises (“not describable; use caller describe / unresolved-calls”). Consider an early guard in describe_v2 and test_describe_ucs_id_not_describable.

2. README vs AGENT-GUIDE (low)

README graph section says receiver-failure sites are reachable via describe — easy to read as “describe the UCS node”. AGENT-GUIDE is clearer. Suggest: describe(method_id).unresolved_call_sites only.

3. docs/EDGE-NAVIGATION.md (low)

No mention of UnresolvedCallSite or that CALLS no longer carries receiver-failure strategies. Worth a short add for agents using that doc as the edge catalog.

4. Housekeeping when closing the effort

  • Propose in propose/completed/ still says Status: under review.
  • plans/PLAN-CALLS-NOISE.md still references root propose path / unchecked landing todos — fine until merge, then one cleanup commit.

5. Multi-origin neighbors hints (low)

unresolved_count / calls_row_count for hints only when len(origins) == 1. Multi-origin CALLS won’t get high-fanout / has-unresolved nudges — OK if intentional; one AGENT-GUIDE line if not.


Test gaps (optional hardening)

Gap Note
No describe_v2("ucs:…") test Commit message mentions describe routing; only method rollup is tested
test_hints_neighbors_calls_has_unresolved Checks template prefix only, not {k} vs unresolved_count
dedup_calls + include_unresolved together Plan defines ordering; combined path untested

Operator reminder

Merge requires full re-index at ontology 15. Default neighbors(..., ['CALLS']) row counts will drop on hot methods (expected). edge_filter.exclude_strategies: ['phantom','chained_receiver'] on CALLS becomes a no-op on fresh graphs (harmless).

…ix, tests

describe rejects ucs: ids with a teaching message; generator documents
UnresolvedCallSite storage; tighten has-unresolved hint and combined dedup+interleave tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
@HumanBean17
Copy link
Copy Markdown
Owner Author

Round 2 addressed in dd1dc4e — thanks for the approve-with-follow-ups framing.

Done

Follow-up Change
1. describe(ucs:…) fail-loud Early guard + post-resolve_node_kind guard; returns teaching message (caller describe, include_unresolved, CLI). test_describe_ucs_id_not_describable added.
2. README clarity Graph section now says describe(method_id).unresolved_call_sites (not vague “use describe”).
3. EDGE-NAVIGATION.md Generator appendix “Graph storage” for UnresolvedCallSite / UNRESOLVED_AT (regenerated; CI --check passes).
4. Housekeeping Propose status → landed (PR-3). Plan file left for post-merge cleanup as suggested.
5. Multi-origin hints One AGENT-GUIDE line: high-fanout / has-unresolved hints only for single origin.
Optional tests test_hints_neighbors_calls_has_unresolved asserts full {k}; test_neighbors_dedup_calls_include_unresolved covers combined flags + ordering.

Still deferred

  • plans/PLAN-CALLS-NOISE.mdcompleted/ — will move when PR merges / effort closes (unchecked landing todos still accurate pre-merge).

Ready for merge after re-index reminder stands (ontology 15).

@HumanBean17 HumanBean17 merged commit 9af2927 into master May 20, 2026
1 check passed
@HumanBean17 HumanBean17 deleted the feat/calls-noise-unresolved-facet branch May 23, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant