From ef0e36cb4b4f39173d77b8056685442a9d637f9c Mon Sep 17 00:00:00 2001 From: Dmitry Date: Tue, 5 May 2026 21:09:08 +0000 Subject: [PATCH] propose: cross_service_resolution config flag (brownfield_only opt-in) Single config flag in .lancedb-mcp.yml, default 'auto' (no behaviour change). When set to 'brownfield_only', pass6_match_edges suppresses auto-extracted cross-service edges; only edges where both sides came from brownfield layers (@CodebaseRoute + @CodebaseClient or YAML overrides) remain cross_service. intra_service, phantom, ambiguous, and PR-E3 invariant guard all stay active. Goal: regain control before testing on real brownfield project. Auto path is preserved; flag lets user opt out of auto matching without deleting infrastructure. Reassess after real-project data. --- .../CROSS-SERVICE-RESOLUTION-FLAG-PROPOSE.md | 457 ++++++++++++++++++ 1 file changed, 457 insertions(+) create mode 100644 propose/CROSS-SERVICE-RESOLUTION-FLAG-PROPOSE.md diff --git a/propose/CROSS-SERVICE-RESOLUTION-FLAG-PROPOSE.md b/propose/CROSS-SERVICE-RESOLUTION-FLAG-PROPOSE.md new file mode 100644 index 0000000..bb11842 --- /dev/null +++ b/propose/CROSS-SERVICE-RESOLUTION-FLAG-PROPOSE.md @@ -0,0 +1,457 @@ +# Propose: `cross_service_resolution` config flag + +**Status:** Draft (2026-05-05) +**Author:** Dmitry (with Computer) +**Companion docs:** +- `propose/FEIGN-NOT-AN-EXPOSER-PROPOSE.md` (PR #25) — fixes a wrong-direction edge in the auto path; **must ship before this** +- `propose/SHARED-MODULE-MULTI-ATTRIBUTION-PROPOSE.md` (PR #24) — parked +- `propose/HTTP-CLIENT-ROLE-PROPOSE.md` — parked (not yet drafted) + +## TL;DR + +Add a single config flag `cross_service_resolution: brownfield_only | auto` +to `.lancedb-mcp.yml`, **default `auto`** (preserves today's behaviour). +When set to `brownfield_only`, `pass6_match_edges` skips +**cross-service** matching but keeps `intra_service`, `phantom`, +`unresolved`, and `ambiguous` labelling untouched. + +The motivation is **regaining control** before testing on a real +brownfield project: the existing brownfield system (Layers A/B/C +covering roles, routes, clients, async producers — verified complete +2026-05-06) already supports authoritative cross-service edges via +`@CodebaseRoute` on the consumer side + `@CodebaseClient` on the +caller side. The flag lets a user run **brownfield-first** for +maximum precision without deleting the auto path. If brownfield-only +proves too tedious on real code, flip the flag back to `auto`. + +The flag is **global** (single setting for HTTP_CALLS + ASYNC_CALLS). +Per-channel granularity is explicitly deferred. + +## Why this is a control issue (not a bug) + +Auto cross-service resolution is working as designed on greenfield +fixtures (`cross_service_smoke`, `bank-chat-system`). But: + +1. **PR #25 just discovered a wrong-direction EXPOSES bug** the auto + path was silently producing for ~3 weeks. The bug existed in code + that already had test coverage; the fix is small (~150 LOC) but + the discovery shows the auto path's correctness is not yet + battle-tested. +2. **No real-project shakedown has happened.** Every test run to + date has been against curated fixtures. Real brownfield + codebases will surface ambiguity and edge cases the matcher + wasn't designed for. +3. **Brownfield infrastructure is complete.** `@CodebaseRoute` + (consumer side) + `@CodebaseClient` with `target_service` + (caller side) already produce `cross_service` edges + automatically when both sides exist — the matcher in + `pass6_match_edges` doesn't filter by `route_source_layer`, + so layer-C routes match exactly like built-in routes. +4. **Mental model creep.** The auto path now spans PR-D3 (matcher), + PR-E2 (3-strategy ladder), PR-E3 (intra-JVM invariant), + `_lookup_method_candidates` (graph walk), and the `match` + labelling enum (5 outcomes). A user who only cares about + "annotate the legacy hot spots, get correct edges" doesn't need + any of it. + +The flag does not **delete** any of this. It lets a user **opt out** +when they want simplicity, and **opt in** when they want the +convenience. + +## Goals + +- **G1.** Single global flag in `.lancedb-mcp.yml`: + `cross_service_resolution: auto | brownfield_only`. Default `auto` + (no behaviour change unless user sets it). +- **G2.** When `brownfield_only`, `pass6_match_edges` skips + cross-service candidate scoring; edges that would have been + labelled `cross_service` from the auto path become `unresolved` + instead. Edges already produced via brownfield layers + (`route_source_layer in {layer_*}` AND + `resolution_strategy in {codebase_route, layer_*}`) remain + `cross_service`. +- **G3.** `intra_service` matching, `phantom` detection, + `ambiguous` labelling, and the PR-E3 invariant guard all remain + active in `brownfield_only` mode. They do not require microservice + topology guessing. +- **G4.** `KuzuGraph.meta()` reports the active mode so downstream + consumers can detect it (e.g., a query tool can warn "this graph + was built in `brownfield_only` mode; expect higher + `unresolved` rate"). +- **G5.** Backwards compatibility: graphs built before the flag + exists continue to load and report `cross_service_resolution` + meta as `null` (= "unknown / built before flag"). +- **G6.** Test coverage: at minimum 4 tests — auto-mode unchanged + (regression), brownfield_only suppresses cross-service auto + matches, brownfield_only preserves brownfield-sourced + cross-service edges, brownfield_only preserves intra_service. + +## Non-goals + +- **NG1.** Per-channel flags (HTTP vs async). Single flag covers + both. Split later if needed. +- **NG2.** Removing or refactoring the auto path. Deferred until + after real-project test data informs the decision. +- **NG3.** Auto-migration of existing graphs. New flag means + old graphs remain valid; the meta field is `null` for them. +- **NG4.** A CLI override (`--cross-service-resolution=auto`). + Config-file only for v1; CLI override is a 5-line follow-up if + needed. +- **NG5.** New brownfield annotations. The existing + `@CodebaseRoute` + `@CodebaseClient` cover the brownfield path + completely. + +## 1. Current state (verified 2026-05-06) + +### How auto cross-service matching works today + +`build_ast_graph.py:1620-1648` — the `_classify_match_for_call` +helper iterates `tables.routes_rows` and matches HTTP calls by +`(method, path_regex)` and async calls by `(broker, topic)`. +It **does not filter** by `route_source_layer`. So a layer-C +`@CodebaseRoute` lands in the same candidate pool as a built-in +`@GetMapping` route. + +`pass6_match_edges` (`build_ast_graph.py:1651-`) walks each +unresolved HTTP_CALL / ASYNC_CALL row, computes candidates, +and assigns `match` ∈ {`cross_service`, `intra_service`, +`ambiguous`, `phantom`, `unresolved`}. + +### Brownfield coverage today (verified) + +| Concern | Annotation | YAML | Coverage | +|---|---|---|---| +| Class role | `@CodebaseRole` | `role_overrides` | ✅ Layer A/B/C | +| Class capability | `@CodebaseCapability` | `capability_overrides` | ✅ Layer A/B/C | +| Route declaration (consumer side) | `@CodebaseRoute` / `@CodebaseRoutes` | `route_overrides` | ✅ Layer A/B/C | +| HTTP client target (caller side) | `@CodebaseClient` (target_service, path, method) | `http_client_overrides` | ✅ Layer A/B/C | +| Async producer target (caller side) | `@CodebaseAsyncProducer` (broker, topic) | `async_producer_overrides` | ✅ Layer A/B/C | + +`HttpClientHint` (`graph_enrich.py:190`) carries +`(client_kind, target_service, path, method)`. `AsyncProducerHint` +(`graph_enrich.py:198`) carries `(client_kind, topic, broker)`. +Both feed into `OutgoingCall` records that the cross-service +matcher consumes. + +**There is no missing brownfield primitive.** The flag is purely +about **whether to also run the auto matcher** when both sides +of a brownfield edge are not available. + +### Concrete example + +User writes in `svc-a`: + +```java +@FeignClient(name = "svc-b") +public interface OrdersClient { + @CodebaseClient(targetService = "svc-b", path = "/orders", method = "POST") + @PostMapping("/orders") + Order create(@RequestBody OrderRequest req); +} +``` + +User writes in `svc-b`: + +```java +@RestController +public class OrdersController { + @CodebaseRoute(microservice = "svc-b", path = "/orders", method = "POST") + @PostMapping("/orders") + public Order create(@RequestBody OrderRequest req) { ... } +} +``` + +In **both** modes (`auto` and `brownfield_only`), the +cross-service edge is produced — once via the brownfield layers, +once via the auto matcher (which agrees). In `auto` mode, the +auto matcher would have produced the same edge even without the +annotations. In `brownfield_only` mode, only annotated pairs +produce cross-service edges. + +## 2. Design + +### 2.1 Config file + +`.lancedb-mcp.yml`: + +```yaml +microservice_roots: + - svc-a + - svc-b + +cross_service_resolution: brownfield_only # auto | brownfield_only + # default: auto +``` + +`graph_enrich._load_config_microservice_roots` is the existing +config reader — extend it (or add a sibling `_load_config_flags`) +to read this single new key. Validate to one of two literal +strings; warn-and-fall-back-to-`auto` on unknown values. + +### 2.2 Threading the flag + +The flag needs to reach `pass6_match_edges`. Two options: + +- **Option (a) — pass via `GraphTables`**: add a single field + `cross_service_resolution: str = "auto"` on the tables container. + Set in `build_ast_graph.py` near `overrides = load_brownfield_overrides(...)` + (line 1274/1409/1927). `pass6_match_edges` reads + `tables.cross_service_resolution`. +- **Option (b) — pass as kwarg**: `pass6_match_edges(tables, *, verbose, cross_service_resolution)`. + Slightly cleaner separation but requires touching every caller. + +Recommend **Option (a)** — fewer signature changes, mirrors how +`overrides` already lives near pass-call sites. Single-line read +inside `pass6`. + +### 2.3 The gate in `pass6_match_edges` + +Inside `pass6_match_edges`, after computing candidates per row: + +```python +match, candidates = _classify_match_for_call(call, all_routes, caller_microservice) + +if ( + tables.cross_service_resolution == "brownfield_only" + and match == "cross_service" + and not _is_brownfield_sourced(call, candidates) +): + match = "unresolved" + candidates = [] +``` + +Where `_is_brownfield_sourced(call, candidates)` returns `True` +iff: + +- `call.resolution_strategy in {"layer_c_source", "layer_b_ann", "layer_b_fqn", "layer_a_meta"}` + (the call was emitted from a brownfield client layer), **AND** +- All candidates have `route_source_layer in + {"layer_c_source", "layer_b_ann", "layer_b_fqn", "layer_a_meta"}` + (the matched route came from a brownfield route layer). + +**Both sides must be brownfield-sourced** for the edge to count +as authoritative. A brownfield client matching an auto-extracted +route is rejected — same direction the user is pointing toward +("regain control"). In auto-mixed cases, the user can add the +counterpart annotation explicitly. + +`intra_service` is **never** suppressed regardless of +`route_source_layer` — same JVM is structural, not topological. +PR-E3's invariant guard already covers correctness here. + +### 2.4 Meta exposure + +`KuzuGraph.meta()` already returns a dict of build-time scalars. +Add one key: + +```python +{ + ... + "cross_service_resolution": "auto" | "brownfield_only" | None, + ... +} +``` + +Stored in the `GraphMeta` Kuzu node as a `STRING` column +`cross_service_resolution`. `None` for graphs built before this +flag exists (read fallback in `KuzuGraph.meta()`). + +### 2.5 Logging + +When `brownfield_only` is active, `pass6_match_edges` logs at +verbose-level: + +``` +[pass6] cross_service_resolution=brownfield_only: + N cross_service edges from brownfield layers, + M auto-cross-service candidates suppressed → unresolved +``` + +The user can use this to detect "I should annotate more places" +or "auto would have caught this; flip the flag". + +## 3. Risks and mitigations + +| # | Risk | Likelihood | Mitigation | +|---|---|---|---| +| 1 | A brownfield-sourced call matches an auto-sourced route, user expects edge | Medium | Both sides must be brownfield-sourced. Document this explicitly. If user wants cross-layer matching, they flip to `auto`. | +| 2 | `unresolved` rate spikes silently when user flips the flag | Medium | Verbose log line in §2.5. Also expose in `KuzuGraph.meta()` so query tools can warn. | +| 3 | Default `auto` masks the new flag's value | Low | Document the flag in README's brownfield section. Add a one-line note in `KuzuGraph.meta()` docstring. | +| 4 | Test fixtures (`cross_service_smoke`, `bank-chat-system`) don't exercise `brownfield_only` mode | High | Add at least 2 fixture-level tests in `brownfield_only` mode (see §4). | +| 5 | Old graphs lack the meta field | Low | `KuzuGraph.meta()` returns `None` when missing. Documented in §2.4. | +| 6 | User confuses "unresolved" with "broken" | Medium | The verbose log distinguishes "suppressed by mode" from genuinely unresolved. Optionally add a new `match` value `suppressed_by_mode` (alternative to `unresolved`); deferred to [TBD-3]. | +| 7 | Flag interacts unexpectedly with PR #25's Feign-not-an-exposer fix | Low | #25 fix runs in `pass4_routes` (route emission), not `pass6` (matching). They're orthogonal. Test by building both fixes together. | + +## 4. Verification plan + +### Tests + +| # | Test | Asserts | +|---|---|---| +| 1 | `test_cross_service_resolution_auto_default` | Without `cross_service_resolution` set, behaviour matches today's `cross_service_smoke` baseline (1 cross-service HTTP_CALL + 1 cross-service ASYNC_CALL) | +| 2 | `test_brownfield_only_suppresses_auto_cross_service` | `cross_service_resolution: brownfield_only` on `cross_service_smoke` (no `@CodebaseRoute` / `@CodebaseClient` annotations) → 0 `cross_service` edges; the would-be edges become `unresolved` | +| 3 | `test_brownfield_only_keeps_annotated_cross_service` | Add `@CodebaseRoute` to `JoinControllerB` and `@CodebaseClient` to `BFeignClient` in fixture → `brownfield_only` mode produces 1 `cross_service` edge | +| 4 | `test_brownfield_only_preserves_intra_service` | Build a fixture with intra-JVM CALLS in `brownfield_only` mode → all `intra_service` edges still present (PR-E3 invariant holds) | +| 5 | `test_meta_reports_cross_service_resolution` | Built-with-flag graph → `KuzuGraph.meta()["cross_service_resolution"]` returns the literal value | +| 6 | `test_meta_resolution_null_for_old_graphs` | Synthesize a graph without the `GraphMeta.cross_service_resolution` column → `KuzuGraph.meta()["cross_service_resolution"]` returns `None` | + +### Manual evidence + +```bash +cd /home/user/workspace/user-rag + +# Auto mode (default, unchanged) +rm -rf /tmp/check_auto && \ + python build_ast_graph.py --source-root tests/fixtures/cross_service_smoke \ + --kuzu-path /tmp/check_auto --verbose 2>&1 | grep -E 'cross_service|brownfield_only' + +# Expected: +# [pass6] http_calls cross_service=1, intra_service=0, ambiguous=0, phantom=0, unresolved=0 +# [pass6] async_calls cross_service=1, ... + +# Brownfield-only mode (annotate fixture or run unannotated for negative case) +echo "cross_service_resolution: brownfield_only" > tests/fixtures/cross_service_smoke/.lancedb-mcp.yml +rm -rf /tmp/check_bo && \ + python build_ast_graph.py --source-root tests/fixtures/cross_service_smoke \ + --kuzu-path /tmp/check_bo --verbose 2>&1 | grep -E 'cross_service|brownfield_only' + +# Expected (without annotations): +# [pass6] cross_service_resolution=brownfield_only: 0 brownfield, 2 auto-cross-service suppressed +``` + +### Determinism + +Two builds in the same mode produce identical sorted `(caller, route, match)` +triples. (Existing determinism property; the flag does not introduce +non-determinism.) + +## 5. Suggested PR scope + +Single PR, ~150 LOC: + +1. `graph_enrich.py`: add `_load_config_cross_service_resolution(project_root_str) -> str` + reading `.lancedb-mcp.yml`. Cached per root, mirroring + `_load_config_microservice_roots`. +2. `build_ast_graph.py`: add `cross_service_resolution: str = "auto"` + field on `GraphTables`. Set near `overrides = load_brownfield_overrides(...)`. +3. `build_ast_graph.py`: add `_is_brownfield_sourced(call, candidates)` + helper. Add the gate inside `pass6_match_edges` after each + candidate-match call site (HTTP and async loops both — ~6 LOC each). +4. `build_ast_graph.py`: extend `GraphMeta` schema with + `cross_service_resolution STRING` (nullable). Bump + `ONTOLOGY_VERSION` 7→8. +5. `kuzu_queries.py`: extend `KuzuGraph.meta()` to read the new + column with `None` fallback for old graphs. +6. `tests/`: 6 tests above. +7. Verbose log line per §2.5. +8. README brownfield section: document the flag. + +**No** new MCP tools, no new annotation types, no new +client/route extraction logic. + +## 6. Backwards compatibility + +- Old `.lancedb-mcp.yml` files without the flag → behaviour unchanged + (default `auto`). +- Old graphs built before the flag → `KuzuGraph.meta()` returns + `None` for the field; queries that don't read it are unaffected. +- Brownfield annotations themselves (`@CodebaseRoute`, `@CodebaseClient`) + unchanged. They still produce edges in both modes. +- No data migration required. Rebuild to apply the flag. +- Ontology bump 7→8 documented in the PR description. + +## 7. Why not the alternatives + +### Why not delete the auto path entirely (the "rollback" option)? + +- Loses ~3 weeks of working infrastructure (PR-D3, PR-E2, PR-E3). +- Greenfield codebases would require annotation burden they don't + need today. +- The decision should be data-driven (after real-project test), + not preemptive. +- Reversible: if `brownfield_only` is what the user always uses, + a v2 cleanup PR can delete the auto path. The flag preserves + optionality. + +### Why not per-channel flags (HTTP vs async)? + +- Async path is structurally cleaner (verified in PR #25 §8 audit: + no symmetric Feign-style bug). Per-channel is overengineered for + v1. +- Trivial to split later (one literal becomes two). +- Single flag is one config knob; two flags is a matrix. + +### Why not a CLI override? + +- 95% of users will set this once per project, in `.lancedb-mcp.yml`. +- A CLI override is a 5-line follow-up if needed (read `argv` first, + fall back to YAML). +- Keeping config in one place avoids "which mode am I in?" confusion. + +### Why both sides must be brownfield-sourced? + +- A brownfield client matching an auto-extracted route would still + rely on the auto path (path-regex matching, microservice topology + inference). The whole point of `brownfield_only` is to **not** + trust the auto path. +- Forces the user to explicitly annotate both sides if they want + the edge — matches the "regain control" intent. +- If this proves too strict in practice, [TBD-2] flags it for v2 + relaxation. + +### Why not a new `match` value `suppressed_by_mode`? + +- Adds enum surface that downstream queries must handle. +- Same effect achievable via `match=unresolved` + verbose log + (the user knows the mode from `meta()`). +- Deferred as [TBD-3] — easy to add later if multiple consumers + ask for it. + +## 8. Future work (out of scope) + +- **CLI override** for the flag — 5-line follow-up. +- **Per-channel granularity** — `cross_service_http_resolution` + + `cross_service_async_resolution` if real use surfaces the need. +- **Mixed mode** — allow brownfield client + auto route (or vice + versa). Currently rejected; revisit if real-project testing + shows this is the common case. +- **`suppressed_by_mode` match value** — distinguish mode-suppressed + edges from genuinely unresolved ones in the graph itself. +- **Auto-suggest annotations** — when `brownfield_only` suppresses + an edge that the auto path would have caught, log a hint: + "would have matched svc-b's `POST /orders`; consider + `@CodebaseClient(targetService='svc-b', path='/orders', method='POST')`". + +## 9. [TBD] + +| # | Decision | Notes | +|---|----------|-------| +| 1 | YAML key naming: `cross_service_resolution` vs `cross_service_matching` vs `cross_service_mode` | Recommend `cross_service_resolution` — matches existing terminology in `_classify_match_for_call` and the matcher module. | +| 2 | Should brownfield client + auto route still produce `cross_service` edge? | Recommend NO for v1 (matches "regain control" intent). Revisit after real-project test if user finds this too strict. | +| 3 | Should suppressed edges use `unresolved` or a new `suppressed_by_mode` match value? | Recommend `unresolved` for v1 (smaller enum surface). Add new value if downstream queries need to distinguish. | +| 4 | Verbose log threshold: log every suppressed edge or just the count? | Recommend count + first 5 examples. Avoids log spam on large graphs. | +| 5 | Should `KuzuGraph.meta()` return `None` or `"auto"` (the default) for old graphs? | Recommend `None` to distinguish "explicitly auto" from "didn't have the flag". | +| 6 | Should we also add a `microservices_topology_known: bool` meta? | Out of scope. Today's microservice list comes from `microservice_roots` in YAML, which is separate from this flag. | + +## 10. References + +- `build_ast_graph.py:1620-1648` — `_classify_match_for_call`, + the function that does cross-service candidate matching + (does NOT filter by `route_source_layer`) +- `build_ast_graph.py:1651-` — `pass6_match_edges`, the gate + point for the new flag +- `build_ast_graph.py:1274,1409,1927` — three call sites that + load `BrownfieldOverrides`; sibling location for the new + config flag read +- `graph_enrich.py:91` — `CONFIG_FILENAMES` +- `graph_enrich.py:115-166` — `_load_config_microservice_roots`, + the pattern to mirror for `_load_config_cross_service_resolution` +- `graph_enrich.py:190-203` — `HttpClientHint` and `AsyncProducerHint` + carrying brownfield client target info +- `graph_enrich.py:838,919,952,981,995,1033,1267,1358` — the + `route_source_layer` and `resolution_strategy` values that + `_is_brownfield_sourced` must check against +- `propose/FEIGN-NOT-AN-EXPOSER-PROPOSE.md` — sibling fix that + must ship first; orthogonal (different pass) +- `plans/PLAN-POST-TIER1B-FOLLOWUPS.md` § PR-E3 — the intra-JVM + invariant guard, which remains active in `brownfield_only` mode