[codex] Fix graph XML parsing for reordered attributes#635
Conversation
|
@Neoplayer is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughUpdated ChangesXML Parsing Refactor
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/graph.test.ts (1)
152-159: ⚡ Quick winAssert relationship endpoints, not just edge type/count.
Current checks can still pass if source/target are miswired. Add endpoint and weight assertions to lock in attribute-map correctness.
Proposed test hardening
const nodes = await kv.list<GraphNode>("mem:graph:nodes"); expect(nodes.some((n) => n.name === "src/index.ts")).toBe(true); expect(nodes.some((n) => n.name === "main")).toBe(true); + const fileNode = nodes.find((n) => n.name === "src/index.ts"); + const mainNode = nodes.find((n) => n.name === "main"); + expect(fileNode).toBeDefined(); + expect(mainNode).toBeDefined(); const edges = await kv.list<GraphEdge>("mem:graph:edges"); expect(edges).toHaveLength(1); expect(edges[0].type).toBe("uses"); + expect(edges[0].sourceNodeId).toBe(fileNode!.id); + expect(edges[0].targetNodeId).toBe(mainNode!.id); + expect(edges[0].weight).toBe(0.9); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/graph.test.ts` around lines 152 - 159, The test currently only asserts edge count and type so add assertions that verify the edge's endpoints and weight to prevent miswiring: after fetching edges (edges variable of type GraphEdge) assert edges[0].source and edges[0].target match the expected node identifiers (or node names found in nodes list, e.g., the GraphNode entries with name "src/index.ts" and "main"), and assert edges[0].weight equals the expected numeric value; update the test block in test/graph.test.ts that references nodes and edges to include these endpoint and weight checks so the relationship attribute map is fully validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/graph.test.ts`:
- Around line 152-159: The test currently only asserts edge count and type so
add assertions that verify the edge's endpoints and weight to prevent miswiring:
after fetching edges (edges variable of type GraphEdge) assert edges[0].source
and edges[0].target match the expected node identifiers (or node names found in
nodes list, e.g., the GraphNode entries with name "src/index.ts" and "main"),
and assert edges[0].weight equals the expected numeric value; update the test
block in test/graph.test.ts that references nodes and edges to include these
endpoint and weight checks so the relationship attribute map is fully validated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9f515adb-3150-438e-a083-2b376efe5b66
📒 Files selected for processing (2)
src/functions/graph.tstest/graph.test.ts
…oses #635) (#685) Previous parser hard-coded attribute order: type-before-name on <entity>, type/source/target/weight on <relationship>. Codex CLI's LLM frequently emits attrs in a different order (name first, source/target before type), and any deviation caused the whole tag to silently drop — graph extraction returned 0 nodes for an entire run. Replaced both regexes with an order-independent two-pass approach: - Tiny parseAttrs() reads any key="value" pair in any order - entitySelfClose + entityWithBody patterns separate the two tag shapes (preserving the #494 fix where greedy attr-matching swallowed trailing / and merged adjacent entities) - Relationship matcher now reads source/target/type/weight from parsed attrs, falls back to weight=0.5 on missing or non-finite, requires type+source+target present to emit the edge Regression test: <entity name=... type=...> + <relationship source=... target=... type=...> shape that previously dropped to zero nodes/edges. 7/7 graph tests pass.
Summary
This PR makes
mem::graph-extracttolerate XML attributes in any order for both<entity>and<relationship>tags.Problem
The graph extraction prompt asks an LLM to return XML like this:
The current parser accepts self-closing entity tags, but it still assumes a fixed attribute order:
type="..."followed byname="..."type="...", thensource="...", thentarget="...", thenweight="..."Real model output does not always preserve that order. For example, Qwen/OpenRouter can emit valid XML like:
That output is semantically valid, but the parser silently drops it. The API call can return
success: truewithnodesAdded: 0andedgesAdded: 0, leaving the dashboard in the "No graph data yet" state even whenGRAPH_EXTRACTION_ENABLED=trueand the LLM provider is working.Fix
The parser now extracts XML attributes into a small key/value map before reading
type,name,source,target, andweight. This keeps support for:It also defaults missing relationship
weightto the existing fallback of0.5, while continuing to skip malformed tags that lack required fields.Validation
npm test -- test/graph.test.ts.npm run build.Both checks pass locally.
Summary by CodeRabbit