Skip to content

[codex] Fix graph XML parsing for reordered attributes#635

Closed
Neoplayer wants to merge 1 commit into
rohitg00:mainfrom
Neoplayer:codex/fix-graph-xml-attribute-order
Closed

[codex] Fix graph XML parsing for reordered attributes#635
Neoplayer wants to merge 1 commit into
rohitg00:mainfrom
Neoplayer:codex/fix-graph-xml-attribute-order

Conversation

@Neoplayer
Copy link
Copy Markdown

@Neoplayer Neoplayer commented May 24, 2026

Summary

This PR makes mem::graph-extract tolerate XML attributes in any order for both <entity> and <relationship> tags.

Problem

The graph extraction prompt asks an LLM to return XML like this:

<entity type="file" name="src/index.ts"/>
<relationship type="uses" source="src/index.ts" target="main" weight="0.9"/>

The current parser accepts self-closing entity tags, but it still assumes a fixed attribute order:

  • entities must be type="..." followed by name="..."
  • relationships must be type="...", then source="...", then target="...", then weight="..."

Real model output does not always preserve that order. For example, Qwen/OpenRouter can emit valid XML like:

<entity name="src/index.ts" type="file"/>
<relationship target="main" weight="0.9" source="src/index.ts" type="uses"/>

That output is semantically valid, but the parser silently drops it. The API call can return success: true with nodesAdded: 0 and edgesAdded: 0, leaving the dashboard in the "No graph data yet" state even when GRAPH_EXTRACTION_ENABLED=true and the LLM provider is working.

Fix

The parser now extracts XML attributes into a small key/value map before reading type, name, source, target, and weight. This keeps support for:

  • explicit entity tags with properties
  • self-closing entity tags
  • relationship tags
  • arbitrary attribute ordering

It also defaults missing relationship weight to the existing fallback of 0.5, while continuing to skip malformed tags that lack required fields.

Validation

  • Added a regression test covering reordered attributes on both entity and relationship tags.
  • Ran npm test -- test/graph.test.ts.
  • Ran npm run build.

Both checks pass locally.

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced XML parsing to correctly handle entity and relationship attributes regardless of their order in the document.
    • Improved graph extraction robustness by supporting multiple XML tag formats.
  • Tests
    • Added test coverage to verify XML parsing handles attributes in any order.

Review Change Stack

@vercel
Copy link
Copy Markdown

vercel Bot commented May 24, 2026

@Neoplayer is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@Neoplayer Neoplayer marked this pull request as ready for review May 24, 2026 14:47
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Walkthrough

Updated parseGraphXml to use attribute-parsing helper functions and targeted regexes that separately handle entity and relationship tags, eliminating prior greedy matching and improving attribute order flexibility. Added test coverage for reordered-attribute XML.

Changes

XML Parsing Refactor

Layer / File(s) Summary
Attribute parsing helpers and node addition
src/functions/graph.ts
Introduced parseAttributes and addNode helper functions to extract key/value pairs from attribute strings, extract type/name from entity tags, parse nested properties, and skip entities missing required attributes.
Entity and relationship parsing refactor
src/functions/graph.ts
Rewrote entity parsing to separately match full <entity>...</entity> blocks and self-closing <entity .../> tags, replacing greedy patterns. Refactored relationship parsing to extract attributes from self-closing <relationship .../> tags and derive type, source, target, and clamped weight.
Attribute order robustness test
test/graph.test.ts
Added test verifying graph-extract parses XML where entity and relationship attributes appear in any order, asserting expected node/edge counts and edge type.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • rohitg00/agentmemory#494: Previous work on parseGraphXml to handle self-closing entity tags during node extraction, now further refined with comprehensive attribute parsing helpers.

Poem

A rabbit hops through XML trees,
Parsing tags in any order with ease—
No greedy bounds to cause a mess,
Attributes dance, order-less,
Graph extraction: a fuzzy success! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly summarizes the main change: fixing graph XML parsing to handle reordered attributes, which is exactly what the changeset accomplishes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/graph.test.ts (1)

152-159: ⚡ Quick win

Assert relationship endpoints, not just edge type/count.

Current checks can still pass if source/target are miswired. Add endpoint and weight assertions to lock in attribute-map correctness.

Proposed test hardening
     const nodes = await kv.list<GraphNode>("mem:graph:nodes");
     expect(nodes.some((n) => n.name === "src/index.ts")).toBe(true);
     expect(nodes.some((n) => n.name === "main")).toBe(true);
+    const fileNode = nodes.find((n) => n.name === "src/index.ts");
+    const mainNode = nodes.find((n) => n.name === "main");
+    expect(fileNode).toBeDefined();
+    expect(mainNode).toBeDefined();

     const edges = await kv.list<GraphEdge>("mem:graph:edges");
     expect(edges).toHaveLength(1);
     expect(edges[0].type).toBe("uses");
+    expect(edges[0].sourceNodeId).toBe(fileNode!.id);
+    expect(edges[0].targetNodeId).toBe(mainNode!.id);
+    expect(edges[0].weight).toBe(0.9);
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/graph.test.ts` around lines 152 - 159, The test currently only asserts
edge count and type so add assertions that verify the edge's endpoints and
weight to prevent miswiring: after fetching edges (edges variable of type
GraphEdge) assert edges[0].source and edges[0].target match the expected node
identifiers (or node names found in nodes list, e.g., the GraphNode entries with
name "src/index.ts" and "main"), and assert edges[0].weight equals the expected
numeric value; update the test block in test/graph.test.ts that references nodes
and edges to include these endpoint and weight checks so the relationship
attribute map is fully validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/graph.test.ts`:
- Around line 152-159: The test currently only asserts edge count and type so
add assertions that verify the edge's endpoints and weight to prevent miswiring:
after fetching edges (edges variable of type GraphEdge) assert edges[0].source
and edges[0].target match the expected node identifiers (or node names found in
nodes list, e.g., the GraphNode entries with name "src/index.ts" and "main"),
and assert edges[0].weight equals the expected numeric value; update the test
block in test/graph.test.ts that references nodes and edges to include these
endpoint and weight checks so the relationship attribute map is fully validated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9f515adb-3150-438e-a083-2b376efe5b66

📥 Commits

Reviewing files that changed from the base of the PR and between 3551241 and 4a5bdc7.

📒 Files selected for processing (2)
  • src/functions/graph.ts
  • test/graph.test.ts

rohitg00 added a commit that referenced this pull request May 27, 2026
…oses #635) (#685)

Previous parser hard-coded attribute order: type-before-name on <entity>, type/source/target/weight on <relationship>. Codex CLI's LLM frequently emits attrs in a different order (name first, source/target before type), and any deviation caused the whole tag to silently drop — graph extraction returned 0 nodes for an entire run.

Replaced both regexes with an order-independent two-pass approach:
- Tiny parseAttrs() reads any key="value" pair in any order
- entitySelfClose + entityWithBody patterns separate the two tag shapes (preserving the #494 fix where greedy attr-matching swallowed trailing / and merged adjacent entities)
- Relationship matcher now reads source/target/type/weight from parsed attrs, falls back to weight=0.5 on missing or non-finite, requires type+source+target present to emit the edge

Regression test: <entity name=... type=...> + <relationship source=... target=... type=...> shape that previously dropped to zero nodes/edges. 7/7 graph tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant