fix(graph): tolerate reordered XML attrs on entity + relationship (closes #635)#685
Conversation
…oses #635) Previous parser hard-coded attribute order: type-before-name on <entity>, type/source/target/weight on <relationship>. Codex CLI's LLM frequently emits attrs in a different order (name first, source/target before type), and any deviation caused the whole tag to silently drop — graph extraction returned 0 nodes for an entire run. Replaced both regexes with an order-independent two-pass approach: - Tiny parseAttrs() reads any key="value" pair in any order - entitySelfClose + entityWithBody patterns separate the two tag shapes (preserving the #494 fix where greedy attr-matching swallowed trailing / and merged adjacent entities) - Relationship matcher now reads source/target/type/weight from parsed attrs, falls back to weight=0.5 on missing or non-finite, requires type+source+target present to emit the edge Regression test: <entity name=... type=...> + <relationship source=... target=... type=...> shape that previously dropped to zero nodes/edges. 7/7 graph tests pass.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughThe graph XML parser is refactored to tolerate reordered attributes by extracting key-value pairs dynamically rather than assuming fixed attribute order. Entity and relationship parsing now use separate regex patterns with helper functions to build node records and validate relationship weights with defaults and clamping. A new test verifies the parser handles attribute order flexibility. ChangesGraph XML Parser Robustness
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/functions/graph.ts (1)
18-23: ⚡ Quick winRemove implementation-descriptive comments in
srcparser block.These comments explain what the code does; please replace with clearer naming/tests and keep comments only for non-obvious intent.
As per coding guidelines "Avoid code comments explaining WHAT — use clear naming instead".
Also applies to: 45-49
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/graph.ts` around lines 18 - 23, The block comment in src/functions/graph.ts that describes the parser's implementation (how it parses key="value" pairs and why it was changed) should be removed; instead, rename the parser utilities (e.g., parseAttributes, parseEntity, parseRelationship) and their parameters to be self-explanatory and add unit tests that demonstrate expected behavior (attribute order independence and handling of missing fields) so the intent is captured by names/tests rather than an implementation-comment; retain only brief comments for non-obvious intent or edge-case rationale near the relevant functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/functions/graph.ts`:
- Around line 18-23: The block comment in src/functions/graph.ts that describes
the parser's implementation (how it parses key="value" pairs and why it was
changed) should be removed; instead, rename the parser utilities (e.g.,
parseAttributes, parseEntity, parseRelationship) and their parameters to be
self-explanatory and add unit tests that demonstrate expected behavior
(attribute order independence and handling of missing fields) so the intent is
captured by names/tests rather than an implementation-comment; retain only brief
comments for non-obvious intent or edge-case rationale near the relevant
functions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ea4dc7d8-95f5-4f9d-a229-947d37d38e30
📒 Files selected for processing (2)
src/functions/graph.tstest/graph.test.ts
Closes #635.
Bug
parseGraphXmlinsrc/functions/graph.tshard-coded attribute order on both tags:<entity type="..." name="...">— type must come first<relationship type="..." source="..." target="..." weight="..."/>— strict left-to-rightCodex CLI's LLM commonly emits
namebeforetype, orsource/targetbeforetypeon relationships. Any deviation made the regex miss the tag entirely, silently dropping the node/edge. End-user symptom: a full graph-extract run completes withnodesAdded: 0.Fix
Two-pass extraction with a tiny
parseAttrs()helper that reads anykey="value"pair in any order:entitySelfClose(<entity ... />) +entityWithBody(<entity ...>...</entity>) patterns separate the two tag shapes. Preserves the fix: parse self-closing graph entities #494 fix where greedy attr-matching swallowed the trailing/and merged adjacent entities.source/target/type/weightfrom parsed attrs. Falls back to0.5when weight is missing or non-finite; requirestype+source+targetpresent to emit the edge (silent drop on incomplete tags, as before).Tests
New case in
test/graph.test.tsfeeds the Codex-style reordered shape (<entity name=... type=...>+<relationship source=... target=... type=... weight=...>) and asserts node count + edge type + weight round-trip. All 7 graph tests pass.Summary by CodeRabbit