fix(vector-index): respect Buffer slice metadata in base64 round-trip#585
fix(vector-index): respect Buffer slice metadata in base64 round-trip#585kek-Sec wants to merge 1 commit into
Conversation
Buffer.from(b64, "base64") may return a slice of Node's internal pool (poolSize=8192). Calling new Float32Array(buf.buffer) ignores byteOffset and byteLength, producing a 2048-element view over the whole pool instead of the actual decoded slice — the root cause of the phantom "dimensions seen on disk: 2048" crashes reported in rohitg00#455 and rohitg00#469. Pass byteOffset/byteLength to both helpers so the Float32Array covers only the decoded bytes. Add a regression test that serializes and deserialises 384-dim vectors (1536 bytes, within pool threshold) and asserts correct dimension and nearest-neighbour identity after reload. Fixes rohitg00#584 Signed-off-by: George Petrakis <g.petrakis@natechbanking.com>
|
@kek-Sec is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR fixes a bug in vector serialization where ChangesVector serialization buffer offset fix
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Confirming this fix works end-to-end. I applied the exact same The regression test in Thanks @kek-Sec for turning this into a proper PR. Hope @rohitg00 can get this merged + a 0.9.22 cut soon — every day this sits unmerged is another |
|
also closes #587 |
|
Verified the same fix lands in #653 with Recommending #653 as the merge target for the named-constant clarity. Your 5-vector round-trip + validateDimensions test is more thorough than the one in #653 though — would be great as a follow-up PR after #653 lands. Thanks @kek-Sec. |
Buffer.from(b64, "base64") may return a slice of Node's internal pool (poolSize=8192). Calling new Float32Array(buf.buffer) ignores byteOffset and byteLength, producing a 2048-element view over the whole pool instead of the actual decoded slice — the root cause of the phantom "dimensions seen on disk: 2048" crashes reported in #455 and #469.
Pass byteOffset/byteLength to both helpers so the Float32Array covers only the decoded bytes. Add a regression test that serializes and deserialises 384-dim vectors (1536 bytes, within pool threshold) and asserts correct dimension and nearest-neighbour identity after reload.
Fixes #584
Summary by CodeRabbit
Release Notes
Bug Fixes
Tests