Skip to content

Fix vector index Buffer serialization#653

Open
NikolaiL wants to merge 1 commit into
rohitg00:mainfrom
NikolaiL:fix/vector-index-buffer-slices
Open

Fix vector index Buffer serialization#653
NikolaiL wants to merge 1 commit into
rohitg00:mainfrom
NikolaiL:fix/vector-index-buffer-slices

Conversation

@NikolaiL
Copy link
Copy Markdown

@NikolaiL NikolaiL commented May 25, 2026

Summary

Fix vector index serialization/deserialization so persisted Float32Array embeddings preserve their intended byte range instead of using the entire backing ArrayBuffer.

Why

Buffer.from(arr.buffer) serializes the entire backing buffer, ignoring Float32Array.byteOffset and byteLength. Similarly, new Float32Array(Buffer.from(b64, "base64").buffer) can read the entire pooled Buffer backing store instead of just the decoded bytes.

In a live AgentMemory instance this caused a rebuilt 384-dimension local embedding index to be persisted, then rejected on restart as 2048-dimensional vectors:

Refusing to start: persisted vector index has 6471 of 6471 vectors with the wrong dimension.
Active provider (local) declares 384; dimensions seen on disk: 2048.

The service then refused to start until the installed package was patched locally or the vector index was discarded.

Changes

  • Serialize only the active Float32Array byte range.
  • Deserialize using the decoded Buffer's byteOffset and byteLength.
  • Add regression tests for pooled Buffer-backed deserialization and sliced Float32Array serialization.

Verification

npx vitest run test/vector-index.test.ts test/index-persistence.test.ts
npm run build

Summary by CodeRabbit

  • Bug Fixes

    • Fixed vector embedding serialization and deserialization to correctly handle array slices, ensuring data integrity when working with vector indices.
  • Tests

    • Added test cases to verify vector embedding serialization preserves dimensions and search functionality after deserialization.

Review Change Stack

@vercel
Copy link
Copy Markdown

vercel Bot commented May 25, 2026

Someone is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 67b7d8ac-a63e-45bd-b240-bc8bdb169057

📥 Commits

Reviewing files that changed from the base of the PR and between bfb5e66 and ac68491.

📒 Files selected for processing (2)
  • src/state/vector-index.ts
  • test/vector-index.test.ts

📝 Walkthrough

Walkthrough

The PR fixes float32ToBase64 and base64ToFloat32 to correctly serialize and deserialize Float32Array views that have byte offsets within pooled buffers. New tests validate that embeddings round-trip correctly and searches return expected observation IDs.

Changes

Float32Array Serialization Codec

Layer / File(s) Summary
Float32Array view-aware serialization codec
src/state/vector-index.ts
float32ToBase64 now encodes only the active slice using byteOffset and byteLength. base64ToFloat32 constructs Float32Array from the decoded buffer's buffer, byteOffset, and scaled byteLength.
Serialization round-trip validation tests
test/vector-index.test.ts
New tests verify embeddings from pooled buffers serialize/deserialize correctly, pass dimension validation, and searches return expected observation IDs.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

  • #584: Directly addresses the float32ToBase64/base64ToFloat32 bug with pooled-buffer phantom lengths by using byteOffset and byteLength.
  • #587: Related bug fix for the same serialization issue in Float32Array views with byte offsets.

Poem

🐰 Offsets were lost in the buffer's expanse,
Pooled arrays danced beyond their true slice,
Now byteOffset and byteLength advance,
Each view encodes just right, not quite twice.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix vector index Buffer serialization' directly summarizes the main change: fixing how Float32Array embeddings are serialized and deserialized to handle byte offsets and lengths correctly.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant