fix(udf): validate query vector dimensions before computing distances#15

Merged: anoop-narang merged 1 commit into main from fix/udf-dimension-validation on Mar 26, 2026

@anoop-narang
Collaborator

Summary

  • Distance UDFs (l2_distance, cosine_distance, negative_dot_product) paired column and query vectors with zip(), which silently truncates to the shorter length on a dimension mismatch, so they returned wrong results instead of an error
  • This was especially dangerous on the UDF brute-force fallback path (SELECT * with split providers), where USearch dimension validation is bypassed entirely
  • Added dimension checks: single upfront comparison for FixedSizeList, per-row for variable-length List/LargeList
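
The truncation problem described above can be reproduced in a few lines. The following is a minimal sketch on plain slices, not the code from this PR: the function names `l2_distance_unchecked` and `l2_distance_checked` and the error text are hypothetical, and the real UDFs operate on Arrow arrays rather than `&[f32]`.

```rust
/// Naive L2 distance. `zip` stops at the shorter slice, so a dimension
/// mismatch silently drops the trailing components of the longer vector.
/// (Hypothetical helper for illustration only.)
fn l2_distance_unchecked(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Checked variant: reject mismatched dimensions before the kernel runs.
/// The error type and message are assumptions, not the PR's actual error.
fn l2_distance_checked(a: &[f32], b: &[f32]) -> Result<f32, String> {
    if a.len() != b.len() {
        return Err(format!(
            "dimension mismatch: column vector has {} dims, query vector has {}",
            a.len(),
            b.len()
        ));
    }
    Ok(l2_distance_unchecked(a, b))
}
```

With a 3-dim column vector `[3.0, 4.0, 12.0]` and a 2-dim query `[0.0, 0.0]`, the unchecked version ignores the third component and returns a plausible-looking distance, while the checked version returns an `Err`.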

Test plan

  • udf_dimension_mismatch_select_star — key regression test: SELECT * with wrong-dim query on split providers (UDF path), now errors instead of silently truncating
  • udf_dimension_mismatch_fewer / udf_dimension_mismatch_more — fewer/more dims than column, caught by optimizer path
  • exec_split_provider_* — split-provider tests covering the production topology (the scan provider has the vector column, the lookup provider doesn't)
  • exec_split_provider_768dim_negative_dot_product — 768-dim IP-metric end-to-end with split providers
  • All 21 execution tests pass, fmt/clippy clean

Distance UDFs (l2_distance, cosine_distance, negative_dot_product) used
zip() to iterate over column and query vectors, silently truncating to
the shorter length on dimension mismatch. This returned wrong results
instead of an error — especially dangerous on the brute-force fallback
path (SELECT *) where USearch dimension validation is bypassed.

For FixedSizeList: single upfront check (zero per-row cost).
For List/LargeList: per-row check (one branch, negligible vs kernel).
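
The two check placements in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: `VectorColumn` is a hypothetical stand-in for the Arrow FixedSizeList and List/LargeList layouts, and the function name, error strings, and the rejection of zero-dimensional columns are choices made only for this sketch.

```rust
/// Hypothetical stand-in for the column layouts named in the PR.
enum VectorColumn {
    /// Every row has exactly `dim` values (like FixedSizeList).
    Fixed { dim: usize, values: Vec<f32> },
    /// Rows may differ in length (like List / LargeList).
    Variable(Vec<Vec<f32>>),
}

/// Dot-product kernel; callers must have validated the lengths already.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Computes the negated dot product of every row against `query`.
fn negative_dot_product(col: &VectorColumn, query: &[f32]) -> Result<Vec<f32>, String> {
    match col {
        VectorColumn::Fixed { dim, values } => {
            // Sketch-only simplification: `chunks_exact(0)` would panic.
            if *dim == 0 {
                return Err("zero-dimensional column is not supported".to_string());
            }
            // Single upfront check: the width is a property of the column,
            // so no per-row branch is needed afterwards.
            if *dim != query.len() {
                return Err(format!(
                    "dimension mismatch: column has {} dims, query has {}",
                    dim,
                    query.len()
                ));
            }
            Ok(values.chunks_exact(*dim).map(|row| -dot(row, query)).collect())
        }
        VectorColumn::Variable(rows) => rows
            .iter()
            .enumerate()
            .map(|(i, row)| {
                // Per-row check: each row carries its own length.
                if row.len() != query.len() {
                    return Err(format!(
                        "dimension mismatch at row {}: row has {} dims, query has {}",
                        i,
                        row.len(),
                        query.len()
                    ));
                }
                Ok(-dot(row, query))
            })
            .collect(),
    }
}
```

Collecting into `Result<Vec<f32>, String>` stops at the first mismatched row, so a variable-length column fails as a whole rather than yielding partial results.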

anoop-narang merged commit 6028e79 into main on Mar 26, 2026
5 checks passed
anoop-narang deleted the fix/udf-dimension-validation branch on March 26, 2026 11:39