Remove synthetic/structured data generation from diskann-providers #904

Draft
Copilot wants to merge 2 commits into main from copilot/remove-synthetic-data-from-diskann-providers

Copilot AI commented Apr 6, 2026

diskann-providers contained grid point generation and synthetic label generation that already have replacements elsewhere: diskann::graph::test::synthetic::Grid and diskann-tools::utils::generate_synthetic_labels_utils respectively.

Deleted

  • diskann-providers/src/utils/generate_structured_data.rs — 1D/3D/4D grid vectors, adjacency lists, circle vectors
  • diskann-providers/src/utils/generate_synthetic_labels_utils.rs — zipf/random/one_per_point label generation

Grid data migration

All test consumers now use the Grid enum from diskann::graph::test::synthetic:

// Before
let mut adjacency_lists = utils::genererate_3d_grid_adj_list(grid_size as u32);
let mut vectors = f32::generate_grid(dim, grid_size); // called per-type generate_*d_grid_vectors_*

// After
let grid = dim_to_grid(dim);
let mut adjacency_lists = grid.neighbors(grid_size);
let mut vectors = f32::generate_grid(dim, grid_size); // now delegates to Grid::data() internally

GenerateGrid trait impls now wrap Grid::data() / Grid::data_as() instead of the deleted functions. Added diskann = { features = ["testing"] } to dev-dependencies to access the Grid type.
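For reviewers who have not looked at the grid helpers before, the following is a minimal, self-contained sketch of what a grid enum with `data` and `neighbors` could look like. It is not the code in `diskann::graph::test::synthetic`: `SketchGrid`, its variants, the row-major point ordering, the axis-aligned neighbor rule, and the body of `dim_to_grid` are all assumptions made for illustration only.

```rust
/// Hypothetical stand-in for a synthetic grid type (NOT the diskann `Grid`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchGrid {
    One,
    Three,
    Four,
}

impl SketchGrid {
    /// Number of coordinates per point.
    fn dim(self) -> usize {
        match self {
            SketchGrid::One => 1,
            SketchGrid::Three => 3,
            SketchGrid::Four => 4,
        }
    }

    /// Total number of points in a grid with `size` points per axis.
    fn num_points(self, size: usize) -> usize {
        size.pow(self.dim() as u32)
    }

    /// Integer coordinates of point `index`, with the last axis varying fastest.
    fn coords(self, mut index: usize, size: usize) -> Vec<usize> {
        let mut out = vec![0; self.dim()];
        for slot in out.iter_mut().rev() {
            *slot = index % size;
            index /= size;
        }
        out
    }

    /// One vector per grid point (assumed layout: row-major order).
    fn data(self, size: usize) -> Vec<Vec<f32>> {
        (0..self.num_points(size))
            .map(|i| self.coords(i, size).into_iter().map(|c| c as f32).collect())
            .collect()
    }

    /// Adjacency lists linking each point to its axis-aligned neighbors.
    fn neighbors(self, size: usize) -> Vec<Vec<u32>> {
        (0..self.num_points(size))
            .map(|i| {
                let coords = self.coords(i, size);
                let mut adj = Vec::new();
                let mut stride = 1;
                // Walk axes from fastest-varying to slowest-varying.
                for axis in (0..self.dim()).rev() {
                    if coords[axis] > 0 {
                        adj.push((i - stride) as u32);
                    }
                    if coords[axis] + 1 < size {
                        adj.push((i + stride) as u32);
                    }
                    stride *= size;
                }
                adj
            })
            .collect()
    }
}

/// Hypothetical mapping from a test's `dim` to a grid variant.
fn dim_to_grid(dim: usize) -> SketchGrid {
    match dim {
        1 => SketchGrid::One,
        3 => SketchGrid::Three,
        4 => SketchGrid::Four,
        other => panic!("no grid sketch for dimension {other}"),
    }
}
```

Under these assumptions, a test would call `dim_to_grid(dim)` once and then take both `data(grid_size)` and `neighbors(grid_size)` from the same value, which keeps the vectors and the adjacency lists in the same point order.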

Synthetic labels migration

diskann-tools/src/bin/generate_synthetic_labels.rs now uses diskann_tools::utils::generate_labels with FileStorageProvider instead of the removed diskann_providers version.
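As background on the zipf/random/one_per_point modes named under Deleted, here is a rough sketch of how such label assignment could work. It is not the `diskann_tools::utils::generate_labels` implementation and does no file I/O: `sketch_labels`, `LabelMode`, the xorshift generator, the 0.5 inclusion probability, and the 1/k Zipf weighting are all assumptions chosen to keep the example dependency-free.

```rust
/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(seed.max(1))
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical names for the three modes mentioned in the PR description.
#[derive(Clone, Copy, Debug)]
enum LabelMode {
    Zipf,
    Random,
    OnePerPoint,
}

/// Pick one label uniformly from 1..=num_labels.
fn uniform_label(rng: &mut XorShift64, num_labels: u32) -> u32 {
    1 + (rng.next_u64() % num_labels as u64) as u32
}

/// Returns one label list per point; labels are numbered from 1.
fn sketch_labels(num_points: usize, num_labels: u32, mode: LabelMode, seed: u64) -> Vec<Vec<u32>> {
    assert!(num_labels > 0, "need at least one label");
    let mut rng = XorShift64::new(seed);
    (0..num_points)
        .map(|_| match mode {
            // Exactly one uniformly chosen label per point.
            LabelMode::OnePerPoint => vec![uniform_label(&mut rng, num_labels)],
            // Each label included with probability 0.5 (assumed); never leave a point empty.
            LabelMode::Random => {
                let mut labels: Vec<u32> =
                    (1..=num_labels).filter(|_| rng.next_f64() < 0.5).collect();
                if labels.is_empty() {
                    labels.push(uniform_label(&mut rng, num_labels));
                }
                labels
            }
            // Label k included with probability 1/k (assumed), so low labels are common.
            LabelMode::Zipf => (1..=num_labels)
                .filter(|&k| rng.next_f64() < 1.0 / (k as f64))
                .collect(),
        })
        .collect()
}
```

With the 1/k weighting, label 1 lands on every point and higher labels become progressively rarer, which is the kind of skew that filtered-search tests usually want from a Zipf-style mode.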

- Remove generate_structured_data.rs: grid generation now uses
  diskann::graph::test::synthetic::Grid from the core diskann crate
- Remove generate_synthetic_labels_utils.rs: the binary now uses
  the implementation already in diskann-tools
- Update all consumers (diskann_async tests, debug_provider tests,
  caching example tests) to use Grid enum for data/neighbors
- Update generate_synthetic_labels binary to use diskann-tools'
  generate_labels with FileStorageProvider

Agent-Logs-Url: https://github.com/microsoft/DiskANN/sessions/8b939579-a0bf-4896-b561-0c6e819be001

Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Remove synthetic data generation from diskann-providers" to "Remove synthetic/structured data generation from diskann-providers" on Apr 6, 2026
Copilot AI requested a review from harsha-simhadri on April 6, 2026 20:57

Development

Successfully merging this pull request may close these issues.

move/Remove Synthetic/structured data from diskann-providers