Add SharedDistSamplingBackend — multi-channel sampling backend#577
Draft
kmontemayor2-sc wants to merge 9 commits into main
…r.py Move create_dist_sampler(), SamplerInput, and SamplerRuntime out of dist_sampling_producer.py into a shared utils module so they can be reused by the upcoming SharedDistSamplingBackend. Also rename `w` -> `worker` in DistSamplingProducer.init() for clarity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce SharedDistSamplingBackend which manages a pool of worker processes servicing multiple compute-rank channels through a fair-queued round-robin scheduler. This replaces the per-channel producer model in graph-store mode with a shared backend + lightweight per-channel state. Includes tests for pure business logic helpers (_compute_num_batches, _epoch_batch_indices, _compute_worker_seeds_ranges), shuffle behavior, and completion reporting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/all_test
GiGL Automation @ 21:04:57UTC : 🔄 @ 22:26:32UTC : ✅ Workflow completed successfully.
GiGL Automation @ 21:04:57UTC : 🔄 @ 21:13:43UTC : ✅ Workflow completed successfully.
GiGL Automation @ 21:04:58UTC : 🔄 @ 21:12:05UTC : ✅ Workflow completed successfully.
GiGL Automation @ 21:05:00UTC : 🔄 @ 22:24:47UTC : ✅ Workflow completed successfully.
GiGL Automation @ 21:05:00UTC : 🔄 @ 22:31:38UTC : ✅ Workflow completed successfully.
…r module docstring Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sses Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Part of a larger change to support multiple clients reading from the same backend.
We do this for graph-store mode so that we keep the "full" storage-cluster backend RPC, which GLT requires, while each client "reads" from only one rank (this goes along with CONTIGUOUS).
Introduce SharedDistSamplingBackend which manages a pool of worker
processes servicing multiple compute-rank channels through a fair-queued
round-robin scheduler. This replaces the per-channel producer model in
graph-store mode with a shared backend + lightweight per-channel state.
Includes tests for pure business logic helpers (_compute_num_batches,
_epoch_batch_indices, _compute_worker_seeds_ranges), shuffle behavior,
and completion reporting.
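
To illustrate the fair-queued round-robin idea described above, here is a minimal sketch in plain Python. It is not the code from this PR: the class `RoundRobinScheduler`, its methods `submit` and `next_task`, and the channel ids are all hypothetical names chosen for illustration, and the real backend dispatches to worker processes rather than returning tuples. The sketch assumes each channel has a FIFO of pending batch indices and that a channel gets at most one batch per turn.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Tuple


class RoundRobinScheduler:
    """Hypothetical fair queue: one FIFO of pending batches per channel."""

    def __init__(self) -> None:
        # Pending batch indices, keyed by channel id.
        self._pending: Dict[str, Deque[int]] = {}
        # Channels that currently have work, in the order they get a turn.
        self._ready: Deque[str] = deque()

    def submit(self, channel_id: str, batch_ids: Iterable[int]) -> None:
        """Queue batches for a channel; enroll it in the rotation if it was idle."""
        queue = self._pending.setdefault(channel_id, deque())
        was_idle = not queue
        queue.extend(batch_ids)
        if was_idle and queue:
            self._ready.append(channel_id)

    def next_task(self) -> Optional[Tuple[str, int]]:
        """Return the next (channel_id, batch_id), or None when nothing is pending."""
        if not self._ready:
            return None
        channel_id = self._ready.popleft()
        queue = self._pending[channel_id]
        batch_id = queue.popleft()
        if queue:
            # The channel still has work, so it goes to the back of the rotation.
            self._ready.append(channel_id)
        return channel_id, batch_id


def drain(scheduler: RoundRobinScheduler) -> list:
    """Collect tasks in dispatch order until the scheduler is empty."""
    order = []
    task = scheduler.next_task()
    while task is not None:
        order.append(task)
        task = scheduler.next_task()
    return order
```

With two channels where one has three batches and the other has one, the short channel is served on the second turn instead of waiting behind the long one. That interleaving is the property a shared backend needs so that one busy compute rank does not starve the others.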