test: extract enforce_distribution / enforce_sorting tests into a dedicated crate#22117
test: extract enforce_distribution / enforce_sorting tests into a dedicated crate#22117zhuqi-lucas wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR reorganizes physical optimizer integration tests by moving the existing EnforceDistribution / EnforceSorting test suites into the datafusion-physical-optimizer crate’s tests/ directory, so future work (notably EnsureRequirements in #21976) can show cleaner diffs focused on rule behavior rather than test relocation.
Changes:
- Relocates the four large optimizer rule integration test files into
datafusion/physical-optimizer/tests/and updates their imports to use a localtests/common/test_utils.rs. - Adds
datafusion-physical-optimizerdev-dependencies needed to compile/run these integration tests in the new crate. - Adjusts
datafusion/core/tests/physical_optimizer/mod.rsto stop compiling the moved test modules and to suppress now-deadtest_utilshelpers.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| datafusion/physical-optimizer/tests/enforce_distribution.rs | Switched to local tests/common/test_utils.rs and minor wiring tweaks after move. |
| datafusion/physical-optimizer/tests/enforce_sorting.rs | Inlines DummyStreamPartition and switches to local tests/common/test_utils.rs. |
| datafusion/physical-optimizer/tests/enforce_sorting_monotonicity.rs | Duplicates EnforceSortingTest helper and rewires imports for standalone integration test crate. |
| datafusion/physical-optimizer/tests/replace_with_order_preserving_variants.rs | Switches to local tests/common/test_utils.rs and applies file-level clippy expectation. |
| datafusion/physical-optimizer/tests/common/test_utils.rs | Adds shared optimizer test helpers for the moved integration tests. |
| datafusion/physical-optimizer/Cargo.toml | Adds dev-dependencies required by the moved integration tests. |
| datafusion/core/tests/physical_optimizer/mod.rs | Removes moved module declarations and adds temporary lint suppression for remaining helpers. |
| Cargo.lock | Updates lockfile due to new dev-dependencies. |
Comments suppressed due to low confidence (1)
datafusion/physical-optimizer/tests/enforce_sorting_monotonicity.rs:52
TransformedResultis imported but not used. This will trigger an unused-import warning and can fail CI when running clippy/rustc with-D warnings. RemoveTransformedResultfrom the import list (keepTreeNodeif it is needed fortransform_upmethod resolution).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
db4cccb to
df72ca2
Compare
df72ca2 to
dbd4bbb
Compare
|
cc @alamb — context: the original plan was to move these tests to |
…icated crate Moves the four physical-optimizer integration test files from `datafusion/core/tests/physical_optimizer/` into a new workspace crate `datafusion-physical-optimizer-tests` (publish = false) so they live adjacent to the optimizer rule crate they exercise: - enforce_distribution.rs (66 tests) - enforce_sorting.rs (60 tests) - enforce_sorting_monotonicity.rs (64 tests) - replace_with_order_preserving_variants.rs (46 tests) This sets up the review pattern apache#21976 asks for: once EnsureRequirements lands, the rebased PR's diff will show the function-name swap from `EnforceDistribution::new()` / `EnforceSorting::new()` to `EnsureRequirements::new()` on these files plus any net-new tests, instead of the current hard-to-diff mix of ported and net-new inline tests. Why a new crate --------------- The simpler "move to `physical-optimizer/tests/`" arrangement runs afoul of `dev/depcheck`: the tests use `datafusion::prelude::SessionContext`, which would require `datafusion = { workspace = true }` as a dev-dependency of `datafusion-physical-optimizer`. Because `datafusion` already runtime-depends on `datafusion-physical-optimizer`, that forms a cycle the project's depcheck (which guards crates.io publish ability) rejects. The new crate is a leaf in the dependency graph — nothing depends on it — so its dev-deps on `datafusion` and `datafusion-physical-optimizer` introduce no cycle. The crate is `publish = false`, following the existing precedent of `test-utils` and `datafusion-wasmtest`. Mechanics --------- - `git mv` preserves history; git detects the four moves as 93–98% renames. - `Cargo.toml` adds the new crate as a workspace member. - `physical-optimizer-tests/Cargo.toml` lists the dev-dependencies the tests need; the crate has no runtime dependencies. The `src/lib.rs` is intentionally empty (only doc comments) — the crate exists purely to host integration tests under `tests/`. - `core/tests/physical_optimizer/mod.rs` drops the four moved `mod` declarations and adds `dead_code` suppression for the now-partially-unused shared `test_utils`. - `test_utils` is referenced via `#[path = "../../core/tests/physical_optimizer/test_utils.rs"] mod test_utils;` so there is a single source of truth shared with the remaining `core/tests/physical_optimizer/` tests. The mod declaration carries `#[allow(dead_code, clippy::allow_attributes)]` because each test binary uses only a subset of the helpers. - `DummyStreamPartition`, previously imported via `crate::memory_limit::DummyStreamPartition`, is inlined into `enforce_sorting.rs` (20 lines). - `EnforceSortingTest` is duplicated into `enforce_sorting_monotonicity.rs` — each integration test is now its own crate root and cannot reach into a sibling integration test's `pub(crate)` items. The duplication is annotated as transient. Verification ------------ - cargo build --workspace --tests - cargo test -p datafusion-physical-optimizer-tests (66 + 60 + 64 + 46 = 236 tests passed) - cargo test -p datafusion --test core_integration physical_optimizer:: (remaining core tests still pass) - cargo clippy --workspace --tests -- -D warnings - cargo fmt --all --check - The new crate is a leaf: `cargo metadata` confirms nothing depends on `datafusion-physical-optimizer-tests`, so depcheck will not find a cycle through it.
cargo-machete CI flagged both as unused on datafusion-physical-optimizer-tests. The crate uses datafusion-functions-aggregate directly; the other two were pulled in transitively via the datafusion workspace dep.
dbd4bbb to
8e71877
Compare
How about just moving the tests to |
|
Hi @zhuqi-lucas -- I am sorry but I am really confused now. What I was trying to get at on #21976 is that I think we should reduce the risk of the rewrite by reusing the existing tests for EnforceSorting and EnforceDistribution I think the least risky way to avoid regressions is to reuse all the existing regression tests for those two passes (and make sure they pass with the new EnsureRequirements pass However upon additional review it seems like what actually happened was that you disconnected Given all these new tests I was under the (incorrect) impression that you had also deleted the existing tests for EnsureSorting and EnsureDistribution So I guess maybe we already have the tests consolidated in
Thank you for pushing this along -- I think this is one of the most complicated parts of the physical optimizer (so it is the hardest parts to change, but also one of the most important things to reduce compleixty) |
Which issue does this PR close?
Prep PR for #21976 (EnsureRequirements). Splits the test reorganization @alamb requested in his review (#21976 (review)) so the EnsureRequirements diff stays focused on the rule change.
Rationale
When EnsureRequirements lands and replaces
EnforceDistribution+EnforceSorting, we want the diff to show:EnsureRequirements::new()instead of the old rules (function-name swap on already-moved files), andToday the four big test files live in
core/tests/physical_optimizer/, while #21976 introduces a separateensure_requirements/new_tests.rsplus inline tests inensure_requirements/mod.rs. By moving the test files now, the follow-up rebase of #21976 will produce exactly the diff @alamb asked for.What this PR does
Moves four integration test files into a new workspace crate
datafusion-physical-optimizer-tests(publish = false):enforce_distribution.rsenforce_sorting.rsenforce_sorting_monotonicity.rsreplace_with_order_preserving_variants.rsGit detected the four moves as renames with 93–98% content similarity, so reviewers can use the rename view to confirm the test bodies didn't change.
Why a new crate (not
datafusion/physical-optimizer/tests/)The simpler "move to
physical-optimizer/tests/" approach was the initial plan, but it would require addingdatafusion = { workspace = true }as a dev-dependency ofdatafusion-physical-optimizer— these tests usedatafusion::prelude::SessionContext. Sincedatafusion(core) already runtime-depends ondatafusion-physical-optimizer, this creates a cycle that the project'sdev/depcheckrejects (it guardscrates.iopublish ability).datafusion-physical-optimizer-testsis a leaf in the dependency graph — nothing depends on it — so its dev-deps ondatafusionanddatafusion-physical-optimizerintroduce no cycle.cargo metadataconfirms zero incoming edges. The crate ispublish = false, following the precedent oftest-utilsanddatafusion-wasmtest.Mechanical changes
Cargo.toml(workspace): addsdatafusion/physical-optimizer-teststomembers.datafusion/physical-optimizer-tests/Cargo.toml: declares dev-dependencies the tests need. Crate has no runtime deps;src/lib.rsis intentionally empty so cargo treats the directory as a workspace member hosting integration tests undertests/.core/tests/physical_optimizer/mod.rs: drops the four movedmoddeclarations; addsdead_codesuppression fortest_utils(some helpers were only used by the moved tests).test_utils: the moved files reference the existing helper via#[path = "../../core/tests/physical_optimizer/test_utils.rs"] mod test_utils;so there's a single source of truth. Themoddeclaration carries#[allow(dead_code, clippy::allow_attributes)]because each test binary uses only a subset of the helpers.enforce_sorting.rs:DummyStreamPartition(previously imported viacrate::memory_limit::DummyStreamPartition) is inlined as a 20-line struct.enforce_sorting_monotonicity.rs:EnforceSortingTest(previously shared withenforce_sorting.rsaspub(crate)) is duplicated into this file — each integration test is now its own crate root and cannot reach into a sibling integration test'spub(crate)items. Annotated as transient.Verification
cargo metadataconfirmsdatafusion-physical-optimizer-testsis a leaf in the workspace dependency graph (no incoming edges), sodev/depcheckwill not find a cycle through it.Follow-ups
EnforceDistribution/EnforceSortingtoEnsureRequirementson the four moved files plus genuinely new tests.core/tests/physical_optimizer/integration tests alongside their rules in a follow-up. The#[path]reference tocore/tests/.../test_utils.rswill go away naturally when that happens.