fix: preserve null_aware on logical JoinNode proto round-trip#22104
fix: preserve null_aware on logical JoinNode proto round-trip#22104mithuncy wants to merge 8 commits into
Conversation
…#22065) Add `null_aware` to `JoinNode` (tag 9) and plumb it through the logical encoder/decoder. Without this, NOT IN semantics silently degrade to plain LeftAnti after `to_proto` -> `from_proto`.
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
neilconway
left a comment
There was a problem hiding this comment.
Overall this is a well-done PR! Nice work, @mithuncy
It does make me a bit concerned that an issue like this surfaced. It would be great if we had a systematic way to test that query plans roundtrip through proto, like we do for substrait (datafusion/sqllogictest/src/engines/datafusion_substrait_roundtrip_engine). That could be worth taking on as a follow-up.
Per @neilconway's review: rather than building the Join through LogicalPlanBuilder (which hardcodes null_aware = false) and then overwriting the field, construct it directly via Join::try_new. Also reads null_equality from the proto, fixing the same shape of bug on the same decoder path. Drops the unused JoinConstraint import and adds a length-match check on left/right join keys (previously enforced by the builder).
Collapses a few unnecessarily-wrapped lines flagged by `cargo fmt --check` on CI.
Verified to fail on the un-patched decoder and pass with the fix.
|
|
||
| // Regression test for null_equality round-trip (related to #22065). | ||
| #[tokio::test] | ||
| async fn roundtrip_join_null_equality() -> Result<()> { |
There was a problem hiding this comment.
I don't fully undersand why both tests are needed here
There was a problem hiding this comment.
The first test exercises null_aware = true (via NOT IN SQL); the second forces null_equality = NullEqualsNull. They both use the round-trip Debug compare, but each puts one field into a non-default state, a decoder that drops the field to its default. The first test uses the default null_equality (NullEqualsNothing), so a regression of that exact shape on null_equality would slip past it.
Happy to consolidate into a single test that exercises both non-default fields on the same Join if you'd prefer — just wanted to flag the coverage tradeoff before deleting either one
There was a problem hiding this comment.
maybe worth a quick comment to highight the differecne
There was a problem hiding this comment.
Thanks each test comment now spells out the non-default field and the one held at default.
Per @alamb's review: each test comment now spells out which field is at a non-default value and which is at the default, so the coverage split is obvious without cross-referencing.
Summary
Closes #22065.
null_awarewas missing fromJoinNodein the logical proto(it was added to the physical
HashJoinExecNodein #19635). Theencoder dropped it via
..destructuring and the decoder had nofield to restore it from, so any
to_proto->from_protoroundtrip silently downgraded a null-aware LeftAnti (NOT IN semantics)
to a plain LeftAnti and returned wrong rows.
Changes
bool null_aware = 9;toJoinNode(backward compatible).null_awarethrough; decoder switches toJoin::try_newso the field is set at construction (also fixesnull_equality, same bug on the same path).Test plan
cargo test -p datafusion-proto --test proto_integration cases::roundtrip_logical_planpasses (incl. newroundtrip_join_null_aware).