Upgrade to DataFusion 53 #1402

Open
nuno-faria wants to merge 4 commits into apache:main from nuno-faria:datafusion_53

nuno-faria (Contributor) commented Mar 1, 2026

Which issue does this PR close?

N/A.

Rationale for this change

Keep in sync with the next major version of DataFusion.

What changes are included in this PR?

This PR is quite large, but most changes are related to the new version of PyO3.

  • Updated datafusion, pyo3, and arrow (migrated the respective code as well).
  • Updated some tests where the behavior changed.
  • Added a small DataFusion 53.0.0 section to the upgrade guide (since PyCapsuleMethods::reference is now deprecated).

Are there any user-facing changes?

Yes, the behavior has changed in a few tests (the review comments below contain more info).

Comment on lines 24 to 36
This version includes an upgraded version of `pyo3`, which changed the way to extract an FFI object.
Example:

Before:

.. code-block:: rust

    let codec = unsafe { capsule.reference::<FFI_LogicalExtensionCodec>() };

Now:

.. code-block:: rust

    let data: NonNull<FFI_LogicalExtensionCodec> = capsule
        .pointer_checked(Some(c_str!("datafusion_logical_extension_codec")))?
        .cast();
    let codec = unsafe { data.as_ref() };
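
The cast-then-borrow step in the "Now" snippet can be tried without `pyo3` at all. The sketch below uses only the standard library and assumes the capsule hands back a type-erased `NonNull<c_void>`, as the `.cast()` call suggests. `FfiCodec`, `erase`, and `read_version` are hypothetical names made up for this illustration; they are not part of `pyo3` or DataFusion.

```rust
use std::ffi::c_void;
use std::ptr::NonNull;

// Hypothetical stand-in for an FFI struct such as FFI_LogicalExtensionCodec.
#[repr(C)]
struct FfiCodec {
    version: u32,
}

// Hypothetical stand-in for the capsule: it only gives out an untyped pointer.
fn erase(codec: &mut FfiCodec) -> NonNull<c_void> {
    NonNull::from(codec).cast()
}

// Mirrors the "Now" snippet: cast the untyped pointer to the concrete type,
// then borrow through it.
//
// Safety: `raw` must point to a live, properly aligned `FfiCodec`.
unsafe fn read_version(raw: NonNull<c_void>) -> u32 {
    let data: NonNull<FfiCodec> = raw.cast();
    // SAFETY: guaranteed by this function's contract.
    let codec = unsafe { data.as_ref() };
    codec.version
}
```

Calling `read_version(erase(&mut codec))` inside an `unsafe` block returns the `version` field of `codec`. In the real code the capsule name check done by `pointer_checked` is what justifies the cast; this sketch has no such check and relies on the caller instead.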

- #[pyclass(name = "MySumUDF", module = "datafusion_ffi_example", subclass)]
+ #[pyclass(
+     from_py_object,

}

#[pyclass(
skip_from_py_object,

These ones do not implement Clone.

  };

- let capsule = capsule.downcast::<PyCapsule>()?;
+ let capsule = capsule.cast::<PyCapsule>()?;

Comment on lines +39 to +42
let data: NonNull<FFI_LogicalExtensionCodec> = capsule
    .pointer_checked(Some(c_str!("datafusion_logical_extension_codec")))?
    .cast();
let codec = unsafe { data.as_ref() };

  filter_expr: Option<Py<PyAny>>,
  projected_statistics: Statistics,
- plan_properties: datafusion::physical_plan::PlanProperties,
+ plan_properties: Arc<PlanProperties>,

  }

- fn statistics(&self) -> DFResult<Statistics> {
+ fn partition_statistics(&self, _partition: Option<usize>) -> DFResult<Statistics> {

statistics has been removed: apache/datafusion#20319
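
For readers unfamiliar with the new method shape, here is a minimal standalone sketch of what the migrated function does in this diff: it accepts an optional partition index and, because the parameter is named `_partition`, ignores it and returns the stored statistics either way. `Stats` and `Scan` are simplified, hypothetical stand-ins and not DataFusion's `Statistics` or `ExecutionPlan` types; the `String` error type is likewise a placeholder for `DFResult`.

```rust
// Hypothetical, simplified stand-in for DataFusion's `Statistics`.
#[derive(Clone, Debug, PartialEq)]
struct Stats {
    num_rows: Option<usize>,
}

// Hypothetical plan node that keeps one precomputed statistics value.
struct Scan {
    projected_statistics: Stats,
}

impl Scan {
    // Same signature shape as the migrated method. The partition index is
    // accepted but unused, so every call returns the same statistics.
    fn partition_statistics(&self, _partition: Option<usize>) -> Result<Stats, String> {
        Ok(self.projected_statistics.clone())
    }
}
```

With this shape, `scan.partition_statistics(None)` and `scan.partition_statistics(Some(0))` return equal values. A node that tracks statistics per partition would need to branch on the argument instead.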

Comment on lines +145 to +149
impl<'source> FromPyObject<'_, 'source> for PyScalarValue {
    type Error = PyErr;

    fn extract(value: Borrowed<'_, 'source, PyAny>) -> Result<Self, Self::Error> {
        Self::from_pyarrow_bound(&value)

  pytest.param(
      col("d").cardinality(),
-     pa.array([3, 4, None, None], type=pa.uint64()),
+     pa.array([3, 4, 0, None], type=pa.uint64()),

cardinality now returns 0 for empty arrays: apache/datafusion#20533
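
The updated expectation can be modeled in a few lines of plain Rust. This is only a sketch of the semantics visible in the test diff (a null list stays null, an empty list now yields 0), not DataFusion's implementation; the function name `cardinalities` and the use of `Option<Vec<i64>>` for a nullable list column are assumptions made for the illustration.

```rust
// Models the new expectation: `None` (a null list) stays `None`, while an
// empty list maps to `Some(0)` instead of `None`.
fn cardinalities(rows: &[Option<Vec<i64>>]) -> Vec<Option<u64>> {
    rows.iter()
        .map(|row| row.as_ref().map(|items| items.len() as u64))
        .collect()
}
```

Under this model, a column holding a three-element list, a four-element list, an empty list, and a null produces the four values in the updated test, with only the null row remaining null.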

  col("d").array_distinct(),
  pa.array(
-     [[-1, 0, 1], [5, 10, 15, 20], [], None], type=pa.list_(pa.int64())
+     [[-1, 1, 0], [5, 10, 15, 20], [], None], type=pa.list_(pa.int64())

Order has changed for array_distinct/list_distinct: apache/datafusion#20364
