Skip to content

feat: import user-defined physical optimizer rules over FFI#1557

Open
timsaucer wants to merge 10 commits into
apache:mainfrom
timsaucer:feat/df54-followups-wave4
Open

feat: import user-defined physical optimizer rules over FFI#1557
timsaucer wants to merge 10 commits into
apache:mainfrom
timsaucer:feat/df54-followups-wave4

Conversation

@timsaucer
Copy link
Copy Markdown
Member

@timsaucer timsaucer commented May 22, 2026

Which issue does this PR close?

Related to #1533.

Rationale for this change

The earlier version of this PR exposed Python-defined OptimizerRule and AnalyzerRule subclasses. That approach could observe logical plans but not meaningfully transform them: there are no Python constructors for LogicalPlan node variants, so a rule could only return None or the input plan unchanged. The audience for custom planning rules also overlaps strongly with people who can write Rust.

DataFusion exposes no FFI bridge for the logical OptimizerRule / AnalyzerRule traits, but it does export FFI_PhysicalOptimizerRule for the physical PhysicalOptimizerRule trait. Importing those gives full native plan/config access, no GIL, and the ability to actually rewrite plans — matching the existing FFI table-provider / UDF pattern already used in this project.

What changes are included in this PR?

  • Removed the Python logical-rule approach: crates/core/src/optimizer_rules.rs, python/datafusion/optimizer.py, python/tests/test_optimizer.py, and the SessionContext.add_optimizer_rule / add_analyzer_rule methods. remove_optimizer_rule is unchanged (pre-existing).
  • Added crates/core/src/physical_optimizer.rs, which reads a __datafusion_physical_optimizer_rule__ capsule and converts it via Arc<dyn PhysicalOptimizerRule>::from(&FFI_PhysicalOptimizerRule).
  • SessionContext gains a physical_optimizer_rules constructor argument, typed with a new PhysicalOptimizerRuleExportable Protocol that matches the existing TableProviderExportable / *Exportable pattern. Upstream offers no API to add physical rules to a live context, so rules are appended to the session-state builder at construction time only.
  • The datafusion-ffi-example crate gains MyPhysicalOptimizerRule, a counter-backed rule, plus _test_physical_optimizer_rule.py proving the rule fires over FFI during physical planning.

Are there any user-facing changes?

New physical_optimizer_rules keyword argument on SessionContext. Additive, non-breaking. The add_optimizer_rule / add_analyzer_rule methods introduced earlier on this branch are removed, but they were never released.

timsaucer and others added 3 commits May 21, 2026 15:28
Expose `SessionContext.add_optimizer_rule` and
`SessionContext.add_analyzer_rule` symmetric with the existing
`remove_optimizer_rule`. Each accepts a Python subclass of the new
`datafusion.optimizer.OptimizerRule` / `AnalyzerRule` ABCs.

Implementation:

* New `crates/core/src/optimizer_rules.rs` wraps user Python instances
  in `PyOptimizerRuleAdapter` / `PyAnalyzerRuleAdapter`, which
  implement the upstream `OptimizerRule` / `AnalyzerRule` traits.
* `OptimizerRule.rewrite(plan)` returns `None` for "no change" or a
  new `LogicalPlan`. The adapter maps that to
  `Transformed::no` / `Transformed::yes` so the upstream optimizer's
  fixed-point loop terminates correctly.
* `AnalyzerRule.analyze(plan)` must always return a `LogicalPlan`;
  returning `None` surfaces a `DataFusionError::Execution` naming the
  offending rule.
* The upstream `&dyn OptimizerConfig` / `&ConfigOptions` arguments are
  not surfaced to Python in this MVP; rules that need configuration
  should capture it at construction time (for example by holding a
  `SessionContext` reference) or be implemented in Rust.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the Python-defined OptimizerRule/AnalyzerRule approach with FFI-imported physical optimizer rules.

The Python logical-rule approach could observe plans but not transform them: there are no Python constructors for LogicalPlan node variants, so a rule could only return None or the input plan unchanged. The audience for custom rules also overlaps strongly with people who can write Rust.

DataFusion exposes no FFI bridge for the logical OptimizerRule/AnalyzerRule traits, but it does export FFI_PhysicalOptimizerRule for the physical PhysicalOptimizerRule trait. This commit imports those instead.

Changes:

* Remove crates/core/src/optimizer_rules.rs, python/datafusion/optimizer.py, python/tests/test_optimizer.py, and the SessionContext.add_optimizer_rule / add_analyzer_rule methods. remove_optimizer_rule is unchanged (pre-existing).
* New crates/core/src/physical_optimizer.rs reads a __datafusion_physical_optimizer_rule__ capsule and converts it via Arc<dyn PhysicalOptimizerRule>::from(&FFI_PhysicalOptimizerRule).
* SessionContext gains a physical_optimizer_rules constructor argument. Upstream offers no API to add physical rules to a live context, so they are appended to the builder at construction time only.
* The datafusion-ffi-example crate gains MyPhysicalOptimizerRule, a counter-backed rule used by _test_physical_optimizer_rule.py to prove the rule fires over FFI during physical planning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the `list[Any]` hint on the SessionContext `physical_optimizer_rules` argument with a `PhysicalOptimizerRuleExportable` Protocol, matching the existing `TableProviderExportable` / `*Exportable` pattern used for other FFI-capsule objects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer changed the title feat: user-defined optimizer and analyzer rules feat: import user-defined physical optimizer rules over FFI May 27, 2026
timsaucer and others added 7 commits May 27, 2026 16:03
…string

Point the `physical_optimizer_rules` argument docs at the new
`PhysicalOptimizerRuleExportable` Protocol instead of describing the duck type inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The PyCapsule / FFI_PhysicalOptimizerRule mechanics describe the Protocol, not the SessionContext constructor. Move that detail onto PhysicalOptimizerRuleExportable and leave the constructor argument docs focused on behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the explanatory comment about FFI bridge availability; the same information already lives on PhysicalOptimizerRuleExportable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sibling FFI-import modules (udf, udaf, catalog, table) carry no module-level docs, and the rst-style markup did not match Rust conventions. The function doc comment already states intent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hand-written crates/core/src/physical_optimizer.rs with a `from_pycapsule!` invocation in the util crate, matching `physical_codec_from_pycapsule` and the other FFI capsule importers. The macro already handles the hasattr/getattr/cast/validate/pointer_checked sequence and the infallible `Arc::from(&FFI)` conversion, so the dedicated module is no longer needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the sentence about logical-rule FFI availability; it is background, not type-hint information, and keeps the Protocol docstring in line with the other *Exportable hints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer marked this pull request as ready for review May 27, 2026 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant