feat: import user-defined physical optimizer rules over FFI#1557
Open
timsaucer wants to merge 10 commits into
Open
feat: import user-defined physical optimizer rules over FFI#1557timsaucer wants to merge 10 commits into
timsaucer wants to merge 10 commits into
Conversation
Expose `SessionContext.add_optimizer_rule` and `SessionContext.add_analyzer_rule` symmetric with the existing `remove_optimizer_rule`. Each accepts a Python subclass of the new `datafusion.optimizer.OptimizerRule` / `AnalyzerRule` ABCs. Implementation: * New `crates/core/src/optimizer_rules.rs` wraps user Python instances in `PyOptimizerRuleAdapter` / `PyAnalyzerRuleAdapter`, which implement the upstream `OptimizerRule` / `AnalyzerRule` traits. * `OptimizerRule.rewrite(plan)` returns `None` for "no change" or a new `LogicalPlan`. The adapter maps that to `Transformed::no` / `Transformed::yes` so the upstream optimizer's fixed-point loop terminates correctly. * `AnalyzerRule.analyze(plan)` must always return a `LogicalPlan`; returning `None` surfaces a `DataFusionError::Execution` naming the offending rule. * The upstream `&dyn OptimizerConfig` / `&ConfigOptions` arguments are not surfaced to Python in this MVP; rules that need configuration should capture it at construction time (for example by holding a `SessionContext` reference) or be implemented in Rust. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the Python-defined OptimizerRule/AnalyzerRule approach with FFI-imported physical optimizer rules. The Python logical-rule approach could observe plans but not transform them: there are no Python constructors for LogicalPlan node variants, so a rule could only return None or the input plan unchanged. The audience for custom rules also overlaps strongly with people who can write Rust. DataFusion exposes no FFI bridge for the logical OptimizerRule/AnalyzerRule traits, but it does export FFI_PhysicalOptimizerRule for the physical PhysicalOptimizerRule trait. This commit imports those instead. Changes: * Remove crates/core/src/optimizer_rules.rs, python/datafusion/optimizer.py, python/tests/test_optimizer.py, and the SessionContext.add_optimizer_rule / add_analyzer_rule methods. remove_optimizer_rule is unchanged (pre-existing). * New crates/core/src/physical_optimizer.rs reads a __datafusion_physical_optimizer_rule__ capsule and converts it via Arc<dyn PhysicalOptimizerRule>::from(&FFI_PhysicalOptimizerRule). * SessionContext gains a physical_optimizer_rules constructor argument. Upstream offers no API to add physical rules to a live context, so they are appended to the builder at construction time only. * The datafusion-ffi-example crate gains MyPhysicalOptimizerRule, a counter-backed rule used by _test_physical_optimizer_rule.py to prove the rule fires over FFI during physical planning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the `list[Any]` hint on the SessionContext `physical_optimizer_rules` argument with a `PhysicalOptimizerRuleExportable` Protocol, matching the existing `TableProviderExportable` / `*Exportable` pattern used for other FFI-capsule objects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…string Point the `physical_optimizer_rules` argument docs at the new `PhysicalOptimizerRuleExportable` Protocol instead of describing the duck type inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The PyCapsule / FFI_PhysicalOptimizerRule mechanics describe the Protocol, not the SessionContext constructor. Move that detail onto PhysicalOptimizerRuleExportable and leave the constructor argument docs focused on behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the explanatory comment about FFI bridge availability; the same information already lives on PhysicalOptimizerRuleExportable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sibling FFI-import modules (udf, udaf, catalog, table) carry no module-level docs, and the rst-style markup did not match Rust conventions. The function doc comment already states intent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hand-written crates/core/src/physical_optimizer.rs with a `from_pycapsule!` invocation in the util crate, matching `physical_codec_from_pycapsule` and the other FFI capsule importers. The macro already handles the hasattr/getattr/cast/validate/pointer_checked sequence and the infallible `Arc::from(&FFI)` conversion, so the dedicated module is no longer needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the sentence about logical-rule FFI availability; it is background, not type-hint information, and keeps the Protocol docstring in line with the other *Exportable hints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Related to #1533.
Rationale for this change
The earlier version of this PR exposed Python-defined
OptimizerRuleandAnalyzerRulesubclasses. That approach could observe logical plans but not meaningfully transform them: there are no Python constructors forLogicalPlannode variants, so a rule could only returnNoneor the input plan unchanged. The audience for custom planning rules also overlaps strongly with people who can write Rust.DataFusion exposes no FFI bridge for the logical
OptimizerRule/AnalyzerRuletraits, but it does exportFFI_PhysicalOptimizerRulefor the physicalPhysicalOptimizerRuletrait. Importing those gives full native plan/config access, no GIL, and the ability to actually rewrite plans — matching the existing FFI table-provider / UDF pattern already used in this project.What changes are included in this PR?
crates/core/src/optimizer_rules.rs,python/datafusion/optimizer.py,python/tests/test_optimizer.py, and theSessionContext.add_optimizer_rule/add_analyzer_rulemethods.remove_optimizer_ruleis unchanged (pre-existing).crates/core/src/physical_optimizer.rs, which reads a__datafusion_physical_optimizer_rule__capsule and converts it viaArc<dyn PhysicalOptimizerRule>::from(&FFI_PhysicalOptimizerRule).SessionContextgains aphysical_optimizer_rulesconstructor argument, typed with a newPhysicalOptimizerRuleExportableProtocol that matches the existingTableProviderExportable/*Exportablepattern. Upstream offers no API to add physical rules to a live context, so rules are appended to the session-state builder at construction time only.datafusion-ffi-examplecrate gainsMyPhysicalOptimizerRule, a counter-backed rule, plus_test_physical_optimizer_rule.pyproving the rule fires over FFI during physical planning.Are there any user-facing changes?
New
physical_optimizer_ruleskeyword argument onSessionContext. Additive, non-breaking. Theadd_optimizer_rule/add_analyzer_rulemethods introduced earlier on this branch are removed, but they were never released.