Describe the bug
The Substrait logical-plan consumer only creates a WindowAggr node when a ProjectRel expression is itself a WindowFunction. If a valid Substrait project expression nests a window function inside a scalar expression, the window remains inside Projection and is not physically plannable.
For example, a ProjectRel expression corresponding to:
SELECT 1 + count(*) OVER () FROM DATA;
is consumed as:
Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
TableScan: DATA
rather than introducing a WindowAggr node.
To Reproduce
Add a Substrait fixture whose projected expression is an add:i64_i64 scalar function with a nested count:any window function, and consume it in datafusion/substrait/tests/cases/logical_plans.rs with execution enabled:
let plan = from_substrait_plan(&ctx.state(), &proto_plan).await?;
DataFrame::new(ctx.state(), plan).show().await?;
Before a fix, a regression snapshot expecting the window node fails with this difference:
Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
- WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
- TableScan: DATA
+ TableScan: DATA
For non-empty executable plans this shape reaches physical planning with a nested WindowFunction in Projection, which physical planning rejects.
Expected behavior
The Substrait consumer should find window expressions recursively, yielding:
Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
TableScan: DATA
and the resulting plan should execute.
Additional context
datafusion/sql/src/select.rs and the DataFrame APIs already call find_window_exprs(...) to collect deeply nested window expressions. datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs currently only checks if let Expr::WindowFunction(_) = &e, so Substrait consumption handles only root window expressions. I have a focused patch and executable fixture/test prepared.
Describe the bug
The Substrait logical-plan consumer only creates a
WindowAggrnode when aProjectRelexpression is itself aWindowFunction. If a valid Substrait project expression nests a window function inside a scalar expression, the window remains insideProjectionand is not physically plannable.For example, a
ProjectRelexpression corresponding to:is consumed as:
rather than introducing a
WindowAggrnode.To Reproduce
Add a Substrait fixture whose projected expression is an
add:i64_i64scalar function with a nestedcount:anywindow function, and consume it indatafusion/substrait/tests/cases/logical_plans.rswith execution enabled:Before a fix, a regression snapshot expecting the window node fails with this difference:
For non-empty executable plans this shape reaches physical planning with a nested
WindowFunctioninProjection, which physical planning rejects.Expected behavior
The Substrait consumer should find window expressions recursively, yielding:
and the resulting plan should execute.
Additional context
datafusion/sql/src/select.rsand the DataFrame APIs already callfind_window_exprs(...)to collect deeply nested window expressions.datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rscurrently only checksif let Expr::WindowFunction(_) = &e, so Substrait consumption handles only root window expressions. I have a focused patch and executable fixture/test prepared.