Skip to content

Substrait consumer does not plan nested window expressions in projections #22629

@bvolpato

Description

@bvolpato

Describe the bug

The Substrait logical-plan consumer only creates a WindowAggr node when a ProjectRel expression is itself a WindowFunction. If a valid Substrait project expression nests a window function inside a scalar expression, the window remains inside Projection and is not physically plannable.

For example, a ProjectRel expression corresponding to:

SELECT 1 + count(*) OVER () FROM DATA;

is consumed as:

Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
  TableScan: DATA

rather than introducing a WindowAggr node.

To Reproduce

Add a Substrait fixture whose projected expression is an add:i64_i64 scalar function with a nested count:any window function, and consume it in datafusion/substrait/tests/cases/logical_plans.rs with execution enabled:

let plan = from_substrait_plan(&ctx.state(), &proto_plan).await?;
DataFrame::new(ctx.state(), plan).show().await?;

Before a fix, a regression snapshot expecting the window node fails with this difference:

 Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
-  WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
-    TableScan: DATA
+  TableScan: DATA

For non-empty executable plans this shape reaches physical planning with a nested WindowFunction in Projection, which physical planning rejects.

Expected behavior

The Substrait consumer should find window expressions recursively, yielding:

Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
  WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
    TableScan: DATA

and the resulting plan should execute.

Additional context

datafusion/sql/src/select.rs and the DataFrame APIs already call find_window_exprs(...) to collect deeply nested window expressions. datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs currently only checks if let Expr::WindowFunction(_) = &e, so Substrait consumption handles only root window expressions. I have a focused patch and executable fixture/test prepared.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions