
[SPARK-48091][SQL] Preserve aliases inside lambda when ExtractGenerator restructures plan#55892

Open
shrirangmhalgi wants to merge 1 commit into apache:master from shrirangmhalgi:SPARK-48091-explode-transform-alias

@shrirangmhalgi

What changes were proposed in this pull request?

Fix ExtractGenerator to preserve aliases inside lambda functions when restructuring the plan.

Previously, ExtractGenerator called trimNonTopLevelAliases on all expressions in the project list before extracting the generator. This stripped aliases inside lambda functions (e.g., struct(x.as("data"))) before CreateStruct could resolve them into struct field names.

The fix uses trimNonTopLevelAliases only for pattern matching (to detect generators via AliasedGenerator), but preserves the original untrimmed expression for non-generator project items.
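The difference between the old and fixed behavior can be illustrated with a minimal, self-contained Scala sketch that does not use Spark. The types Expr, Attr, Alias, Explode, Struct, Lambda and Transform, and the functions oldExtract, newExtract and structFieldNames, are hypothetical, heavily simplified stand-ins for Catalyst expressions and the ExtractGenerator rewrite. The sketch assumes that a struct takes its field name from an Alias child and otherwise falls back to the lambda variable's auto-generated name (modeled here as "x_1"). oldExtract keeps the trimmed expression for every project item; newExtract uses the trimmed form only to detect the generator.

```scala
// Toy model of the SPARK-48091 alias-trimming issue. These types are
// hypothetical simplifications, NOT Spark's Catalyst classes.
sealed trait Expr
final case class Attr(name: String) extends Expr            // column or lambda variable
final case class Alias(child: Expr, name: String) extends Expr
final case class Explode(child: Expr) extends Expr          // stands in for any generator
final case class Struct(children: Seq[Expr]) extends Expr   // stands in for CreateStruct
final case class Lambda(body: Expr, arg: String) extends Expr
final case class Transform(input: Expr, fn: Lambda) extends Expr

object ExtractGeneratorModel {
  // Keep a top-level Alias but strip every Alias beneath it
  // (a simplified analogue of trimNonTopLevelAliases).
  def trimNonTopLevelAliases(e: Expr): Expr = e match {
    case Alias(child, name) => Alias(trimAliases(child), name)
    case other              => trimAliases(other)
  }

  private def trimAliases(e: Expr): Expr = e match {
    case Alias(child, _)   => trimAliases(child)
    case a: Attr           => a
    case Explode(c)        => Explode(trimAliases(c))
    case Struct(cs)        => Struct(cs.map(trimAliases))
    case Lambda(b, arg)    => Lambda(trimAliases(b), arg)
    case Transform(in, fn) => Transform(trimAliases(in), Lambda(trimAliases(fn.body), fn.arg))
  }

  // Rough analogue of matching AliasedGenerator on the trimmed expression.
  private def isGenerator(e: Expr): Boolean = e match {
    case Alias(Explode(_), _) | Explode(_) => true
    case _                                 => false
  }

  // Both functions return the non-generator items kept after the modeled rewrite.
  // Old behavior: every item is replaced by its trimmed form.
  def oldExtract(projectList: Seq[Expr]): Seq[Expr] =
    projectList.map(trimNonTopLevelAliases).filterNot(isGenerator)

  // Fixed behavior: the trimmed form is only used to detect generators;
  // non-generator items keep their original, untrimmed expression.
  def newExtract(projectList: Seq[Expr]): Seq[Expr] =
    projectList.filterNot(e => isGenerator(trimNonTopLevelAliases(e)))

  // Struct field names of a transform's output: an Alias child supplies
  // its name, a bare lambda variable falls back to its own name.
  def structFieldNames(e: Expr): Seq[String] = e match {
    case Alias(child, _) => structFieldNames(child)
    case Transform(_, Lambda(Struct(cs), _)) =>
      cs.map {
        case Alias(_, n) => n
        case Attr(n)     => n
        case other       => other.toString
      }
    case _ => Seq.empty
  }

  // Models select(explode(arr).as("value"), transform(arr, x => struct(x.as("data"))).as("structs")).
  // "x_1" plays the role of an auto-generated lambda variable name.
  val sampleProjectList: Seq[Expr] = Seq(
    Alias(Explode(Attr("arr")), "value"),
    Alias(Transform(Attr("arr"), Lambda(Struct(Seq(Alias(Attr("x_1"), "data"))), "x_1")), "structs"))

  def main(args: Array[String]): Unit = {
    println(structFieldNames(oldExtract(sampleProjectList).head)) // List(x_1)
    println(structFieldNames(newExtract(sampleProjectList).head)) // List(data)
  }
}
```

In this model the only change between the two paths is which expression survives for the non-generator item, which mirrors the one-line nature of the fix described above.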

Why are the changes needed?

When explode and transform are used in the same select, aliases inside the struct() built by the transform lambda are ignored: the struct field names become auto-generated (x_1, x_2) instead of the user-specified aliases. This only happens with the DataFrame/Dataset API, not with SQL.
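A rough reproduction of this scenario is sketched below. It is illustrative only and is not the test added in this PR: the object and method names are hypothetical, and it assumes Spark 3.x or later with spark-sql on the classpath and a local master. Per the description above, on an affected build the DataFrame API line should show an auto-generated field name (such as x_1) instead of data, while the SQL line shows data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, struct, transform}
import org.apache.spark.sql.types.{ArrayType, StructType}

// Hypothetical reproduction harness for SPARK-48091 (names are illustrative).
object Spark48091Repro {
  // Field names of the struct elements in the array column `structs`.
  def elementFieldNames(df: DataFrame): Seq[String] =
    df.schema("structs").dataType match {
      case ArrayType(st: StructType, _) => st.fieldNames.toSeq
      case other => sys.error(s"unexpected type for structs: $other")
    }

  // DataFrame API: explode and transform in the same select (the affected path).
  def dataFrameApi(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(Seq(1, 2, 3)).toDF("arr").select(
      explode(col("arr")).as("value"),
      transform(col("arr"), x => struct(x.as("data"))).as("structs"))
  }

  // Equivalent SQL query, which the PR description reports as unaffected.
  def sqlApi(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(Seq(1, 2, 3)).toDF("arr").createOrReplaceTempView("t")
    spark.sql("SELECT explode(arr) AS value, transform(arr, x -> struct(x AS data)) AS structs FROM t")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("SPARK-48091-repro").getOrCreate()
    try {
      println(s"DataFrame API: ${elementFieldNames(dataFrameApi(spark))}")
      println(s"SQL:           ${elementFieldNames(sqlApi(spark))}")
    } finally spark.stop()
  }
}
```

With the fix applied, both lines are expected to report the alias data, since the aliased expression inside the lambda is no longer replaced by its trimmed form.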

Does this PR introduce any user-facing change?

Yes. Struct field aliases inside transform lambdas are now correctly preserved when explode (or any generator) is in the same select.

How was this patch tested?

Added a test in GeneratorFunctionSuite verifying that struct field aliases are preserved when explode and transform are used together, including single and multiple aliases.

Was this patch authored or co-authored using generative AI tooling?

Yes.

@shrirangmhalgi

shrirangmhalgi commented May 15, 2026

@cloud-fan / @dongjoon-hyun / @sarutak, could you please review this PR?
