Preserve the name of grouping sets in SimplifyExpressions #14888

Merged: 1 commit, Feb 26, 2025

joroKr21
Contributor

Whenever we use recompute_schema or with_exprs_and_inputs, this ensures that we obtain the same schema.

Which issue does this PR close?

Followup to #14734

Rationale for this change

In #14734 we introduced more aggressive constant folding, which also applies to aliases. That led to the discovery that SimplifyExpressions doesn't preserve the names of grouping sets, which are relevant for the computed schema. Note that this could have been an issue even before #14734: the grouping set might have no alias, and if constant folding then changed the expression, it would still affect the result schema.

What changes are included in this PR?

In SimplifyExpressions we check for grouping sets and then map over their children with the NamePreserver.
Otherwise, same as before.
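
To illustrate the idea, here is a minimal, self-contained sketch. It is a toy model, not the code in this PR: the `Expr` enum and the functions `simplify`, `simplify_preserving_name`, `rewrite` and `field_names` are hypothetical stand-ins for DataFusion's expression type, the simplifier, and the NamePreserver. It assumes that name preservation means "remember the name before rewriting and re-add it as an alias if the rewrite changed it".

```rust
// Toy model only: hypothetical stand-ins, not DataFusion's real types or API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
    // Container of the actual grouping sets; the children name the output fields.
    GroupingSets(Vec<Vec<Expr>>),
}

impl Expr {
    /// Name this expression would contribute to the schema.
    fn name(&self) -> String {
        match self {
            Expr::Column(c) => c.clone(),
            Expr::Literal(v) => v.to_string(),
            Expr::Add(l, r) => format!("{} + {}", l.name(), r.name()),
            Expr::Alias(_, n) => n.clone(),
            Expr::GroupingSets(_) => "GROUPING SETS".to_string(),
        }
    }
}

/// Constant folding: `1 + 2` becomes `3`, which also changes the name.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (simplify(*l), simplify(*r)) {
            (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Alias(inner, name) => Expr::Alias(Box::new(simplify(*inner)), name),
        other => other,
    }
}

/// Simplify, then restore the original name with an alias if it changed.
fn simplify_preserving_name(expr: Expr) -> Expr {
    let original = expr.name();
    let simplified = simplify(expr);
    if simplified.name() == original {
        simplified
    } else {
        Expr::Alias(Box::new(simplified), original)
    }
}

/// For grouping sets, preserve the name of each child rather than of the container.
fn rewrite(expr: Expr) -> Expr {
    match expr {
        Expr::GroupingSets(sets) => Expr::GroupingSets(
            sets.into_iter()
                .map(|set| {
                    set.into_iter()
                        .map(simplify_preserving_name)
                        .collect::<Vec<_>>()
                })
                .collect(),
        ),
        other => simplify_preserving_name(other),
    }
}

/// Field names derived from an expression; grouping sets contribute their children.
fn field_names(expr: &Expr) -> Vec<String> {
    match expr {
        Expr::GroupingSets(sets) => sets.iter().flatten().map(Expr::name).collect(),
        other => vec![other.name()],
    }
}
```

In this sketch, a grouping set containing the unaliased child `1 + 2` keeps the field name `1 + 2` after `rewrite`, because the folded literal is wrapped in an alias carrying the original name. Calling `simplify` on the child directly would yield a field named `3` instead.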

Are these changes tested?

Added a unit test.

Are there any user-facing changes?

No user-facing changes; the schema of grouping sets should be more stable after optimizations.

@github-actions bot added the optimizer (Optimizer rules) label Feb 26, 2025
};

plan.map_expressions(|expr| {
// Preserve the aliasing of grouping sets.
Contributor

@alamb alamb Feb 26, 2025

This line seems like it does the opposite (doesn't preserve the original names 🤔 )

Contributor Author

What makes you think so?

Contributor

Grouping sets need to maintain the aliases of the child expressions, as the field names need to be based on those rather than on the outer expression.

Contributor

What makes you think so?

I was thinking that the code skips calling rewrite_expr (which calls the name preserver) for the GroupingSets, and thus does not preserve the aliases of the Expr::Grouping itself.

To be clear, I think the code in the PR looks good to me; I am just discussing whether we can make the comment better.

Contributor Author

OK, I meant it in the sense that Expr::GroupingSet is a container of the actual grouping sets (which are a Vec<Expr> or Vec<Vec<Expr>>), and we want to preserve their names. But I'm happy to reword it however you like.

Contributor

no need to reword if I am the only one confused :)

@alamb alamb merged commit 8d2d495 into apache:main Feb 26, 2025
24 checks passed
Contributor

alamb commented Feb 26, 2025

Thanks again @joroKr21

@joroKr21 joroKr21 deleted the grouping-set-alias branch February 27, 2025 06:21
3 participants