Skip to content

Commit fb5d83b

Browse files
stelyuscloud-fan
authored andcommitted
[SPARK-37290][SQL] - Exponential planning time in case of non-deterministic function
### What changes were proposed in this pull request? When using non-deterministic function, the method getAllValidConstraints can throw an OOM ``` protected def getAllValidConstraints(projectList: Seq[NamedExpression]): ExpressionSet = { var allConstraints = child.constraints projectList.foreach { case a Alias(l: Literal, _) => allConstraints += EqualNullSafe(a.toAttribute, l) case a Alias(e, _) => // For every alias in `projectList`, replace the reference in constraints by its attribute. allConstraints ++= allConstraints.map(_ transform { case expr: Expression if expr.semanticEquals(e) => a.toAttribute }) allConstraints += EqualNullSafe(e, a.toAttribute) case _ => // Don't change. } allConstraints } ``` In particular, this line `allConstraints ++= allConstraints.map(...)` can generate an exponential number of expressions This is because non deterministic functions are considered unique in a ExpressionSet Therefore, the number of non-deterministic expressions double every time we go through this line We can filter and keep only deterministic expression because 1 - the `semanticEquals` automatically discard non deterministic expressions 2 - this method is only used in one code path, and we keep only determinic expressions ``` lazy val constraints: ExpressionSet = { if (conf.constraintPropagationEnabled) { validConstraints .union(inferAdditionalConstraints(validConstraints)) .union(constructIsNotNullConstraints(validConstraints, output)) .filter { c => c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic } } else { ExpressionSet() } } ``` ### Why are the changes needed? It can lead to an exponential number of expressions and / or OOM ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Local test Closes #35233 from Stelyus/SPARK-37290. Authored-by: Franck Thang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 881f562) Signed-off-by: Wenchen Fan <[email protected]>
1 parent 7af0544 commit fb5d83b

File tree

1 file changed

+1
-1
lines changed
  • sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical

1 file changed

+1
-1
lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -182,7 +182,7 @@ trait UnaryNode extends LogicalPlan with UnaryLike[LogicalPlan] {
182182
projectList.foreach {
183183
case a @ Alias(l: Literal, _) =>
184184
allConstraints += EqualNullSafe(a.toAttribute, l)
185-
case a @ Alias(e, _) =>
185+
case a @ Alias(e, _) if e.deterministic =>
186186
// For every alias in `projectList`, replace the reference in constraints by its attribute.
187187
allConstraints ++= allConstraints.map(_ transform {
188188
case expr: Expression if expr.semanticEquals(e) =>

0 commit comments

Comments
 (0)