Description
With ANSI mode enabled, a failed cast surfaces a raw CUDF exception instead of the corresponding Spark exception.
This problem is not exclusive to Spark 4: the same behaviour occurs on Spark 3.x, but only with ANSI enabled.
Repro
Consider the following String cast example:
Seq( "", "", "" ).toDF("a").write.mode("overwrite").parquet("/tmp/myth/test_input")
spark.read.parquet("/tmp/myth/test_input").selectExpr(" CAST(a AS INTEGER)").show
With ANSI enabled, empty strings should cause exceptions rather than yield NULLs.
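The repro assumes ANSI mode is already on. One way to switch it on in the same session, assuming the snippet is run from spark-shell (where `spark` is the predefined SparkSession), is to set the standard `spark.sql.ansi.enabled` configuration before reading the data:

```scala
// Assumes a spark-shell session, where `spark` is the predefined SparkSession.
spark.conf.set("spark.sql.ansi.enabled", "true")

// Sanity check that the setting took effect before running the repro.
assert(spark.conf.get("spark.sql.ansi.enabled") == "true")
```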
On Apache Spark 4, the exception looks like:
org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22018
== SQL (line 1, position 1) ==
cast(a as integer)
^^^^^^^^^^^^^^^^^^
at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:145)
at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2(Cast.scala:801)
...
When ANSI is enabled with the spark-rapids plugin, one sees:
com.nvidia.spark.rapids.jni.CastException: Error casting data on row 0:
at com.nvidia.spark.rapids.jni.CastStrings.toInteger(Native Method)
at com.nvidia.spark.rapids.jni.CastStrings.toInteger(CastStrings.java:50)
at com.nvidia.spark.rapids.jni.CastStrings.toInteger(CastStrings.java:37)
at com.nvidia.spark.rapids.GpuCast$.doCast(GpuCast.scala:551)
at com.nvidia.spark.rapids.GpuCast.doColumnar(GpuCast.scala:1816)
at com.nvidia.spark.rapids.GpuUnaryExpression.doItColumnar(GpuExpressions.scala:276)
Expected behavior
One would expect that the CUDF exception would be caught and handled (or wrapped into a Spark-specific exception).
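A minimal sketch of the kind of wrapping meant here, not the plugin's actual fix. To stay runnable without Spark or spark-rapids on the classpath, it uses hypothetical stand-ins: `NativeCastException` plays the role of `com.nvidia.spark.rapids.jni.CastException`, `AnsiCastException` plays the role of `SparkNumberFormatException`, and `withAnsiCastErrors` and `nativeToInt` are illustrative helpers. It also assumes the native exception exposes the offending value and row, which the real class may or may not do in this form.

```scala
// Hypothetical stand-in for com.nvidia.spark.rapids.jni.CastException.
// Assumption: the native error carries the offending string and its row index.
final class NativeCastException(val badValue: String, val row: Int)
  extends RuntimeException(s"Error casting data on row $row: $badValue")

// Hypothetical stand-in for org.apache.spark.SparkNumberFormatException.
final class AnsiCastException(message: String, cause: Throwable)
  extends NumberFormatException(message) {
  initCause(cause) // keep the native exception available for debugging
}

object CastErrorTranslation {
  // Runs `body` and rethrows a native cast failure as a Spark-style ANSI error.
  def withAnsiCastErrors[T](fromType: String, toType: String)(body: => T): T =
    try body
    catch {
      case e: NativeCastException =>
        throw new AnsiCastException(
          s"""[CAST_INVALID_INPUT] The value '${e.badValue}' of the type "$fromType" """ +
            s"""cannot be cast to "$toType" because it is malformed.""",
          e)
    }

  // Toy replacement for the native string-to-int cast: fails on the first bad row.
  def nativeToInt(values: Seq[String]): Seq[Int] =
    values.zipWithIndex.map { case (v, row) =>
      try v.trim.toInt
      catch { case _: NumberFormatException => throw new NativeCastException(v, row) }
    }
}
```

With this shape, a call such as `CastErrorTranslation.withAnsiCastErrors("STRING", "INT")(CastErrorTranslation.nativeToInt(Seq("", "", "")))` throws the Spark-style error with the native exception attached as its cause. In the plugin, the error construction would presumably go through a per-version shim, since the Spark error classes differ between 3.x and 4.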
Environment details
- ANSI enabled
- Spark 4, 3.x
Additional context
This is an ANSI mode test. This won't be addressed as part of #11009. It's likely to need RapidsErrorUtils shim work.