Description
This is related to #84 and is a superset of it.
Spark's support for floating point `-0.0` is a bit of a hot mess.
Most SQL implementations normalize `-0.0` to `0.0`. Spark does this in the SQL parser, but not in the DataFrame API. Spark also violates the IEEE 754 spec, which says `-0.0 == 0.0`: in Spark the two values compare as not equal.
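A minimal sketch of the discrepancy, assuming a local `SparkSession`; the object name is just for illustration, and the outputs noted in the comments reflect the behavior described in this issue (they may vary by Spark version):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object NegativeZeroDemo extends App {
  val spark = SparkSession.builder().master("local[*]").appName("neg-zero-demo").getOrCreate()
  import spark.implicits._

  // SQL path: per the description above, the parser normalizes the -0.0 literal,
  // so this comparison should come back true
  spark.sql("SELECT -0.0D = 0.0D AS eq").show()

  // DataFrame path: per the description above, -0.0 is not normalized here, so the
  // same comparison can come back false
  Seq(-0.0).toDF("x").select(($"x" === lit(0.0)).as("eq")).show()

  spark.stop()
}
```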
The underlying cause is that Java's `Double.compare` and `Float.compare` treat `-0.0` as less than `0.0`.
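For example, the following snippet just demonstrates the standard `java.lang.Double` semantics on the JVM (the object name is illustrative):

```scala
object DoubleCompareDemo extends App {
  // Primitive comparison follows IEEE 754: -0.0 and 0.0 are equal
  println(-0.0 == 0.0)                          // true

  // Double.compare imposes a total order in which -0.0 sorts before 0.0
  println(java.lang.Double.compare(-0.0, 0.0))  // -1

  // Boxed equality uses the same bit-level comparison, so the values differ
  println(java.lang.Double.valueOf(-0.0).equals(0.0))  // false
}
```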
This is true everywhere except in a few cases: equi-join keys and hash aggregate keys. Hive does not make those exceptions; it always treats the two values as different.
cudf follows IEEE 754, so `-0.0` and `0.0` always compare as equal. This causes issues in sorts, comparison operators, and joins that are not equi-joins.
I will file an issue against Spark, but I don't have high hopes that anything will be fixed.