Description
This is related to #84 and is a superset of it.
Spark's support for floating point `-0.0` is a bit of a hot mess.
Most SQL implementations normalize `-0.0` to `0.0`. Spark does this in the SQL parser, but not in the DataFrame API. Spark also violates the IEEE 754 spec, which says `-0.0 == 0.0`: in Spark the two values compare as not equal.
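A minimal sketch of the discrepancy, assuming a local `SparkSession`; the object name is just for illustration, and the outputs noted in the comments reflect the behavior described in this issue (they may vary by Spark version):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object NegativeZeroDemo extends App {
  val spark = SparkSession.builder().master("local[*]").appName("neg-zero-demo").getOrCreate()
  import spark.implicits._

  // SQL path: per the description above, the parser normalizes the -0.0 literal,
  // so this comparison should come back true
  spark.sql("SELECT -0.0D = 0.0D AS eq").show()

  // DataFrame path: per the description above, -0.0 is not normalized here, so the
  // same comparison can come back false
  Seq(-0.0).toDF("x").select(($"x" === lit(0.0)).as("eq")).show()

  spark.stop()
}
```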
The underlying cause is that Java's `Double.compare` and `Float.compare` treat `-0.0` as less than `0.0`.
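For example, the following snippet just demonstrates the standard `java.lang.Double` semantics on the JVM (the object name is illustrative):

```scala
object DoubleCompareDemo extends App {
  // Primitive comparison follows IEEE 754: -0.0 and 0.0 are equal
  println(-0.0 == 0.0)                          // true

  // Double.compare imposes a total order in which -0.0 sorts before 0.0
  println(java.lang.Double.compare(-0.0, 0.0))  // -1

  // Boxed equality uses the same bit-level comparison, so the values differ
  println(java.lang.Double.valueOf(-0.0).equals(0.0))  // false
}
```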
This is true everywhere except in a few cases: equi-join keys and hash aggregate keys. Hive does not make those exceptions; it always treats the two values as different.
cudf follows IEEE 754, so `-0.0` and `0.0` always compare as equal. This causes issues in sorts, comparison operators, and joins that are not equi-joins.
I will file an issue against Spark, but I don't have high hopes that anything will be fixed.