[BUG] -0.0 vs 0.0 is a hot mess #294

Open
@revans2

Description


This is related to #84 and is a superset of it.

Spark's support for floating point -0.0 is a bit of a hot mess.

Most SQL implementations normalize -0.0 to 0.0. Spark does this for the SQL parser, but not for the DataFrame API. Spark also violates the IEEE 754 spec in that -0.0 != 0.0, because Java's Double.compare and Float.compare treat -0.0 as less than 0.0.
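A minimal sketch (plain Scala, no Spark needed) of the Java comparison semantics in question:

```scala
object NegativeZeroCompare {
  def main(args: Array[String]): Unit = {
    // IEEE 754 primitive comparison: -0.0 and 0.0 are equal.
    println(-0.0 == 0.0)                           // true
    // Java's total ordering puts -0.0 strictly before 0.0,
    // which is what Spark's comparisons end up using.
    println(java.lang.Double.compare(-0.0, 0.0))   // -1
    println(java.lang.Float.compare(-0.0f, 0.0f))  // -1
  }
}
```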

This is true everywhere except for a few cases: equi-join keys and hash aggregate keys, where -0.0 and 0.0 end up being treated as equal. Hive does not do this; it always assumes they are different.
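As a hedged sketch of the two code paths (run in a spark-shell session, so `spark.implicits._` is available; exact output depends on the Spark version):

```scala
import org.apache.spark.sql.functions.{col, lit}
import spark.implicits._

val df = Seq(0.0, -0.0).toDF("x")

// Hash aggregate keys: per the description above, the two rows
// may be collapsed into a single group here.
df.groupBy("x").count().show()

// Ordinary comparisons go through Double.compare, so this filter
// may not match the -0.0 row even though IEEE 754 says -0.0 == 0.0.
df.filter(col("x") === lit(0.0)).show()
```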

cudf, on the other hand, follows IEEE 754, so -0.0 and 0.0 always end up being equal. This causes issues in sorts, in comparison operators, and in joins that are not equi-joins.
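A sketch of queries where the CPU ordering (Double.compare) and cudf's IEEE 754 semantics could diverge, again assuming a spark-shell session:

```scala
import org.apache.spark.sql.functions.col
import spark.implicits._

val df = Seq(-0.0, 0.0, 1.0).toDF("x")

// Sort: Double.compare places -0.0 strictly before 0.0, while an
// IEEE 754 comparator treats them as equal, so row order can differ.
df.orderBy("x").show()

// Comparison / non-equi join condition: under IEEE 754, -0.0 < 0.0 is
// false, but a Double.compare-based ordering treats -0.0 as less than 0.0.
df.filter(col("x") < 0.0).show()
```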

I will file something against Spark, but I don't have high hopes that anything will be fixed.

Labels: P2 (Not required for release), bug (Something isn't working), cudf_dependency (An issue or PR with this label depends on a new feature in cudf)
