gh-111545: Add PyHash_Double() function #112095

Closed
wants to merge 1 commit into from
24 changes: 24 additions & 0 deletions Doc/c-api/hash.rst
@@ -5,12 +5,16 @@ PyHash API

See also the :c:member:`PyTypeObject.tp_hash` member.

Types
^^^^^

.. c:type:: Py_hash_t

Hash value type: signed integer.

.. versionadded:: 3.2


.. c:type:: Py_uhash_t

Hash value type: unsigned integer.
@@ -41,8 +45,28 @@ See also the :c:member:`PyTypeObject.tp_hash` member.
.. versionadded:: 3.4


Functions
^^^^^^^^^

.. c:function:: Py_hash_t PyHash_Double(double value, PyObject *obj)

Hash a C double number.

If *value* is not-a-number (NaN):

* If *obj* is not ``NULL``, return the hash of the *obj* pointer.
* Otherwise, return :data:`sys.hash_info.nan <sys.hash_info>` (``0``).

The function cannot fail: it cannot return ``-1``.
Member

But we do want users to check the result, so that the function can start failing in some cases in the future.

Suggested change
-The function cannot fail: it cannot return ``-1``.
+On failure, the function returns ``-1`` and sets an exception.
+(``-1`` is not a valid hash value; it is only returned on failure.)
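The rule that -1 is never a valid hash value is also visible from Python: CPython substitutes -2 wherever a hash would otherwise be -1, so -1 stays free to signal an error at the C level. A small illustration of that CPython behavior (ReturnsMinusOne is a hypothetical class):

```python
class ReturnsMinusOne:
    """Hypothetical class whose __hash__ returns -1, the C API error value."""

    def __hash__(self):
        return -1


# CPython never reports -1 as a hash value: it is replaced by -2.
print(hash(-1))                 # -2 (int)
print(hash(-1.0))               # -2 (float, via the C double hashing code)
print(hash(ReturnsMinusOne()))  # -2 (user-defined __hash__)
```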

Member Author

I don't see the point of asking developers to make their code slower for a case which cannot happen. It would make C extensions slower for no reason, no?

PyObject_Hash(obj) can call an arbitrary __hash__() method in Python and so can fail. But PyHash_Double() is simple and cannot fail. It just has the same API as PyObject_Hash() and PyTypeObject.tp_hash for convenience.

For me, it's the same as PyType_CheckExact(obj): the function cannot fail. Do you want to suggest that users start checking for -1 because the API allows it to set an exception and return -1? IMO, practicality beats purity here.
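The asymmetry described in this comment can be observed from Python: hash() on an arbitrary object runs that object's __hash__() and may raise, whereas hashing a float goes straight to the double hashing code and has no Python-level hook to fail through. A minimal sketch (BadHash and try_hash() are hypothetical helpers):

```python
import math


class BadHash:
    """Hypothetical class whose __hash__ raises, as any user code may."""

    def __hash__(self):
        raise RuntimeError("hash failed")


def try_hash(obj):
    """Return (True, hash) on success, or (False, exception type) on failure."""
    try:
        return True, hash(obj)
    except Exception as exc:
        return False, type(exc)


# Like PyObject_Hash(), hash() can fail because it runs arbitrary code.
print(try_hash(BadHash()))  # (False, <class 'RuntimeError'>)

# Hashing float values never raises, including the special values.
for value in (0.0, -0.0, 1.5, 1e308, math.inf, -math.inf, math.nan):
    ok, _ = try_hash(value)
    assert ok
```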

Member

I am strongly for allowing deprecation via runtime warnings, and for keeping new API consistent in that respect.

If the speed is an issue (which I doubt, with branch prediction around), let's solve that in a way that still allows the API to report errors.

Member Author

> I am strongly for allowing deprecation via runtime warnings, and for keeping new API consistent in that respect.

I created capi-workgroup/api-evolution#43 to discuss functions that cannot fail, where the caller is not expected to check for errors.

> If the speed is an issue (which I doubt, with branch prediction around), let's solve that in a way that still allows the API to report errors.

Would you mind elaborating on how you plan to solve this issue?

My concern here is more about the usability of the API than about performance.

But yes, performance matters as well. Such a function can be used in a hash table (when floats are used as keys), so making it as fast as possible matters.

Member

> Would you mind elaborating on how you plan to solve this issue?

It's possible with specific compilers: add a static inline wrapper containing if (result == -1) __builtin_unreachable(); (GCC, Clang) or __assume(result != -1) (MSVC).
That way the compiler can optimize the error check away, until a later Python version decides to allow failures.


.. versionadded:: 3.13
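The NaN rules documented above can be modelled in a few lines of pure Python. This is only a sketch under stated assumptions, not the C implementation: hash_double_model() is a hypothetical name, None stands in for a NULL obj, and object.__hash__(obj) is assumed to give the same value as _Py_HashPointer(obj), which is how CPython's default object hash works.

```python
import math
import sys


def hash_double_model(value, obj=None):
    """Hypothetical pure-Python model of PyHash_Double(value, obj).

    obj=None plays the role of a NULL pointer.
    """
    value = float(value)
    if math.isnan(value):
        if obj is not None:
            # Assumption: CPython's default object.__hash__ hashes the
            # object's address, like _Py_HashPointer() in C.
            return object.__hash__(obj)
        # NULL obj: fall back to the documented constant.
        return sys.hash_info.nan
    # Finite values and infinities hash exactly like Python floats.
    return hash(value)


marker = object()
assert hash_double_model(math.nan) == sys.hash_info.nan
assert hash_double_model(math.nan, marker) == hash(marker)
assert hash_double_model(math.inf) == sys.hash_info.inf
assert hash_double_model(-math.inf) == -sys.hash_info.inf
assert hash_double_model(2.0) == hash(2)
```

Deferring to the built-in hash() for non-NaN values is reasonable because CPython's float hash calls _Py_HashDouble(), which this PR turns into a thin wrapper around PyHash_Double().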


.. c:function:: PyHash_FuncDef* PyHash_GetFuncDef(void)

Get the hash function definition.

.. seealso::
:pep:`456` "Secure and interchangeable hash algorithm".

.. versionadded:: 3.4
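PyHash_GetFuncDef() is only reachable from C, but the fields the test compares it against are visible from Python through sys.hash_info. A quick way to inspect them (the printed values vary by build and platform):

```python
import sys

info = sys.hash_info
# Name of the str/bytes hash algorithm (compared with PyHash_FuncDef.name),
# e.g. "siphash13", "siphash24" or "fnv", depending on the build.
print(info.algorithm)
# Output size and seed size of that algorithm, in bits
# (compared with PyHash_FuncDef.hash_bits and PyHash_FuncDef.seed_bits).
print(info.hash_bits, info.seed_bits)
# Width of Py_hash_t in bits: 64 on typical 64-bit builds.
print(info.width)
```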
8 changes: 7 additions & 1 deletion Doc/library/sys.rst
@@ -1034,7 +1034,13 @@ always available.

.. attribute:: hash_info.nan

-(This attribute is no longer used)
+The hash value returned for not-a-number (NaN).
+
+This hash value is only used by the :c:func:`PyHash_Double` C function
+when the *obj* argument is ``NULL``.
+
+.. versionchanged:: 3.10
+This hash value is no longer used to hash numbers in Python.

.. attribute:: hash_info.imag

3 changes: 3 additions & 0 deletions Doc/whatsnew/3.13.rst
@@ -1181,6 +1181,9 @@ New Features
:exc:`KeyError` if the key is missing.
(Contributed by Stefan Behnel and Victor Stinner in :gh:`111262`.)

* Add :c:func:`PyHash_Double` function to hash a C double number.
(Contributed by Victor Stinner in :gh:`111545`.)


Porting to Python 3.13
----------------------
2 changes: 2 additions & 0 deletions Include/cpython/pyhash.h
@@ -11,3 +11,5 @@ typedef struct {
} PyHash_FuncDef;

PyAPI_FUNC(PyHash_FuncDef*) PyHash_GetFuncDef(void);

PyAPI_FUNC(Py_hash_t) PyHash_Double(double value, PyObject *obj);
1 change: 1 addition & 0 deletions Include/internal/pycore_pyhash.h
@@ -32,6 +32,7 @@ PyAPI_FUNC(Py_hash_t) _Py_HashBytes(const void*, Py_ssize_t);

#define _PyHASH_MODULUS (((size_t)1 << _PyHASH_BITS) - 1)
#define _PyHASH_INF 314159
#define _PyHASH_NAN 0
#define _PyHASH_IMAG _PyHASH_MULTIPLIER
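_PyHASH_MODULUS is the Mersenne prime P = 2**_PyHASH_BITS - 1, exposed to Python as sys.hash_info.modulus, and _PyHASH_INF is exposed as sys.hash_info.inf. Numeric hashes are reductions modulo P, which a few checks make concrete. The sketch assumes only that P is a Mersenne prime, which holds on both 64-bit (2**61 - 1) and 32-bit (2**31 - 1) CPython builds:

```python
import sys

P = sys.hash_info.modulus  # _PyHASH_MODULUS: 2**61 - 1 on 64-bit builds
BITS = P.bit_length()      # _PyHASH_BITS: 61 on 64-bit builds

# Integers (and integral floats) hash to their value reduced modulo P.
print(hash(P), hash(P + 1))  # 0 1
print(hash(float(2**BITS)))  # 1, since 2**BITS == P + 1

# 0.5 == 1/2 hashes to the inverse of 2 modulo P, i.e. 2**(BITS - 1).
print(hash(0.5) == pow(2, -1, P) == 2**(BITS - 1))  # True

# Infinities use the _PyHASH_INF constant; negation flips the sign.
print(hash(float("inf")), hash(float("-inf")))  # 314159 -314159
```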

/* Hash secret
51 changes: 51 additions & 0 deletions Lib/test/test_capi/test_hash.py
@@ -1,9 +1,11 @@
import math
import sys
import unittest
from test.support import import_helper
_testcapi = import_helper.import_module('_testcapi')


NULL = None
SIZEOF_PY_HASH_T = _testcapi.SIZEOF_VOID_P


@@ -31,3 +33,52 @@ def test_hash_getfuncdef(self):
        self.assertEqual(func_def.name, hash_info.algorithm)
        self.assertEqual(func_def.hash_bits, hash_info.hash_bits)
        self.assertEqual(func_def.seed_bits, hash_info.seed_bits)

    def test_hash_double(self):
        # Test PyHash_Double()
        hash_double = _testcapi.hash_double
        marker = object()
        marker_hash = hash(marker)

        # test integers
        integers = [
            *range(1, 30),
            2**30 - 1,
            2**233,
            int(sys.float_info.max),
        ]
        for x in integers:
            for obj in (NULL, marker):
                with self.subTest(x=x, obj=obj):
                    self.assertEqual(hash_double(float(x), obj), hash(x))
                    self.assertEqual(hash_double(float(-x), obj), hash(-x))

        # test positive and negative zeros
        for obj in (NULL, marker):
            with self.subTest(obj=obj):
                self.assertEqual(hash_double(0.0, obj), 0)
                self.assertEqual(hash_double(-0.0, obj), 0)

        # test +inf and -inf
        inf = float("inf")
        for obj in (NULL, marker):
            with self.subTest(obj=obj):
                self.assertEqual(hash_double(inf, obj), sys.hash_info.inf)
                self.assertEqual(hash_double(-inf, obj), -sys.hash_info.inf)

        # test not-a-number (NaN)
        self.assertEqual(hash_double(float('nan'), marker), marker_hash)
        self.assertEqual(hash_double(float('nan'), NULL), sys.hash_info.nan)

        # special float values: compare with Python hash() function
        special_values = (
            math.nextafter(0.0, 1.0),  # smallest positive subnormal number
            sys.float_info.min,  # smallest positive normal number
            sys.float_info.epsilon,
            sys.float_info.max,  # largest positive finite number
        )
        for x in special_values:
            for obj in (NULL, marker):
                with self.subTest(x=x, obj=obj):
                    self.assertEqual(hash_double(x, obj), hash(x))
                    self.assertEqual(hash_double(-x, obj), hash(-x))
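The integer and special-value assertions above rely on a language-level rule: numbers that compare equal must hash equal, whatever their type. The same rule can be checked with only the standard library:

```python
import sys
from decimal import Decimal
from fractions import Fraction

# Values that compare equal must have equal hashes, whatever their type.
for n in (1, 29, 2**30 - 1, 2**233, -7):
    assert hash(n) == hash(float(n)) == hash(Fraction(n)) == hash(Decimal(n))

# Binary fractions such as 0.5 are exact as floats, so the rule still applies.
assert hash(0.5) == hash(Fraction(1, 2)) == hash(Decimal("0.5"))

# sys.float_info.max is an integer-valued float.
big = sys.float_info.max
assert hash(big) == hash(int(big)) == hash(Fraction(big))
```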
@@ -0,0 +1,2 @@
Add :c:func:`PyHash_Double` function to hash a C double number. Patch by
Victor Stinner.
21 changes: 21 additions & 0 deletions Modules/_testcapi/hash.c
@@ -1,6 +1,7 @@
#include "parts.h"
#include "util.h"


static PyObject *
hash_getfuncdef(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
{
@@ -44,8 +45,28 @@ hash_getfuncdef(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
return result;
}


static PyObject *
hash_double(PyObject *Py_UNUSED(module), PyObject *args)
{
double value;
PyObject *obj = NULL;
if (!PyArg_ParseTuple(args, "d|O", &value, &obj)) {
return NULL;
}
NULLABLE(obj);

Py_hash_t hash = PyHash_Double(value, obj);
assert(hash != -1);

Py_BUILD_ASSERT(sizeof(long long) >= sizeof(hash));
return PyLong_FromLongLong(hash);
}


static PyMethodDef test_methods[] = {
{"hash_getfuncdef", hash_getfuncdef, METH_NOARGS},
{"hash_double", hash_double, METH_VARARGS},
{NULL},
};

19 changes: 16 additions & 3 deletions Python/pyhash.c
@@ -86,7 +86,7 @@ static Py_ssize_t hashstats[Py_HASH_STATS_MAX + 1] = {0};
Py_hash_t _Py_HashPointer(const void *);

Py_hash_t
-_Py_HashDouble(PyObject *inst, double v)
+PyHash_Double(double v, PyObject *obj)
{
int e, sign;
double m;
@@ -95,8 +95,15 @@ _Py_HashDouble(PyObject *inst, double v)
if (!Py_IS_FINITE(v)) {
if (Py_IS_INFINITY(v))
return v > 0 ? _PyHASH_INF : -_PyHASH_INF;
-else
-return _Py_HashPointer(inst);
+else {
+assert(Py_IS_NAN(v));
+if (obj != NULL) {
+return _Py_HashPointer(obj);
+}
+else {
+return _PyHASH_NAN;
+}
+}
}

m = frexp(v, &e);
@@ -131,6 +138,12 @@ _Py_HashDouble(PyObject *inst, double v)
return (Py_hash_t)x;
}
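The finite-value branch of PyHash_Double() (everything after frexp()) is collapsed in this diff view. The sketch below is a Python port of that branch as it is assumed to appear in CPython's Python/pyhash.c; it is not taken from this page, and hash_finite_double() is a hypothetical name. Because P is a Mersenne prime, multiplying by 2**k modulo P is a rotation of a _PyHASH_BITS-bit value, so the mantissa is folded in 28 bits at a time and the exponent is applied as one final rotation.

```python
import math
import sys

P = sys.hash_info.modulus  # _PyHASH_MODULUS
BITS = P.bit_length()      # _PyHASH_BITS


def hash_finite_double(v):
    """Hypothetical port of the finite-value branch of the C double hash."""
    m, e = math.frexp(v)  # v == m * 2**e with 0.5 <= abs(m) < 1, or m == 0
    sign = 1
    if m < 0:
        sign = -1
        m = -m

    # Fold the mantissa in 28 bits at a time, accumulating modulo P.
    x = 0
    while m:
        # Multiply x by 2**28 modulo P: a rotation within BITS bits.
        x = ((x << 28) & P) | (x >> (BITS - 28))
        m *= 2.0**28
        e -= 28
        y = int(m)  # integer part: the next 28 bits of the mantissa
        m -= y
        x += y
        if x >= P:
            x -= P

    # Multiply by 2**e modulo P, again as a rotation (e reduced mod BITS).
    e %= BITS
    x = ((x << e) & P) | (x >> (BITS - e))

    result = sign * x
    return -2 if result == -1 else result  # -1 is reserved for errors


# Compare with the built-in hash(), including a subnormal and the largest double.
for v in (0.0, -0.0, 1.0, -1.0, 0.1, math.pi, 5e-324, sys.float_info.max):
    assert hash_finite_double(v) == hash(v), v
```

Python's % already returns a non-negative remainder for negative exponents, which is what the C code achieves with its explicit two-branch reduction of e.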

Py_hash_t
_Py_HashDouble(PyObject *obj, double v)
{
return PyHash_Double(v, obj);
}

Py_hash_t
_Py_HashPointerRaw(const void *p)
{
2 changes: 1 addition & 1 deletion Python/sysmodule.c
@@ -1497,7 +1497,7 @@ get_hash_info(PyThreadState *tstate)
PyStructSequence_SET_ITEM(hash_info, field++,
PyLong_FromLong(_PyHASH_INF));
PyStructSequence_SET_ITEM(hash_info, field++,
-PyLong_FromLong(0)); // This is no longer used
+PyLong_FromLong(_PyHASH_NAN));
PyStructSequence_SET_ITEM(hash_info, field++,
PyLong_FromLong(_PyHASH_IMAG));
PyStructSequence_SET_ITEM(hash_info, field++,