Skip to content

Introduce caching in the per-group metrics #19

@tmke8

Description

@tmke8

We can maintain a cache in a ClassVar in the base class, I think.

We'll use numpy arrays as dictionary keys:

from hashlib import sha256

import numpy as np
from numpy.typing import NDArray


class HashableArray[D: np.generic]:
    """A hashable wrapper for numpy arrays."""

    def __init__(self, array: NDArray[D]):
        # We copy the array because numpy arrays are mutable
        # and we cannot control their mutability outside this class.
        self.__array = np.copy(array, order="C")

    def __hash__(self) -> int:
        return int(sha256(self.__array.view(np.uint8)).hexdigest(), 16)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, HashableArray):
            return np.array_equal(self.__array, other.__array)
        return False

    def unwrap(self) -> NDArray[D]:
        """Return the original numpy array.

        This method returns a copy of the original array to ensure that
        the original data is not modified outside of this class.
        """
        return self.__array.copy()

For the original array -- the one we use as key -- it makes sense to me that we copy it, but when checking whether a new input is already in the cache, it is a bit annoying that we have to copy the array again. Maybe I should make it configurable whether the array is copied in the __init__.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions