
Allow compressing hashes into a single UInt or Vector{UInt} #7

Closed
kernelmethod opened this issue Jan 15, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

@kernelmethod
Owner

Problem

The hashes returned by most of the hash functions in this package tend to use a lot of memory. For instance, even a length-0 Vector{Int64} (the type returned by e.g. LpHash) takes 40 bytes before it stores any elements:

julia> Base.summarysize(Vector{Int64}(undef, 0))
40

Moreover, using these hashes as keys into a database or hash table is difficult, since the database or table may not, in general, support the datatype used for the key.

Proposed solution

The solution I'm proposing is to add a function compress_hash that accepts a Vector{<:Integer} or BitArray{1} and converts it into a UInt32, UInt64, or Vector{UInt8}.

  • For instance, we could use Julia's built-in hash function and simply let compress_hash(x) = hash(x), which returns a UInt (UInt64 on 64-bit systems).
  • Alternatively, we could reinterpret x as an Array{UInt8} and use sha256(x), which returns Vector{UInt8}.
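
The two options above could be sketched roughly as follows. This is only an illustration, not a settled API: the names to_bytes and compress_hash_sha256 are hypothetical, and the sketch assumes the integer vectors have a fixed-width element type such as Int64 so that they can be reinterpreted as bytes.

```julia
using SHA  # standard library; provides sha256

# Option 1: fold the whole hash into a single UInt with Julia's built-in `hash`.
compress_hash(x::Union{Vector{<:Integer}, BitArray{1}}) = hash(x)

# Option 2: convert the hash to raw bytes, then take its SHA-256 digest.
# `to_bytes` is a hypothetical helper, not an existing function in the package.
# Assumes a fixed-width element type (Int32, Int64, ...); BigInt will not work.
to_bytes(x::Vector{<:Integer}) = collect(reinterpret(UInt8, x))

# One byte per bit: simple, but not the most compact encoding of a BitArray.
to_bytes(x::BitArray{1}) = Vector{UInt8}(x)

# Returns a 32-byte Vector{UInt8}.
compress_hash_sha256(x) = sha256(to_bytes(x))
```

Option 1 is cheaper and gives a key that fits in a machine word; option 2 gives a fixed 32-byte digest regardless of the input length.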

Notes

  • It's worth considering whether or not compress_hash needs to be cryptographically secure. I suspect that it should be in order to be on the safe side for various potential applications of this package. In that case, we will need to define a type such as
using SHA

struct HashCompressor
    salt :: Vector{UInt8}
end

# Prepend the salt to the input bytes (vcat joins the two vectors end to end), then hash
(hashfn::HashCompressor)(x::Vector{UInt8}) = vcat(hashfn.salt, x) |> sha256
  • Adding on to the last bullet point: it may be worth looking at the new BLAKE3 as a fast alternative to sha256, though it's unlikely that we'll be hashing anything large enough to justify going to great lengths in order to do this.
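
To make the salted approach from the notes concrete, here is a minimal, self-contained sketch of how such a compressor might be created and used. The name SaltedCompressor, the zero-argument constructor, and the 16-byte salt length are assumptions made for illustration; none of them are decided in this issue.

```julia
using SHA     # standard library; provides sha256
using Random  # standard library; provides RandomDevice

# Hypothetical salted compressor, modeled on the struct sketched in the notes.
struct SaltedCompressor
    salt::Vector{UInt8}
end

# Assumption: draw a 16-byte salt from the operating system's entropy source.
SaltedCompressor() = SaltedCompressor(rand(RandomDevice(), UInt8, 16))

# Hash raw bytes: prepend the salt, then take the SHA-256 digest.
(c::SaltedCompressor)(x::Vector{UInt8}) = sha256(vcat(c.salt, x))

# Hash an integer-valued hash (e.g. a Vector{Int64}) by viewing it as bytes first.
# Assumes a fixed-width element type.
(c::SaltedCompressor)(x::Vector{<:Integer}) = c(collect(reinterpret(UInt8, x)))

# The compressed hash is plain bytes, so it can serve directly as a table key.
compressor = SaltedCompressor()
table = Dict{Vector{UInt8}, String}()
table[compressor(Int64[3, -1, 4])] = "bucket A"
```

The same input only maps to the same key if the same salt is reused, so the salt would have to be stored alongside whatever table or database holds the compressed hashes.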
@kernelmethod kernelmethod added the enhancement New feature or request label Jan 15, 2020
@kernelmethod kernelmethod changed the title Allow compressing hashes into a single UInt Allow compressing hashes into a single UInt or Vector{UInt} Jan 15, 2020