simplify hashing #943

Closed
jku opened this issue Jan 30, 2025 · 8 comments · Fixed by #977
Labels
discussion Issues that require discussion

Comments

@jku
Collaborator

jku commented Jan 30, 2025

Currently the hash module in securesystemslib supports multiple hash libraries:

  • This seems bonkers to me and I've never seen anyone use this functionality
  • It also makes type annotating the module a pain: this is pretty much the reason why securesystemslib is not yet marked as annotated (there are other unannotated modules, but they are not as core to the library as hash is)

I think we can keep providing most of the API, but remove the cryptography support (from hashes module) and use only hashlib.
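For illustration, a hashlib-only helper can be small and easy to annotate. This is a minimal sketch, not the securesystemslib code: the function name `hexdigest` and the `SUPPORTED` tuple are assumptions made for this example.

```python
import hashlib

# Assumption for this sketch: a small fixed set of fixed-length algorithms.
SUPPORTED = ("sha224", "sha256", "sha384", "sha512")


def hexdigest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data using only the standard library."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    # hashlib.new() accepts the algorithm name and optional initial data
    return hashlib.new(algorithm, data).hexdigest()


if __name__ == "__main__":
    print(hexdigest(b"hello"))
```

With a single backend there is no library-selection argument left to type, which is the part that makes annotations awkward.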

@jku jku added the discussion Issues that require discussion label Jan 30, 2025
@jku
Collaborator Author

jku commented Feb 11, 2025

@devbyte1328

I found this issue difficult to solve and would appreciate feedback on my approach.

To address the problem, I created a Python script inside the project directory, print_hash_locations.py, to find all instances where hash-related libraries were imported. I excluded the .tox folder because it contained too many occurrences, and I was unsure of its relevance.

import os


def find_files_with_word(directory, word):
    """Print every Python file under directory that contains word, skipping .tox folders."""
    found_files = set()
    for root, dirs, files in os.walk(directory):
        # Prune .tox in place so os.walk never descends into it
        dirs[:] = [d for d in dirs if d != ".tox"]
        for file in files:
            if not file.endswith(".py"):  # Only check .py files
                continue
            file_path = os.path.join(root, file)
            try:
                with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
                    # any() stops reading the file at the first matching line
                    if any(word in line for line in f):
                        found_files.add(file_path)
            except OSError as e:
                print(f"Could not read file {file_path}: {e}")

    for file_path in found_files:
        print(f"Word '{word}' found in: {file_path}")


if __name__ == "__main__":
    # Search the parent of the folder that contains this script
    directory = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
    search_word = "hash"

    if os.path.isdir(directory):
        find_files_with_word(directory, search_word)
    else:
        print(f"Directory '{directory}' not found.")

Script Output:

Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/_vendor/ed25519/ed25519.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/hash.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/_gpg/rsa.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/formats.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/signer/_aws_signer.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/signer/_vault_signer.py
Word 'hash' found in: /home/user/Projects/securesystemslib/print_hash_locations.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/signer/_gpg_signer.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/signer/_gcp_signer.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/_gpg/common.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/signer/_key.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/signer/_utils.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/_vendor/ed25519/test_ed25519.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/signer/_crypto_signer.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/signer/_azure_signer.py
Word 'hash' found in: /home/user/Projects/securesystemslib/tests/test_gpg.py
Word 'hash' found in: /home/user/Projects/securesystemslib/tests/check_public_interfaces_gpg.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/_gpg/util.py
Word 'hash' found in: /home/user/Projects/securesystemslib/tests/test_signer.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/_gpg/dsa.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/signer/_hsm_signer.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/_gpg/eddsa.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/_gpg/functions.py
Word 'hash' found in: /home/user/Projects/securesystemslib/tests/test_hash.py
Word 'hash' found in: /home/user/Projects/securesystemslib/securesystemslib/_gpg/constants.py

After gathering these results, I went through each script and replaced every occurrence of:
cryptography.hazmat.primitives.hashes

With:
hashlib

Once I finished making these changes, I ran tox, but it produced numerous errors. I attempted to resolve some of them, but eventually, I got completely tangled in dependencies and errors.
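One plausible reason a textual replacement breaks things is that the two libraries expose different interfaces, so call sites have to change along with the import. The sketch below contrasts them; the helper names are hypothetical, and the `cryptography` half only runs if that package is installed.

```python
import hashlib


def sha256_hashlib(data: bytes) -> bytes:
    # hashlib: one constructor per algorithm, result read with digest()
    h = hashlib.sha256()
    h.update(data)
    return h.digest()


def sha256_cryptography(data: bytes):
    # cryptography: a generic Hash object wraps an algorithm instance,
    # and the result is read with finalize() rather than digest()
    try:
        from cryptography.hazmat.primitives import hashes
    except ImportError:
        return None  # optional dependency not installed
    h = hashes.Hash(hashes.SHA256())
    h.update(data)
    return h.finalize()


if __name__ == "__main__":
    print(sha256_hashlib(b"abc").hex())
    other = sha256_cryptography(b"abc")
    print("cryptography not installed" if other is None else other.hex())
```

Swapping the module name alone leaves calls such as `Hash(...)` and `finalize()` pointing at names that hashlib does not provide.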

@jku
Collaborator Author

jku commented Mar 7, 2025

This change should be limited to the securesystemslib.hash module as shown in the branch that I linked to.

The questions are:

  • are we breaking any users by doing this -- my guess is no: who would want to switch hash libraries?
  • is there any point in the hash module after this or should we just remove it -- backwards compat is likely a good enough reason to keep it
  • are the test changes enough or should there be more test cleanup as part of this

@lukpueh
Member

lukpueh commented Mar 17, 2025

I think we kept it, back when we removed all the other legacy code a year ago, because it wasn't obviously broken and was used in a few places in tuf and in-toto (#270 (comment))

Removing it likely only means adding a few calls into hashlib in in-toto and tuf, so I'd be inclined to do so.

@jku
Collaborator Author

jku commented Mar 17, 2025

I suppose you're not wrong.. so the actual answer to static analysis in python-tuf might include handling hashes there without securesystemslib.hash.

@lukpueh
Member

lukpueh commented Mar 18, 2025

At least according to sourcegraph, it does not seem to be used beyond python-tuf and in-toto. I'll take a stab at replacing it there.

@jku
Collaborator Author

jku commented Mar 18, 2025

I think there's two main uses of securesystemslib.hash:

  • handle hash names that hashlib does not (currently just the blake2b variant)
  • streaming hash, aka digest_fileobject() -- this looks complicated but only because it has the normalize_line_endings feature and because it handles both text and binary files...

so yeah, seems to not be problematic to just re-implement in python-tuf and in-toto
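The streaming part could look roughly like the sketch below when reduced to binary input. It is an illustration only: the name `digest_stream` and the chunk size are assumptions, and it deliberately leaves out the text-mode and normalize_line_endings handling mentioned above.

```python
import hashlib
import io
from typing import BinaryIO


def digest_stream(fileobj: BinaryIO, algorithm: str = "sha256", chunk_size: int = 4096) -> str:
    """Hash a binary file object chunk by chunk and return the hex digest."""
    h = hashlib.new(algorithm)
    # iter() with a sentinel keeps calling read() until it returns b""
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    print(digest_stream(io.BytesIO(b"hello world")))
```

On Python 3.11 and later, `hashlib.file_digest()` covers the same binary-only case in the standard library.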

lukpueh added a commit to lukpueh/tuf that referenced this issue Mar 18, 2025
securesystemslib.hash is a small wrapper around hashlib, which serves
two main purposes:
* provide helper function to hash a file
* translate custom hash algorithm name "blake2b-256" to "blake2b" with
  (digest_size=32).

In preparation for the removal of securesystemslib.hash, this patch ports
above behavior to tuf and uses the builtin hashlib directly where
possible.

related secure-systems-lab/securesystemslib#943

Signed-off-by: Lukas Puehringer <[email protected]>
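The name translation described in the commit message can be sketched in a few lines. This is not the code from the referenced commit; `new_hash` is a hypothetical helper name used only for illustration.

```python
import hashlib


def new_hash(algorithm: str):
    """Return a hashlib object, translating the custom "blake2b-256" name."""
    if algorithm == "blake2b-256":
        # hashlib has no "blake2b-256" name; use blake2b with a 32-byte digest
        return hashlib.blake2b(digest_size=32)
    # All other names are passed to hashlib unchanged
    return hashlib.new(algorithm)


if __name__ == "__main__":
    h = new_hash("blake2b-256")
    h.update(b"hello")
    print(h.digest_size, h.hexdigest())
```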
@lukpueh
Member

lukpueh commented Mar 19, 2025

> At least according to sourcegraph, it does not seem to be used beyond python-tuf and in-toto. I'll take a stab at replacing it there.

I realized my query above didn't include all ways of importing these symbols. I just broadened the search now and got the same result.
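To illustrate what "all ways of importing" can mean, the sketch below checks Python source for the common import forms of a module using the standard `ast` module. It is not the sourcegraph query that was used; the function name `imports_module` is an assumption for this example.

```python
import ast


def imports_module(source: str, module: str = "securesystemslib.hash") -> bool:
    """Return True if source imports module in one of the common forms."""
    package, _, name = module.rpartition(".")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import securesystemslib.hash [as alias]
            if any(alias.name == module for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # from securesystemslib.hash import digest
            if node.module == module:
                return True
            # from securesystemslib import hash [as alias]
            if node.module == package and any(a.name == name for a in node.names):
                return True
    return False


if __name__ == "__main__":
    print(imports_module("from securesystemslib import hash as sslib_hash"))
```

The sketch does not detect attribute access after a plain `import securesystemslib`, nor dynamic imports, so it would undercount in those cases.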

lukpueh added a commit to lukpueh/securesystemslib that referenced this issue Mar 19, 2025
fixes secure-systems-lab#943

* Internal use does not need the additional features (custom blake
  algorithm name support and file hashing), and was replaced by direct
  calls to hashlib.

* External users were updated to no longer require
  `securesystemslib.hash` (theupdateframework/python-tuf#2815,
  in-toto/in-toto#861)

Signed-off-by: Lukas Puehringer <[email protected]>