util.py module type hints #192

Merged: 9 commits merged on May 20, 2025

Conversation

@laserkelvin laserkelvin commented May 8, 2025

Related issues

Closes #191, which is part of epic #182.

Proposed changes

This PR makes (currently) purely aesthetic changes by adding type hints to function signatures contained in util.py, as well as updating docstrings to match implementations.

  • Bumps copyright year at the beginning of the module
  • Adds type annotations for safe_open_write_binary, valid_path
  • Updates type annotations and docstrings for ensure_ext, _load_toml

@laserkelvin laserkelvin requested a review from Pennycook May 8, 2025 17:10
@laserkelvin laserkelvin added the documentation Improvements or additions to documentation label May 8, 2025
@laserkelvin laserkelvin force-pushed the util-type-annotations branch from 5b26279 to 12e23ad Compare May 8, 2025 17:11
@laserkelvin
Author

My last commit has the DCO sign-off - not sure why it's not passing?

@Pennycook Pennycook requested a review from Copilot May 9, 2025 08:03
@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds type hints and updates docstrings in the util.py module for improved code clarity and consistency. Key changes include updating the copyright year, adding type annotations for safe_open_write_binary and valid_path, and refining the _load_toml return type and documentation.

Comments suppressed due to low confidence (1)

codebasin/util.py:55

  • The extra 'f' in the formatted string appears to be a mistake. It should be removed so that the error message is formatted using the variable 'exts' correctly.
raise ValueError(f"{path} does not have a valid extension: f{exts}")

@@ -59,7 +55,7 @@ def ensure_ext(path: os.PathLike[str], extensions: Iterable[str]):
raise ValueError(f"{path} does not have a valid extension: f{exts}")


-def safe_open_write_binary(fname):
+def safe_open_write_binary(fname: os.PathLike[str]) -> TextIOWrapper:

Copilot AI May 9, 2025

The return type annotation 'TextIOWrapper' is inconsistent with opening a file in binary write mode. Consider changing it to a type such as 'BinaryIO' or another appropriate binary stream type.

Suggested change
-def safe_open_write_binary(fname: os.PathLike[str]) -> TextIOWrapper:
+def safe_open_write_binary(fname: os.PathLike[str]) -> typing.BinaryIO:

Contributor

I don't know enough about the differences here. Could you try making the change and see if the type-hinting tools are still satisfied?

Author

LSP seems happy with change made in 8e293ed
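
A minimal sketch of why the binary annotation fits better, assuming the function ultimately calls `open` in `"wb"` mode. The helper `open_write_binary` is a hypothetical stand-in, not the actual `safe_open_write_binary` implementation:

```python
import os
import tempfile
from io import BufferedWriter, TextIOWrapper
from typing import BinaryIO


def open_write_binary(fname: str) -> BinaryIO:
    """Hypothetical stand-in; only the "wb" mode matters for this illustration."""
    return open(fname, "wb")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")

    # Binary write mode yields a buffered byte stream, not a text wrapper.
    with open_write_binary(path) as f:
        binary_is_buffered = isinstance(f, BufferedWriter)
        binary_is_text = isinstance(f, TextIOWrapper)
        f.write(b"\x00\x01")

    # Text write mode is the case where TextIOWrapper is the right type.
    with open(path, "w") as f:
        text_is_text = isinstance(f, TextIOWrapper)
```

Here the `"wb"` handle is a `BufferedWriter` and accepts `bytes`, which is what a `BinaryIO` annotation describes; `TextIOWrapper` only appears for text-mode handles.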

@Pennycook
Contributor

> My last commit has the DCO sign-off - not sure why it's not passing?

Because we don't squash on merge, all commits in the PR need the DCO sign-off.

I think the only way to add the missing sign-off retroactively is to do an interactive rebase, add the missing sign-offs, then force push.

from pathlib import Path

import jsonschema

log = logging.getLogger(__name__)


-def ensure_ext(path: os.PathLike[str], extensions: Iterable[str]):
+def ensure_ext(path: os.PathLike[str], extensions: Iterable[str]) -> None:
Contributor

I'm not opposed to this, but I'm curious. Do conventions say to specify -> None in the case of no return types? I'm not sure which style guides to look at here.

Author

My thought there was that a Python function with no return statement still returns None implicitly. For the developer it seems better to be explicit, since that removes the ambiguity of "did we leave out the return annotation, or does it really return None"?

Seems like mypy recommends this as well, even if we don't use mypy.

Contributor

Okay, that's good enough for me!
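
As a small illustration of the point above (the function names are made up for this example and do not come from util.py), a function with no `return` statement, or with a bare `return`, evaluates to `None`, so `-> None` states that behaviour explicitly:

```python
def log_message(msg: str) -> None:
    """No return statement: the call evaluates to None."""
    print(msg)


def early_exit(flag: bool) -> None:
    """A bare `return` also yields None."""
    if flag:
        return
    print("flag was False")


result_a = log_message("hello")
result_b = early_exit(True)
```

Both `result_a` and `result_b` are `None`. With the annotation in place, a reader or a type checker can tell that the missing return value is intentional rather than an omitted annotation.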

@@ -59,7 +55,7 @@ def ensure_ext(path: os.PathLike[str], extensions: Iterable[str]):
raise ValueError(f"{path} does not have a valid extension: f{exts}")
Contributor

You didn't change this, but I think Copilot is right. There shouldn't be a second "f" here, right?

Author

Patched in 55af105
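
For reference, the effect of the stray `f` can be reproduced in isolation. The values of `path` and `exts` below are assumptions chosen for illustration; only the message template comes from the diff:

```python
# Illustrative values; in util.py these come from the function's arguments.
exts = [".json", ".toml"]
path = "config.yaml"

# Inside an f-string, an "f" before a replacement field is just a literal character.
buggy = f"{path} does not have a valid extension: f{exts}"
fixed = f"{path} does not have a valid extension: {exts}"
```

The buggy message ends in `f['.json', '.toml']`, while the fixed one ends in `['.json', '.toml']`. The variable is interpolated either way; the only difference is the extra literal `f` in the output.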

Comment on lines 72 to 75
This function ensures that the file path does not contain
potentially dangerous characters such as null bytes (`\x00`)
or carriage returns/line feeds (`\n`, `\r`). These characters
can pose security risks, particularly in file handling operations.
Contributor

Suggested change
-This function ensures that the file path does not contain
-potentially dangerous characters such as null bytes (`\x00`)
-or carriage returns/line feeds (`\n`, `\r`). These characters
-can pose security risks, particularly in file handling operations.
+This function ensures that the file path does not contain
+potentially dangerous characters such as null bytes (`\x00`)
+or carriage returns/line feeds (`\n`, `\r`).

Author

Removed in c7ef2eb
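
The check that the docstring describes could look roughly like the sketch below. This is not the `valid_path` implementation from util.py: the name `has_no_unsafe_chars`, its signature, and its `bool` return value are all assumptions made for illustration.

```python
from __future__ import annotations

import os


def has_no_unsafe_chars(path: os.PathLike[str] | str) -> bool:
    """Hypothetical check: True if the path has no NUL, LF, or CR characters."""
    text = os.fspath(path)  # accepts both str and path-like objects
    return not any(ch in text for ch in ("\x00", "\n", "\r"))


ok = has_no_unsafe_chars("src/main.c")
has_nul = has_no_unsafe_chars("bad\x00name")
has_newline = has_no_unsafe_chars("bad\nname")
```

Here `ok` is `True`, while `has_nul` and `has_newline` are both `False`.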

Comment on lines 88 to 92
Notes
-----
- This function is useful for validating file paths before performing
file I/O operations to prevent security vulnerabilities.

Contributor

I think we should remove this.

Suggested change
-Notes
------
-- This function is useful for validating file paths before performing
-file I/O operations to prevent security vulnerabilities.

Author

Removed in c7ef2eb

Comment on lines +249 to +244
dict[str, Any]
The loaded TOML object, represented as a Python
dict with str key/value mappings.
Contributor

I haven't thought about this before, so want to double-check you agree.

I think this was set up as -> object before because when we were loading JSON, the result was not guaranteed to be a dict. For example, a compilation database is an array of objects.

It does seem like TOML must be key-value pairs, and the documentation says that "TOML is designed to map unambiguously to a hash table". So I think this is right...?

Author

Yup, that's why I left the JSON signature as object, but updated it for the TOML. tomllib.load is annotated as returning dict[str, Any], because I think by design it can't return anything else (i.e. keys have to be strings).

@Pennycook
Contributor

Thanks, @laserkelvin. All of these changes look good to me, assuming you fix the DCO sign-off.

I'm not going to formally approve it yet, because I don't want to accidentally merge it before the next release.

Kin Long Kelvin Lee added 9 commits May 9, 2025 07:55
Function does not actually return `bool`

Signed-off-by: Kin Long Kelvin Lee <[email protected]>
Signed-off-by: Kin Long Kelvin Lee <[email protected]>
`tomllib.load` signature returns a dict; this change matches what is
ultimately returned by `tomllib.load`.

Signed-off-by: Kin Long Kelvin Lee <[email protected]>
Signed-off-by: Kin Long Kelvin Lee <[email protected]>
Signed-off-by: Kin Long Kelvin Lee <[email protected]>
@laserkelvin laserkelvin force-pushed the util-type-annotations branch from c7ef2eb to dca4807 Compare May 9, 2025 14:56
@Pennycook Pennycook merged commit f52f376 into P3HPC:main May 20, 2025
4 checks passed
Development

Successfully merging this pull request may close these issues.

Type hints to util.py
2 participants