Skip to content

Automatic cleanup of imports#1986

Merged
mhammond merged 8 commits intomhammond:mainfrom
Avasam:cleanup-imports
Mar 25, 2023
Merged

Automatic cleanup of imports#1986
mhammond merged 8 commits intomhammond:mainfrom
Avasam:cleanup-imports

Conversation

@Avasam
Copy link
Copy Markdown
Collaborator

@Avasam Avasam commented Dec 7, 2022

This is the first PR in a series concerning code cleanup. With the final goal to make basic type-checking validation of public methods possible, easing the addition of 3.7+ type annotations.

Grouped, ordered, and removed unused imports by running pycln, isort (and black).
The first two commits contain no manual change other than adding configuration files:

  • Configure isort with black profile
  • Configure pycln to ignore C wrapped modules it can't analyse
    • pycln will statically analyse modules to see if importing them has potential side-effects. It only removes an import if it can know for certain that it doesn't.

Also added a Github action job to validate imports.

It could be cleaned up further with a manual review and by moving some imports to the top of the files. But as I said, I wanted to do zero manual changes (other than any necessary ignore comments) as it obviously affects a lot of files across the board.

Edit: Also need to decide if you want to combine straight imports or not.

@kxrob
Copy link
Copy Markdown
Collaborator

kxrob commented Dec 7, 2022

Maybe its possible to split the commit into 2, to make the net effect of the pycln removed imports better visible vs the mere re-sort.
I ran the pycln --diff locally and only saw one issue on the quick:

--- original/ com\win32com\client\makepy.py
+++ fixed/ com\win32com\client\makepy.py
@@ -152,7 +152,7 @@
 class GUIProgress(SimpleProgress):
     def __init__(self, verboseLevel):
         # Import some modules we need to we can trap failure now.
-        import win32ui, pywin
+        import win32ui
 
         SimpleProgress.__init__(self, verboseLevel)
         self.dialog = None

which could be suppressed by

-        import win32ui, pywin
+        import pywin  # noqa
+        import win32ui

before.

@Avasam
Copy link
Copy Markdown
Collaborator Author

Avasam commented Dec 7, 2022

@kxrob I can do that.

Question though. How is the pywin import causing side-effects? Or is it that it may not exist at runtime? (and if that's the case, which other imports are also potentially non-existant?)

@kxrob
Copy link
Copy Markdown
Collaborator

kxrob commented Dec 7, 2022

How is the pywin import causing side-effects? Or is it that it may not exist at runtime? (and if that's the case, which other imports are also potentially non-existant?)

It seems its simply an early run / test for successful import here as the comment line says: "we can trap failure now". Special local logic. "Side effect" that the failure would happen / be handled early - before makepy run not within?
Not imported at module level because usually not needed.

Did not notice another issue on the quick. Mostly unused Python standard lib imports are removed it seems and things like win32con.

@Avasam
Copy link
Copy Markdown
Collaborator Author

Avasam commented Dec 8, 2022

I split into two commits so it should be easy to review the isort run vs the pycln run.
Added an exception for that one pywin import.

Also need to decide if you want to combine straight imports or not.

@mhammond
Copy link
Copy Markdown
Owner

mhammond commented Dec 9, 2022

This SGTM generally, thanks, and I'll get back to it soon (and thanks @kxrob for that) - I need to look a bit more into some of those tools, but I'm somewhat surprised black doesn't feature way more heavily - and the black changes that are here I don't yet understand :) Similarly, I'm also not sure the CI should show as broken if there's some problem a simple, local "-m black" can't detect - PRs/branches showing as broken for an import order "violation" doesn't seem valuable enough in this repo. Open to arguments otherwise though :)

@kxrob
Copy link
Copy Markdown
Collaborator

kxrob commented Dec 9, 2022

PRs/branches showing as broken for an import order "violation" doesn't seem valuable enough in this repo

I also think an isort check shouldn't annoy and become another dev dependency. Its maybe nice to apply it here (indifferent) when there are probably no side effects, and it can be run once in a blue moon. The CI could run it informally (1st in the steps) and ignore the error status - it seems to take near 0 seconds.

More useful and critical is the detection of unused imports (rarely new ones when bug fixing).
And even more: undefined names / unused local variables, and a few others.
Flake8, which is well established (uses pyflakes), comparatively fast and light-weight, can replace pycln --check (later) for the required checks in CI:
python -m flake8 --ignore=W,E (ignoring the pep8 checks competing with black) or at first python -m flake8 --ignore=W,E,F403,F405 (ignoring also * import issues) actually finds real bugs at high %age without adding a lot of work or becoming too annoying: Some 700 errors after the cleanups here: flake8-pywin32.txt
# noqa comments can silence the exceptions. I use(d) this for files I edited anyway.

I'm also not sure the CI should show as broken if there's some problem a simple, local "-m black" can't detect

flake8 could also do the PEP8 check similar to black in one go (without more run time I guess as it just ignores things with those options), but not sure if the options can make it compatible enough to black to save the local black run after small edits with high %age.
Anyway a new "prepush.bat/.py", "make prepush" or so could run those required or optional checks/tests locally (2 or 3 levels), to not risk a broken CI for trivial checker reasons.

Whatever checkers will run in CI, I think they should go into one job "black"+"imports" -> "checkers" or so to not create extra noise and virtual machines in the CI. They run within a few seconds anyway.

Comment thread .github/workflows/main.yml Outdated
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- run: pip install isort pycln pyopengl
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it doesn't require pyopengl installation everytime in the CI - can be added to the ignore list when there are even many internal imports? This one little demo anyway does a * star import, so doesn't help, and the demo doesn't change in eons.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for pycln to be able to analyse it (because it does a star import). But I can also add it to pycln ignores to not have to import it in CI if the CI job stays.

@Avasam
Copy link
Copy Markdown
Collaborator Author

Avasam commented Dec 9, 2022

This SGTM generally, thanks, and I'll get back to it soon (and thanks @kxrob for that) - I need to look a bit more into some of those tools, but I'm somewhat surprised black doesn't feature way more heavily - and the black changes that are here I don't yet understand :) Similarly, I'm also not sure the CI should show as broken if there's some problem a simple, local "-m black" can't detect - PRs/branches showing as broken for an import order "violation" doesn't seem valuable enough in this repo. Open to arguments otherwise though :)

There's no black changes (I only had to run black afterward, because re/moving imports lead to extra empty lines that black formats-out).

At the very least having the job in this PR shows that the changes would pass re-running isort and pycln. (even if I remove it afterward, before merging).


I'll argue that preventing technical debt, disorganized imports and dead code (unused imports, which makes it confusing to know if they're actually used or not, ie: side-effects) from accumulating over years is a good thing.

I also think an isort check shouldn't annoy and become another dev dependency. Its maybe nice to apply it here (indifferent) when there are probably no side effects, and it can be run once in a blue moon. The CI could run it informally (1st in the steps) and ignore the error status - it seems to take near 0 seconds.

I can see how it may be a small source of frustration for a contributor to have a PR fail because they forgot to run the fixes (or don't have the tools in their IDE to format on save). But I still believe not leaving unused imports, and keeping a consistent style is important. And the failure happens pretty quickly.

However, there's also a solution to that: run formatting automatically on PR. For example, you can use pre-commit to automatically format pushes to Pull Requests. Effectively automating the formatting, not failing the PR (unless even post-formatting there's still issues) and you still get to review the formatting changes. As a bonus, those who like pre-commit hooks can install those git hooks to autoformat each commits (I don't like it, but it's optional and some people do).

So I could keep the organized import in this PR, but not run a blocking isort job in the CI for now.


To @kxrob 's point, Flake8 is also a tool that I'd highly recommend looking into (and which I have decent experience with its different popular plugins). But that can be looked into separately.

Type-checkers (mypy, pyright, pycharm, ...) and some linters (pylint, Flake8 plugins, etc.), can also warn you about unused imports and unused variables. But won't clean them up. pycln is a better dedicated tool for detecting and fixing imports. Still they are the reasons for those cleanup PRs I'm making: With the final goal of having inline type annotations (ie: def foo(a: str, b: int | None) -> int: ...). Just adding annotations could be done as-is, but I want to be able to validate those annotations. Which means first cleaning up and simplifying anything that would ease the addition of type-checkers (without having to disable too many rules).
pyright already found a few issues, but I'll tackle those separately, one step at a time :) (in fact I already have with some duplicated methods).

Whatever checkers will run in CI, I think they should go into one job "black"+"imports" -> "checkers" or so to not create extra noise and virtual machines in the CI. They run within a few seconds anyway.

I have no issues with lumping formatters in one job (although I'd call it "formatters", not "checkers", the difference being that one formats the code with autofixes, while others just lint). Offering a script to run all automated fixes and other eventual checks locally is a good recommendation.

@kxrob
Copy link
Copy Markdown
Collaborator

kxrob commented Dec 9, 2022

I'd call it "formatters", not "checkers"

In CI the formatter tool(s) also run in check mode so far - unless they would go to create magic black fixup commits to a PR or so?

pycln as far as I have seen worked similiar to the pyflakes based more widely used autoflakes from PyCQA, which in conservative default mode stays with auto-cleaning standard lib imports only conservatively.
Once a flake8 clean state is reached (thus with also the other unused imports removed with review or noqa'd a few exceptions), which solves quite a bunch of bugs, then I think the flake8 call above is a superset including all of what of pycln --check would bark?

goal of having inline type annotations ... validate those annotations

Do you really want to put type hints far and wide, inside this repo - for finding some more flaws in internal checks than the rather plain runs would see?
Or mainly to those fewer "API" functions which are typically used by / documented for pywin32 users? Like with Python standard library - in the external .pyi's for typeshed/pywin32?

@Avasam
Copy link
Copy Markdown
Collaborator Author

Avasam commented Dec 15, 2022

Or mainly to those fewer "API" functions

I only need/want annotations for public APIs. I strongly believe having them inline will lead to better maintainability and less risks of being inaccurate than having them in third-party stubs. @mhammond already mentioned reticence in having separate stubs files in pywin32 *, which is understandable (reasons like maintainability, keeping them synced with actual code, lots of extra boilerplate files, etc.) and can stay in typeshed for the foreseeable future (they can be generated, but doing so accurately will require a lot of work).

I could do a PR with just annotations right now. But I prefer solving any issue I'm finding tangentially first, reducing the complexity. As well as being able to have a minimum of automated validation for the annotations.

* #1913 (comment)

[...] but wouldn't accept something like those .pyi files in the attached PR.


for finding some more flaws in internal checks than the rather plain runs would see?

If I happen to find more issues (I'm sure I will), I'll gladly create PRs and open issues. But that's not what I'm currently aiming for.


then I think the flake8 call above is a superset including all of what of pycln --check would bark?

F401 (imported but unused) will give a lot more false-positives and would result in needing a lot of # noqa: F401 comments as it doesn't check for for potential side-effects. Although that also help making it more obvious that an import is used for its side-effects (or to test an import when frozen), so that's not necessarily a bad thing.

@Avasam
Copy link
Copy Markdown
Collaborator Author

Avasam commented Dec 27, 2022

@kxrob @mhammond I can update this PR, but it seems a decision needs to be taken first, here's the options as I understand them from the discussions:
A) Concerning CI / PR validation:

  1. Keep both code and CI changes
    or
  2. Keep the code changes, but not the CI changes/validations (to be added later)
    or
  3. Keep code changes and unused import validation, but don't enforce import order/grouping yet (preferably when non-breaking formatting can be run automatically by the CI)

B) Concerning unused import tooling
Either use pycln or autoflake


Also need to decide if you prefer combining straight imports (I saw a mix in some places in the code). ie:

import os, sys

# vs

import os
import sys

If the PR is blocked because some parts touch a vendored lib, I can split those changes off.

@vernondcole
Copy link
Copy Markdown
Collaborator

The changes I reviewed in the adodbapi part of the tree look acceptable. Nothing there that raises a flag about our inability to test it thoroughly. If I were a reviewer, I would mark it as approved.

Avasam added a commit to Avasam/pywin32 that referenced this pull request Feb 19, 2023
Avasam added a commit to Avasam/pywin32 that referenced this pull request Feb 19, 2023
@Avasam Avasam mentioned this pull request Feb 21, 2023
Avasam added a commit to Avasam/pywin32 that referenced this pull request Feb 26, 2023
@mhammond
Copy link
Copy Markdown
Owner

This seems good, thanks. I'm still not sure on whether I like the CI changes, but we can always kill them later if not.

@mhammond mhammond merged commit 4860945 into mhammond:main Mar 25, 2023
@Avasam Avasam deleted the cleanup-imports branch March 25, 2023 04:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants