-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[WIP] Track visited files and directories when collecting #4203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This may seem a bit far-fetched but I did actually hit this in practice (albeit while writing a test for exactly the same bug in a project tested with pytest gcovr/gcovr#284 but I did hit that bug in practice!). I see that |
CONTRIBUTING.rst claims the following: Or to only run tests in a particular test module on Python 3.6:: $ tox -e py36 -- testing/test_config.py But without this patch, this doesn't work: the arguments after -- are ignored and all tests are run.
This fixes trying to traverse exponentially many paths in the presence of symlink loops, and trying to run any tests discovered in that tree exponentially many times if collecting ever finishes. I wanted to also prevent visiting files more than once, but my first attempt broke --keep-duplicates. Fixes pytest-dev#624
I'm told that, on Python 2.7 on Windows, |
@@ -558,7 +560,17 @@ def _collectfile(self, path): | |||
return () | |||
return ihook.pytest_collect_file(path=path, parent=self) | |||
|
|||
def _check_visited(self, path): | |||
st = path.stat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think stat()
(a single syscall) is costly, why would realpath()
be any better? (Its implementation is, roughly, split the path on the path separator, and call os.lstat()
on each component.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather annoyingly, os.DirEntry
(yielded by os.scandir()
) includes the inode number but not the device number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a joke I think. I have been trying to cut down on stat() calls because pytest has made millions of them in some rather simple test scenarios. A single stat() call here and there won't be a big deal. The problem is that there has been very many "just a single" things in pytest from 3.4 to 3.9 and some of those weren't really "single" because they were used in a loop (in a loop!).
So I don't know about this case, but you could try running my test script #2206 (comment) and see if performance is impacted significantly or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was joking.
But I also think that since we're doing realpath
already for symlink-resolving it might not be necessary to do any stat
anymore on top.
I think this should be rebased on features, and then maybe only small changes are required to make the test pass.
I think this should target |
Hi @wjt, It has been a long time since it has last seen activity, plus we have made sweeping changes on In order to clear up our queue and let us focus on the active PRs, I'm closing this PR for now. Please don't consider this a rejection of your PR, we just want to get this out of sight until you have the time to tackle this again. If you get around to work on this in the future, please don't hesitate in re-opening this! Thanks for your work, the team definitely appreciates it! |
There are two parts to the fix:
Don't visit files which have already been visited. This fixes the more minor problem that, without this additional change, test_noop would be run twice, once as test_noop.py and once as symlink-0/test_noop.py.I decided against this because without more refactoring, it broke--keep-duplicates
.Fixes #624.
I also included a fix for tox.ini that makes the instructions in CONTRIBUTING.rst for how to run just a single test work; happy to split that to a separate branch if preferred.