[WIP] Track visited files and directories when collecting #4203

wjt · 2018-10-19T10:44:00Z

There are two parts to the fix:

1. Don't visit directories which have already been visited. This fixes the exponential aspect of the bug.
2. Don't visit files which have already been visited. This would fix the more minor problem that, without it, test_noop is run twice: once as test_noop.py and once as symlink-0/test_noop.py. I decided against this because, without more refactoring, it broke --keep-duplicates.
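The directory half of the fix can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from this PR: walk_unique_dirs is a hypothetical helper that identifies a directory by the (st_dev, st_ino) pair of its symlink-resolved target, so a loop such as symlink-0 pointing back at the root is entered at most once.

```python
import os


def walk_unique_dirs(root):
    """Yield every directory reachable from root exactly once.

    Hypothetical helper, not pytest's API. Symlinks to directories are
    followed, but each directory is keyed by the (st_dev, st_ino) of its
    resolved target, so a symlink loop is entered at most once.
    """
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        st = os.stat(path)  # os.stat follows symlinks
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # same directory already reached via another path
        seen.add(key)
        yield path
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
```

For a tree containing test_noop.py and symlink-0 pointing at the root, this yields the root once and never descends into symlink-0. A walk that simply follows symlinks would instead re-enter it under ever-longer paths (symlink-0/symlink-0/...) until the OS refuses to resolve them, and with several such links the number of paths grows exponentially.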

Fixes #624.

I also included a fix for tox.ini that makes the CONTRIBUTING.rst instructions for running just a single test work; I'm happy to split that into a separate branch if preferred.

wjt · 2018-10-19T10:47:34Z

This may seem a bit far-fetched, but I did actually hit this in practice (albeit while writing a test for exactly the same bug in gcovr/gcovr#284, a project tested with pytest; but I did hit that bug in practice!).

I see that test_cmdline_python_package_symlink has some special handling to test that symlinks are supported before running the test – I guess I need the same here, for Windows' benefit. (I don't have a Windows development system at the moment, so I'm being lazy and waiting for AppVeyor to test this.)
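One way to do that check is sketched below. symlinks_supported is a hypothetical helper, not the function used by test_cmdline_python_package_symlink; it simply tries to create a directory symlink in a temporary directory, since on Windows os.symlink can fail for lack of privileges.

```python
import os
import tempfile


def symlinks_supported():
    """Return True if this process can create a directory symlink.

    Hypothetical helper: os.symlink may be missing on some old Windows
    Pythons, or raise OSError when the user lacks the privilege.
    """
    if not hasattr(os, "symlink"):
        return False
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        os.mkdir(target)
        try:
            os.symlink(target, os.path.join(tmp, "link"), target_is_directory=True)
        except (OSError, NotImplementedError):
            return False
    return True
```

A test could then be guarded with @pytest.mark.skipif(not symlinks_supported(), reason="symlinks not supported"), which keeps the platform check out of the test body.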

CONTRIBUTING.rst claims the following:

    Or to only run tests in a particular test module on Python 3.6::

        $ tox -e py36 -- testing/test_config.py

But without this patch, this doesn't work: the arguments after -- are ignored and all tests are run.

This fixes traversing exponentially many paths in the presence of symlink loops, and running any tests discovered in that tree exponentially many times if collection ever finishes. I also wanted to prevent visiting files more than once, but my first attempt broke --keep-duplicates. Fixes pytest-dev#624.
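The file-level half, which was dropped here, amounts to the same de-duplication applied to collected files, with an escape hatch for --keep-duplicates. The sketch below is hypothetical (unique_files and its keep_duplicates parameter are illustrative names, not pytest internals) and does not reproduce how pytest actually implements --keep-duplicates.

```python
import os


def unique_files(paths, keep_duplicates=False):
    """Return paths, dropping any that resolve to an already-seen file.

    Hypothetical helper: keep_duplicates=True stands in for the intent of
    pytest's --keep-duplicates option and disables de-duplication.
    """
    if keep_duplicates:
        return list(paths)
    seen = set()
    unique = []
    for path in paths:
        st = os.stat(path)  # follows symlinks, so aliases share a key
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique
```

With test_noop.py and symlink-0/test_noop.py both in the list, only the first survives by default, and both are kept when keep_duplicates is true.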

wjt · 2018-10-19T19:51:39Z

I'm told that on Python 2.7 on Windows, st_dev and st_ino are always 0, which probably explains why this works so poorly! I guess I'll need to make the cycle checking conditional somehow.
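Making the check conditional could look like the sketch below, assuming the symptom is exactly the one described (both fields reported as 0). dir_key is a hypothetical helper: it uses (st_dev, st_ino) when those look meaningful and otherwise falls back to the normalized os.path.realpath(). On Windows, os.path.realpath() only resolves symlinks and junctions from Python 3.8 onwards, so on older interpreters the fallback is weaker than inode tracking.

```python
import os


def dir_key(path):
    """Return a hashable identity for the directory at path.

    Hypothetical helper: prefer the (st_dev, st_ino) pair, but if the
    platform reports both as 0, every directory would compare equal, so
    fall back to the canonical, case-normalized path instead.
    """
    st = os.stat(path)
    if st.st_dev == 0 and st.st_ino == 0:
        # No usable file identity on this platform.
        return ("path", os.path.normcase(os.path.realpath(path)))
    return ("inode", st.st_dev, st.st_ino)
```

Tagging the two kinds of key keeps an inode-based key from ever colliding with a path-based one in the same seen set.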

blueyed · 2018-10-30T01:31:02Z

src/_pytest/main.py

@@ -558,7 +560,17 @@ def _collectfile(self, path):
                return ()
        return ihook.pytest_collect_file(path=path, parent=self)

+    def _check_visited(self, path):
+        st = path.stat()


@boxed will not like that. ;)

You might be interested in #4237 and the discussion at #2206.

There is also _recurse in the python plugin.

I've also added seen_dirs in #4237, and I wonder whether combining realpath with this would be better?

If you think stat() (a single syscall) is costly, why would realpath() be any better? (Its implementation is, roughly: split the path on the path separator and call os.lstat() on each component.)

Rather annoyingly, os.DirEntry (yielded by os.scandir()) includes the inode number but not the device number.
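To illustrate the point: in the sketch below (subdir_identities is a hypothetical name), os.DirEntry offers inode() but no device-number accessor, so building a (device, inode) key still takes a full stat call. The sketch uses os.stat(entry.path) rather than entry.stat() because the Python documentation notes that DirEntry.stat() on Windows reports st_ino and st_dev as zero. Symlinks are followed on purpose: for a symlink, entry.inode() is the inode of the link itself, not of the directory it points to.

```python
import os


def subdir_identities(path):
    """Map each subdirectory name under path to a (st_dev, st_ino) key.

    Hypothetical helper. os.DirEntry has inode() but no device() method,
    so the device number has to come from a stat call. os.stat() follows
    symlinks, so a link and its target get the same key.
    """
    keys = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=True):
                # entry.stat() is avoided: on Windows it leaves
                # st_dev and st_ino as 0.
                st = os.stat(entry.path)
                keys[entry.name] = (st.st_dev, st.st_ino)
    return keys
```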

It was a joke, I think. I have been trying to cut down on stat() calls because pytest has made millions of them in some rather simple test scenarios. A single stat() call here and there won't be a big deal. The problem is that there have been very many "just a single" things in pytest from 3.4 to 3.9, and some of those weren't really "single" because they were used in a loop (in a loop!).

So I don't know about this case, but you could try running my test script #2206 (comment) and see if performance is impacted significantly or not.

Yeah, I was joking.
But I also think that, since we're already calling realpath to resolve symlinks, it might not be necessary to do any stat on top of that.
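The realpath-only variant suggested here could look like the sketch below (walk_unique_realpaths is hypothetical). It drops the explicit os.stat() identity check and keys directories on the normalized resolved path instead. As noted above, realpath() does its own lstat() work per path component, and on Windows it only resolves symlinks from Python 3.8.

```python
import os


def walk_unique_realpaths(root):
    """Yield every directory reachable from root exactly once.

    Hypothetical alternative to (st_dev, st_ino) tracking: two paths are
    treated as the same directory when their resolved, case-normalized
    paths from os.path.realpath() are equal.
    """
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        key = os.path.normcase(os.path.realpath(path))
        if key in seen:
            continue  # already visited under another spelling
        seen.add(key)
        yield path
        with os.scandir(path) as entries:
            for entry in entries:
                # is_dir(follow_symlinks=True) may itself need a stat
                # call when the entry is a symlink.
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
```

The trade-off is that path identity cannot see aliases the resolver does not know about (for example bind mounts), whereas the (st_dev, st_ino) key can.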

I think this should be rebased on features, and then maybe only small changes are required to make the test pass.

nicoddemus · 2018-10-30T11:28:39Z

I think this should target features for the same reasons as mentioned in #4237 (comment). 👍

nicoddemus · 2019-06-06T22:47:11Z

Hi @wjt,

This PR has not seen activity in a long time, and we have since made sweeping changes on master to drop Python 2.7 and 3.4 support, so it has some conflicts which require attention.

In order to clear up our queue and let us focus on the active PRs, I'm closing this PR for now.

Please don't consider this a rejection of your PR; we just want to get it out of sight until you have the time to tackle it again. If you get around to working on this in the future, please don't hesitate to reopen it!

Thanks for your work, the team definitely appreciates it!

wjt added 2 commits October 19, 2018 13:10

wjt force-pushed the issue-624 branch from 74dd288 to 393b5ed October 19, 2018 12:22

blueyed reviewed Oct 30, 2018

nicoddemus changed the title from "Track visited files and directories when collecting" to "[WIP] Track visited files and directories when collecting" Nov 7, 2018

nicoddemus closed this Jun 6, 2019
