Skip to content

[WIP] Track visited files and directories when collecting #4203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions AUTHORS
Original file line number Diff line number Diff line change
Expand Up @@ -223,6 +223,7 @@ Virgil Dupras
Vitaly Lashmanov
Vlad Dragos
Wil Cooley
Will Thompson
William Lee
Wim Glenn
Wouter van Ackooy
Expand Down
12 changes: 12 additions & 0 deletions src/_pytest/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -392,6 +392,8 @@ def __init__(self, config):
self._initialpaths = frozenset()
# Keep track of any collected nodes in here, so we don't duplicate fixtures
self._node_cache = {}
# Keep track of visited directories in here, so we don't end up in a symlink-induced loop.
self._visited = set()

self.config.pluginmanager.register(self, name="session")

Expand Down Expand Up @@ -558,7 +560,17 @@ def _collectfile(self, path):
return ()
return ihook.pytest_collect_file(path=path, parent=self)

def _check_visited(self, path):
st = path.stat()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@boxed will not like that.. ;)

You might be interested in #4237 and the discussion at #2206.

There is also _recurse in the python plugin.

I've also added seen_dirs in #4237, and wonder if a combination of realpath and this would be better maybe?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think stat() (a single syscall) is costly, why would realpath() be any better? (Its implementation is, roughly, split the path on the path separator, and call os.lstat() on each component.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather annoyingly, os.DirEntry (yielded by os.scandir()) includes the inode number but not the device number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a joke I think. I have been trying to cut down on stat() calls because pytest has made millions of them in some rather simple test scenarios. A single stat() call here and there won't be a big deal. The problem is that there has been very many "just a single" things in pytest from 3.4 to 3.9 and some of those weren't really "single" because they were used in a loop (in a loop!).

So I don't know about this case, but you could try running my test script #2206 (comment) and see if performance is impacted significantly or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was joking.
But I also think that since we're doing realpath already for symlink-resolving it might not be necessary to do any stat anymore on top.

I think this should be rebased on features, and then maybe only small changes are required to make the test pass.

key = (st.dev, st.ino)
if key in self._visited:
return True
self._visited.add(key)
return False

def _recurse(self, path):
if self._check_visited(path):
return False
ihook = self.gethookproxy(path.dirpath())
if ihook.pytest_ignore_collect(path=path, config=self.config):
return
Expand Down
56 changes: 56 additions & 0 deletions testing/examples/test_issue624.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function
import sys

import six

import py
import pytest


@pytest.mark.skipif(
not hasattr(py.path.local, "mksymlinkto"),
reason="symlink not available on this platform",
)
def test_624(testdir):
"""
Runs tests in the following directory tree:

testdir/
test_noop.py
symlink-0 -> .
symlink-1 -> .

On Linux, the maximum number of symlinks in a path is 40, after which ELOOP
is returned when trying to read the path. This means that if we walk the
directory tree naively, following symlinks, naively, this will attempt to
visit test_noop.py via 2 ** 41 paths:

testdir/symlink-0/test_noop.py
testdir/symlink-1/test_noop.py
testdir/symlink-0/symlink-0/test_noop.py
testdir/symlink-0/symlink-1/test_noop.py
.. and eventually ..
testdir/symlink-0/.. 2 ** 39 more combinations ../test_noop.py
testdir/symlink-1/.. 2 ** 39 more combinations ../test_noop.py

Instead, we should stop recursing when we reach a directory we've seen
before. In this test, this means visiting the test once at the root, and
once via a symlink, then stopping.
"""

test_noop_py = testdir.makepyfile(test_noop="def test_noop():\n pass")

# dummy check that we can actually create symlinks: on Windows `py.path.mksymlinkto` is
# available, but normal users require special admin privileges to create symlinks.
if sys.platform == "win32":
try:
(testdir.tmpdir / ".dummy").mksymlinkto(test_noop_py)
except OSError as e:
pytest.skip(six.text_type(e.args[0]))

for i in range(2):
(testdir.tmpdir / "symlink-{}".format(i)).mksymlinkto(testdir.tmpdir)

result = testdir.runpytest()
result.assert_outcomes(passed=2)
6 changes: 3 additions & 3 deletions tox.ini
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ envlist =

[testenv]
commands =
{env:_PYTEST_TOX_COVERAGE_RUN:} pytest --lsof
{env:_PYTEST_TOX_COVERAGE_RUN:} pytest --lsof {posargs}
coverage: coverage combine
coverage: coverage report
passenv = USER USERNAME COVERAGE_* TRAVIS
Expand All @@ -41,7 +41,7 @@ deps =
py27: mock
nose
commands =
pytest -n auto --runpytest=subprocess
pytest -n auto --runpytest=subprocess {posargs}


[testenv:linting]
Expand All @@ -58,7 +58,7 @@ deps =
hypothesis>=3.56
{env:_PYTEST_TOX_EXTRA_DEP:}
commands =
{env:_PYTEST_TOX_COVERAGE_RUN:} pytest -n auto
{env:_PYTEST_TOX_COVERAGE_RUN:} pytest -n auto {posargs}

[testenv:py36-xdist]
# NOTE: copied from above due to https://github.com/tox-dev/tox/issues/706.
Expand Down