Skip to content

Split Session._collection_node_cache to 3 mutually exclusive parts #6555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 28, 2020

Conversation

bluetech
Copy link
Member

@bluetech bluetech commented Jan 24, 2020

Previously, this cache was used with 3 different and mutually exclusive key-type -> value-type combinations. Mypy can't properly type this. It's also quite confusing.

Split to 3 different dicts instead.

@bluetech
Copy link
Member Author

Suggestions welcome for better names than 1, 2, 3 :)

@blueyed
Copy link
Contributor

blueyed commented Jan 24, 2020

JFI: IIRC the node cache might not be necessary in the first place - that's something I think to remember from when looking into fixing collection issues, especially with regard to Package. I might be wrong though, and it might be more about fixing it (where this then certainly would help).

@nicoddemus
Copy link
Member

Can we please do that in features? Those attribute have changed names there already.

@bluetech bluetech changed the base branch from master to features January 25, 2020 12:09
@bluetech bluetech changed the title Split Session._node_cache to 3 mutually exclusive parts Split Session._collection_node_cache to 3 mutually exclusive parts Jan 25, 2020
@bluetech
Copy link
Member Author

Can we please do that in features? Those attribute have changed names there already.

Done!

JFI: IIRC the node cache might not be necessary in the first place

Yeah, hopefully this will make it easier to look at, at least. For reference, it was added in fedc785 to address issue #3358 (which is still open -- don't know if it just wasn't closed though).

Previously, this cache was used with 3 different and mutually exclusive
key-type -> value-type combinations. Mypy can't properly type this. It's
also quite confusing.

Split to 3 different dicts instead.
@blueyed
Copy link
Contributor

blueyed commented Jan 25, 2020

Thanks for the reference (this was part of #3389).
I think we could use this PR to try to get rid of it / minimize its usage.
That would also allow us for postponing the quest for finding appropriate names.. :)

@bluetech
Copy link
Member Author

TBH, I find doing refactors without type checks to be a frightening prospect, at least in a "delicate" codebase like pytest 😺. That's why I try to focus on adding types and only make changes when I have to.

@blueyed

This comment has been minimized.

@bluetech

This comment has been minimized.

@blueyed

This comment has been minimized.

@blueyed
Copy link
Contributor

blueyed commented Jan 26, 2020

TBH, I find doing refactors without type checks to be a frightening prospect, at least in a "delicate" codebase like pytest . That's why I try to focus on adding types and only make changes when I have to.

Yeah. But given that it is likely that this code can be refactored/removed, it could be done in this PR already, based on having annotations here now.

@bluetech
Copy link
Member Author

Yeah. But given that it is likely that this code can be refactored/removed, it could be done in this PR already, based on having annotations here now.

I'd prefer not to do it in this PR, because I don't intend to work on it ATM. IMO this PR makes things slightly better already (in that whoever refactors it will have an easier time). The reason I'd like to merge it is that it blocks #6556 which in turn blocks more typing work.

@blueyed
Copy link
Contributor

blueyed commented Jan 28, 2020

@bluetech not sure why it blocks #6556, but my point is that it is better to get rid of the caches instead of renaming them (again). I think it should have been just removed / investigated with 7b1e3d1 (#6491) to begin with.

It seems like all tests are passing with this, so it might only be useful for caching, not for logic (or is not covered really):

 src/_pytest/main.py | 46 ++++++++--------------------------------------
 1 file changed, 8 insertions(+), 38 deletions(-)

diff --git i/src/_pytest/main.py w/src/_pytest/main.py
index a0cae18ed..451923679 100644
--- i/src/_pytest/main.py
+++ w/src/_pytest/main.py
@@ -409,17 +409,6 @@ def __init__(self, config: Config) -> None:
         self.startdir = config.invocation_dir
         self._initialpaths = frozenset()  # type: FrozenSet[py.path.local]
 
-        # Keep track of any collected nodes in here, so we don't duplicate fixtures
-        self._collection_node_cache1 = (
-            {}
-        )  # type: Dict[py.path.local, Sequence[nodes.Collector]]
-        self._collection_node_cache2 = (
-            {}
-        )  # type: Dict[Tuple[Type[nodes.Collector], py.path.local], nodes.Collector]
-        self._collection_node_cache3 = (
-            {}
-        )  # type: Dict[Tuple[Type[nodes.Collector], str], CollectReport]
-
         # Dirnames of pkgs with dunder-init files.
         self._collection_pkg_roots = {}  # type: Dict[py.path.local, Package]
 
@@ -537,9 +526,6 @@ def collect(self):
                 self._notfound.append((report_arg, sys.exc_info()[1]))
 
             self.trace.root.indent -= 1
-        self._collection_node_cache1.clear()
-        self._collection_node_cache2.clear()
-        self._collection_node_cache3.clear()
         self._collection_pkg_roots.clear()
 
     def _collect(self, argpath, names):
@@ -557,13 +543,10 @@ def _collect(self, argpath, names):
                 if parent.isdir():
                     pkginit = parent.join("__init__.py")
                     if pkginit.isfile():
-                        if pkginit not in self._collection_node_cache1:
-                            col = self._collectfile(pkginit, handle_dupes=False)
-                            if col:
-                                if isinstance(col[0], Package):
-                                    self._collection_pkg_roots[parent] = col[0]
-                                # always store a list in the cache, matchnodes expects it
-                                self._collection_node_cache1[col[0].fspath] = [col[0]]
+                        col = self._collectfile(pkginit, handle_dupes=False)
+                        if col:
+                            if isinstance(col[0], Package):
+                                self._collection_pkg_roots[parent] = col[0]
 
         # If it's a directory argument, recurse and look for any Subpackages.
         # Let the Package collector deal with subnodes, don't collect here.
@@ -590,21 +573,12 @@ def _collect(self, argpath, names):
 
                 for x in self._collectfile(path):
                     key = (type(x), x.fspath)
-                    if key in self._collection_node_cache2:
-                        yield self._collection_node_cache2[key]
-                    else:
-                        self._collection_node_cache2[key] = x
-                        yield x
+                    yield x
         else:
             assert argpath.check(file=1)
 
-            if argpath in self._collection_node_cache1:
-                col = self._collection_node_cache1[argpath]
-            else:
-                collect_root = self._collection_pkg_roots.get(argpath.dirname, self)
-                col = collect_root._collectfile(argpath, handle_dupes=False)
-                if col:
-                    self._collection_node_cache1[argpath] = col
+            collect_root = self._collection_pkg_roots.get(argpath.dirname, self)
+            col = collect_root._collectfile(argpath, handle_dupes=False)
             m = self.matchnodes(col, names)
             # If __init__.py was the only file requested, then the matched node will be
             # the corresponding Package, and the first yielded item will be the __init__
@@ -717,11 +691,7 @@ def _matchnodes(self, matching, names):
                 continue
             assert isinstance(node, nodes.Collector)
             key = (type(node), node.nodeid)
-            if key in self._collection_node_cache3:
-                rep = self._collection_node_cache3[key]
-            else:
-                rep = collect_one_node(node)
-                self._collection_node_cache3[key] = rep
+            rep = collect_one_node(node)
             if rep.passed:
                 has_matched = False
                 for x in rep.result:

But since you do not plan to work on it here, let's get it improved slightly then.

@bluetech bluetech merged commit e440b43 into pytest-dev:features Jan 28, 2020
@bluetech bluetech deleted the nodes-cache-split branch January 28, 2020 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants