-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-30891: importlib _find_and_load() try/except #2665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@Haypo, thanks for your PR! By analyzing the history of the files in this pull request, we identified @brettcannon, @ericsnowcurrently and @pitrou to be potential reviewers. |
I already proposed this change in my previous PR #2646, but @serhiy-storchaka complained that it would make import() slower. But @ncoghlan also noticed that previously, the whole function was protected by the global imp lock, whereas now "module = sys.modules[name]" is not protected by any lock and so there is a risk of a race condition. I agree, that's why I wrote this PR. See also discussion at PR #2646. |
If @serhiy-storchaka still considers that performance is more important that atomicity in this function, we can just move "module = sys.modules[name]" into the "with _ModuleLockManager(name):" block. |
In common case You can use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If anything this will make the "already loaded" case a tiny bit faster (since it skips the double name lookup), while making initial loads marginally slower (but likely imperceptibly slower relative to the overheads of actually reading files and directories from disk).
So +1 from me for going with the conventional solution of switching to relying on exception handling to help ensure consistency of state.
The "already loaded" case already is made significantly faster by handling in C code. This code is executed in "still not loaded" case. |
@serhiy-storchaka The module lock means that in the presence of concurrent attempts at loading the module, the 2nd and subsequent threads will follow the "no KeyError" path. It occurs to me that this could use a comment after the with statement explaining the need for that path, though. There's also the micro-optimisation option you suggested on the original PR: creating a |
A race condition is exceptional. It happens only when two or more threads simultaneously import the same module first time. Once the module is imported, all subsequent imports go by fast path in C code and don't have a race condition. This PR fixes even more rare case. When at the same time one thread unloads a module while other thread imports it (this case is similar to bpo-30876). I agree that this case is worth to be fixed, but not at the cost of more common cases. |
I don't want to add a sentinel object. I just moved "module = sys.modules[name]" in the "with _ModuleLockManager(name):" block. |
Lib/importlib/_bootstrap.py
Outdated
if name not in sys.modules: | ||
return _find_and_load_unlocked(name, import_) | ||
module = sys.modules[name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the branch that I think could use a comment to say "Another thread updated sys.modules while this one was waiting for the import lock"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note suggesting a clarifying comment around when the fallback path executes, but either way I'm fine with merging this approach.
Sorry, I don't understand which solution should be implemented:
|
Both (1) and (3) are safe. (1) and (3) are equally safe if ignore some corner cases (non-string keys in sys.modules which raise KeyError on comparison). In normal case (3) is slightly faster than (1). |
The "name in" + "sys.modules[name]" case is fine if you're holding the module lock the entire time (hence why I switched my review to |
Ok, let's try sys.modules.get(). |
Lib/importlib/_bootstrap.py
Outdated
@@ -957,13 +954,16 @@ def _find_and_load_unlocked(name, import_): | |||
return module | |||
|
|||
|
|||
_SENTINEL = object() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nick suggested the name _NEEDS_LOADING
and it looks better to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Use sys.modules.get() in the "with _ModuleLockManager(name):" block to protect the dictionary key with the module lock and use an atomic get to prevent race condition. Remove also _bootstrap._POPULATE since it was unused (_bootstrap_external now has its own _POPULATE object), add a new _SENTINEL object instead.
Since this fixes an observable behavior, I think it needs a NEWS entry. |
Can you please elaborate? How replacing "name not in sys.modules" with "sys.modules.get(name, _NEEDS_LOADING)" changes the behaviour? sys.modules is a dummy dict, so there is no behaviour change, no? The main change is that it fixes a race condition, but this commit is the 3rd one in http://bugs.python.org/issue30891 and the first commit was made by you and already contains a NEWS item:
It's enough to summarize the 3 commits, no? |
Right, I think the original NEWS entry is sufficient to cover all the changes - we're just iterating on the exact details of how that fix works. |
GH-2801 is a backport of this pull request to the 3.6 branch. |
Use sys.modules.get() in the "with _ModuleLockManager(name):" block to protect the dictionary key with the module lock and use an atomic get to prevent race condition. Remove also _bootstrap._POPULATE since it was unused (_bootstrap_external now has its own _POPULATE object), add a new _SENTINEL object instead. (cherry picked from commit e72b135)
Agree. |
importlib _find_and_load() now uses a "try/except KeyError" block to
check if the imported module is not sys.modules, rather than checking
for "name is sys.modules" and then get it with sys.modules[name].
This change should prevent a race condition.