Skip to content

bpo-34572: change _pickle unpickling to use import rather than retrieving from sys.modules #9047

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 18, 2019

Conversation

tjb900
Copy link
Contributor

@tjb900 tjb900 commented Sep 3, 2018

import.c seems to contain a lot of protection to ensure that a partially-initialised module is not returned from import when in a multi-threaded environment. PyImport_Import enjoys these protections, but PyImport_GetModule does not.

https://bugs.python.org/issue34572

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@tjb900
Copy link
Contributor Author

tjb900 commented Jan 7, 2019

Hi!

Just wondering if there's something I can do to move this along - either that or be told that I'm completely wrong. We are using pure-python unpickling as a workaround for now, but I'd love to have some hope that we can switch back at some point in future.

(I notice the same issue appears to be present in @pitrou/pickle5-backport - which I'm very keen to start using!)

Thanks!

@pganssle
Copy link
Member

pganssle commented Feb 5, 2019

@tjb900 I realize that this is a thread safety bug which is notoriously hard to test, but I generally find that pull requests that include tests are much easier to review, because it's easier to get a sense of what behavior it fixes and an assurance that there won't be regressions.

return NULL;
}
module = PyImport_Import(module_name);
if (module == NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the difference in behavior here occurs when PyImport_Import returns NULL without raising an exception. I'm not really sure why this would occur. Is it safe to simply propagate the NULL up to the next level here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC (a big if), the issue wasn't that PyImport_Import returned NULL, it's that PyImport_GetModule can return a module object where the body of the module is still being executed (in another thread), and this partially initialised thing is then used immediately by the thread doing the unpickling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this fixes things, though. It seems like the only change in behavior is for the case where PyImport_Import returns NULL without raising an exception. Why would that ever happen, and why was it being explicitly handled here in the first place?

Copy link
Contributor Author

@tjb900 tjb900 Feb 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the failing case before, PyImport_Import was never called (at least, not on the thread which causes the error): GetModule returned non-NULL, and this object fell through to the getattribute, which then (sometimes) failed because that attribute did not exist yet.

On L6626 I changed _GetModule -> _Import. This, I believe, invokes the locking logic described here

/* Locking primitives to prevent parallel imports of the same module
which prevents the original issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, I didn't realize that you were changing the import function. I'm still concerned as to why this was added in the first place, and whether this is a wider problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nitpick, per PEP 7:

Code structure: one space between keywords like if, for and the following left paren; no spaces inside the paren; braces are required everywhere, even where C permits them to be omitted, but do not add them to code you are not otherwise modifying. All new C code requires braces.

So can you add braces around this?

@tjb900
Copy link
Contributor Author

tjb900 commented Feb 5, 2019

@pganssle Thanks for checking it out. Test case was attached with the bug report, but I didn't include it in the PR - will look at that ASAP.

@pganssle
Copy link
Member

pganssle commented Feb 5, 2019

Looks to me like @ericsnowcurrently added this line when adding PyImport_GetModule in PR #3593, with associated issue [bpo-28411](https://bugs.python.org/issue2841https://bugs.python.org/issue28411).

It was added in several places, so I'm wondering if this thread safety bug is a problem for all the uses of PyImport_GetModule, and if so is this the right place for it? Maybe Eric or @ncoghlan has a bit more information about why this code was added and whether falling back to PyImport_Import is the right fix.

@tjb900
Copy link
Contributor Author

tjb900 commented Feb 6, 2019

Hmmm. PyImport_GetModule was added in #3593, but it actually looks to me like that logic (i.e. "check in sys.modules, and only import if it isn't there") dates at least back back to

if (module == NULL) {

I'm still trying to identify at what point this became unsafe (have partially initialised modules always been present in sys.modules?). There is talk of changing the way the locking works in #2646, but the locking itself seems much older even than that.

@ncoghlan
Copy link
Contributor

ncoghlan commented Feb 6, 2019

It looks to me like the cpickle logic has always been unsafe, but prior to the introduction of per-module import locks for Python 3.3 in https://bugs.python.org/issue9260 it was likely much harder to hit.

@pganssle
Copy link
Member

pganssle commented Feb 6, 2019

So if I understand this correctly, the current version of the code likely works the way it does for performance reasons, but unfortunately that comes at the cost of thread safety, in which case I think the current patch is likely the right approach (at least for now - maybe PyImport_GetModule needs to be re-designed to take into account partially initialized modules).

If it's possible to make a reliable but not terribly resource-intensive test for this, I think that would be a big help, but that may be a lot to ask.

@pganssle
Copy link
Member

pganssle commented Feb 7, 2019

@tjb900 Thanks for adding that test. I've verified that this test consistently fails when run without the other changes in this PR, and it runs quickly whether or not it succeeds, so it's a +1 from me!

One last nitpick, though is that I think it would be better to put it in with the other pickle tests so that people looking to add or change tests will be able to find it more easily (for example there are three or four different test modules where strftime is tested and so I make duplicate tests or miss where things are being tested all the time). That said, if you had a good reason for putting it in a separate file, I'm not doctrinaire about the "one module one test file" thing (though I don't actually know if there's an official policy on this from the CPython core dev team).

slow_import_module = """
import time

time.sleep(%(delay)s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test relies on sleep to work. I do not feel comfortable with such tests as it may fail or not work as intended almost certainly in the slowest or more exotic buildbots that we have. Check out, for example, the discussion in #10413 .

As an additional note, it also needs the reap_threads decorator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will make the test a bit more complicated, but I think you can avoid sleep and make this more deterministic by creating a deliberate deadlock like this:

  1. Acquire Lock A in the test
  2. Attempt to acquire lock A in the partially-initialized module
  3. Spawn a thread that attempts to unpickle the object that involves an import
  4. Release lock A
  5. After the partially-initialized module has lock A, it adds the attribute that the pickled object uses.

I think there's still a race condition there that doesn't make it deterministic - namely that the thread which is unpickling the object needs to reach the "access the imported attribute" stage before lock A is released in order for it to be a good test, but I think if you fiddle around with the locks and threads a little bit you can get it to work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be more radical and say I don't think we need a test at all, if we can't make it simpler.

Copy link
Member

@pganssle pganssle Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou I think removing the test because it's not guaranteed to test the thing it's intended to test would be a mistake. This still successfully tests what it's intending to test with high probability, which means that a regression would be very unlikely to ever hit master. Unlike other tests which cause erroneous failures, this causes erroneous successes, so its flakiness is unlikely to cause any erroneously broken buildbots.

If left in as a "flaky" test, I'd probably view it similar to a hypothesis test, which is that it's a probabilistic test that never has false negatives - if the test succeeds you are only partially certain that you have no bugs, but if it fails you know you have a bug.

Elsewhere you mention adding a comment about why PyImport_GetModule was dropped to prevent regressions - if it's worth a comment (which is not guaranteed to work), it's probably worth at least trying to enforce this with a test. Possibly with a comment that says why the test isn't guaranteed to detect regressions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there are corner cases that we fix without writing tests for, because they are too tricky to reproduce. I think this one is a bit borderline: it's possible to reproduce, but nevertheless it needs a complex machinery.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I certainly agree that if it's likely to have a negative impact on the test suite for reasons other than that it's not guaranteed to fail in the case of a regression the test should probably be considered optional.

I do think it's worth trying to get a test for this in the test suite, even if just as an exercising in providing a good example of how to test a race condition.

results = []
forget("slowimport")
self.assertNotIn("slowimport", sys.modules)
importlib.invalidate_caches()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this affect any of the importlib tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any advice on how I can address this? FWIW I pretty much copied this test from test_threaded_import.py.

I haven't tested without that line, so it may not be necessary. I'll see if the test works (i.e. fails without the patch) reliably without it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't know why it would affect the importlib tests (I'm the one who added that function IIRC).

@@ -6623,13 +6623,9 @@ _pickle_Unpickler_find_class_impl(UnpicklerObject *self,
}
}

module = PyImport_GetModule(module_name);
module = PyImport_Import(module_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you benchmark how slower this is compared with just using PyImport_GetModule?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git master:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)"
100000 loops, best of 5: 2.55 usec per loop

This PR:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)"
50000 loops, best of 5: 4.84 usec per loop

So this is a sizable performance drop, but on a micro-benchmark. I don't think this will be significant in real-world cases. Usually, pickle performance becomes important when large pieces of data are serialized, and by construction you don't get many STACK_GLOBAL opcodes in it.

cc'ing @pierreglaser @ogrisel out of curiosity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Also, STACK_GLOBAL should tend to progressively vanish in the pickle string, in favor of BINGET as functions/classes get progressively added to the memo (if used).

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@pablogsal
Copy link
Member

One last nitpick, though is that I think it would be better to put it in with the other pickle tests so that people looking to add or change tests will be able to find it more easily (for example there are three or four different test modules where strftime is tested and so I make duplicate tests or miss where things are being tested all the time). That said, if you had a good reason for putting it in a separate file, I'm not doctrinaire about the "one module one test file" thing (though I don't actually know if there's an official policy on this from the CPython core dev team).

I agree with @pganssle in that this test should be placed with the rest of the pickle tests.

@pablogsal pablogsal self-assigned this Feb 7, 2019
slow_import_module = """
import time

time.sleep(%(delay)s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be more radical and say I don't think we need a test at all, if we can't make it simpler.

@@ -6623,13 +6623,9 @@ _pickle_Unpickler_find_class_impl(UnpicklerObject *self,
}
}

module = PyImport_GetModule(module_name);
module = PyImport_Import(module_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git master:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)"
100000 loops, best of 5: 2.55 usec per loop

This PR:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)"
50000 loops, best of 5: 4.84 usec per loop

So this is a sizable performance drop, but on a micro-benchmark. I don't think this will be significant in real-world cases. Usually, pickle performance becomes important when large pieces of data are serialized, and by construction you don't get many STACK_GLOBAL opcodes in it.

cc'ing @pierreglaser @ogrisel out of curiosity.

@pganssle
Copy link
Member

pganssle commented Feb 7, 2019

I haven't tried adapting this to actually use pickle, but I can consistently get a failure due to partially initialized modules on the first try with this configuration. It does use time.sleep, but it only needs to do the import once, which may reduce the side effects.

@tjb900
Copy link
Contributor Author

tjb900 commented Feb 8, 2019

This other case of PyImport_GetModule also appears to be making the same, unsafe, optimisation. Should I fix this too, or leave that for a different PR?

Objects/typeobject.c:

    /* Try to fetch cached copy of copyreg from sys.modules first in an
       attempt to avoid the import overhead. Previously this was implemented
       by storing a reference to the cached module in a static variable, but

@tjb900
Copy link
Contributor Author

tjb900 commented Feb 8, 2019

I have made the requested changes; please review again.

Happy to backport to 3.6 and 3.7 also.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@pablogsal: please review the changes made to this pull request.

@pitrou
Copy link
Member

pitrou commented Feb 8, 2019

This other case of PyImport_GetModule also appears to be making the same, unsafe, optimisation. Should I fix this too, or leave that for a different PR?

Please make that a separate issue on bugs.python.org. We'll have to discuss its performance implications separately.

t1.join()
t2.join()

self.assertEqual(len(results), 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also assert something about the results?

@@ -1174,6 +1178,63 @@ def test_truncated_data(self):
for p in badpickles:
self.check_unpickling_error(self.truncated_errors, p)

@reap_threads
def test_unpickle_module_race(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a neat test :-)

@pganssle
Copy link
Member

@pablogsal Any thoughts on the new tests for this one?

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@pitrou
Copy link
Member

pitrou commented Feb 18, 2019

@pablogsal, I think your comments have been addressed. Can you confirm?

@pablogsal
Copy link
Member

This addresses all my concerns!

I like test_unpickle_module_race a lot :)

@pitrou pitrou merged commit 4371c0a into python:master Feb 18, 2019
@miss-islington
Copy link
Contributor

Thanks @tjb900 for the PR, and @pitrou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 18, 2019
…ving from sys.modules (pythonGH-9047)

Fix C implementation of pickle.loads to use importlib's locking mechanisms, and thereby avoid using partially-loaded modules.
(cherry picked from commit 4371c0a)

Co-authored-by: tjb900 <[email protected]>
@bedevere-bot
Copy link

GH-11921 is a backport of this pull request to the 3.7 branch.

pitrou pushed a commit that referenced this pull request Feb 18, 2019
…ving from sys.modules (GH-9047) (GH-11921)

Fix C implementation of pickle.loads to use importlib's locking mechanisms, and thereby avoid using partially-loaded modules.
(cherry picked from commit 4371c0a)

Co-authored-by: tjb900 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants