bpo-34572: change _pickle unpickling to use import rather than retrieving from sys.modules #9047

tjb900 · 2018-09-03T16:06:17Z

import.c seems to contain a lot of protection to ensure that a partially-initialised module is not returned from import when in a multi-threaded environment. PyImport_Import enjoys these protections, but PyImport_GetModule does not.

https://bugs.python.org/issue34572

the-knights-who-say-ni · 2018-09-03T16:06:19Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

tjb900 · 2019-01-07T02:12:16Z

Hi!

Just wondering if there's something I can do to move this along - either that or be told that I'm completely wrong. We are using pure-python unpickling as a workaround for now, but I'd love to have some hope that we can switch back at some point in future.

(I notice the same issue appears to be present in @pitrou/pickle5-backport - which I'm very keen to start using!)

Thanks!

pganssle · 2019-02-05T14:45:24Z

@tjb900 I realize that this is a thread safety bug which is notoriously hard to test, but I generally find that pull requests that include tests are much easier to review, because it's easier to get a sense of what behavior it fixes and an assurance that there won't be regressions.

pganssle · 2019-02-05T14:49:12Z

Modules/_pickle.c

-            return NULL;
-    }
+    module = PyImport_Import(module_name);
+    if (module == NULL)


It seems to me that the difference in behavior here occurs when PyImport_Import returns NULL without raising an exception. I'm not really sure why this would occur. Is it safe to simply propagate the NULL up to the next level here?

IIRC (a big if), the issue wasn't that PyImport_Import returned NULL, it's that PyImport_GetModule can return a module object where the body of the module is still being executed (in another thread), and this partially initialised thing is then used immediately by the thread doing the unpickling.

I'm not sure how this fixes things, though. It seems like the only change in behavior is for the case where PyImport_Import returns NULL without raising an exception. Why would that ever happen, and why was it being explicitly handled here in the first place?

In the failing case before, PyImport_Import was never called (at least, not on the thread which causes the error): GetModule returned non-NULL, and this object fell through to the getattribute, which then (sometimes) failed because that attribute did not exist yet.

On L6626 I changed _GetModule -> _Import. This, I believe, invokes the locking logic described here

cpython/Python/import.c

Line 145 in cd90f6a

/* Locking primitives to prevent parallel imports of the same module

which prevents the original issue.

My mistake, I didn't realize that you were changing the import function. I'm still concerned as to why this was added in the first place, and whether this is a wider problem.

Another nitpick, per PEP 7:

Code structure: one space between keywords like if, for and the following left paren; no spaces inside the paren; braces are required everywhere, even where C permits them to be omitted, but do not add them to code you are not otherwise modifying. All new C code requires braces.

So can you add braces around this?

tjb900 · 2019-02-05T14:52:33Z

@pganssle Thanks for checking it out. Test case was attached with the bug report, but I didn't include it in the PR - will look at that ASAP.

pganssle · 2019-02-05T18:09:58Z

Looks to me like @ericsnowcurrently added this line when adding PyImport_GetModule in PR #3593, with associated issue [bpo-28411](https://bugs.python.org/issue2841https://bugs.python.org/issue28411).

It was added in several places, so I'm wondering if this thread safety bug is a problem for all the uses of PyImport_GetModule, and if so is this the right place for it? Maybe Eric or @ncoghlan has a bit more information about why this code was added and whether falling back to PyImport_Import is the right fix.

tjb900 · 2019-02-06T02:35:48Z

Hmmm. PyImport_GetModule was added in #3593, but it actually looks to me like that logic (i.e. "check in sys.modules, and only import if it isn't there") dates at least back back to

cpython/Modules/_pickle.c

Line 4160 in ca2d610

if (module == NULL) {

I'm still trying to identify at what point this became unsafe (have partially initialised modules always been present in sys.modules?). There is talk of changing the way the locking works in #2646, but the locking itself seems much older even than that.

ncoghlan · 2019-02-06T11:10:08Z

It looks to me like the cpickle logic has always been unsafe, but prior to the introduction of per-module import locks for Python 3.3 in https://bugs.python.org/issue9260 it was likely much harder to hit.

pganssle · 2019-02-06T13:23:04Z

So if I understand this correctly, the current version of the code likely works the way it does for performance reasons, but unfortunately that comes at the cost of thread safety, in which case I think the current patch is likely the right approach (at least for now - maybe PyImport_GetModule needs to be re-designed to take into account partially initialized modules).

If it's possible to make a reliable but not terribly resource-intensive test for this, I think that would be a big help, but that may be a lot to ask.

pganssle · 2019-02-07T13:29:10Z

@tjb900 Thanks for adding that test. I've verified that this test consistently fails when run without the other changes in this PR, and it runs quickly whether or not it succeeds, so it's a +1 from me!

One last nitpick, though is that I think it would be better to put it in with the other pickle tests so that people looking to add or change tests will be able to find it more easily (for example there are three or four different test modules where strftime is tested and so I make duplicate tests or miss where things are being tested all the time). That said, if you had a good reason for putting it in a separate file, I'm not doctrinaire about the "one module one test file" thing (though I don't actually know if there's an official policy on this from the CPython core dev team).

pablogsal · 2019-02-07T13:41:57Z

Lib/test/test_threaded_pickle.py

+slow_import_module = """
+import time
+
+time.sleep(%(delay)s)


This test relies on sleep to work. I do not feel comfortable with such tests as it may fail or not work as intended almost certainly in the slowest or more exotic buildbots that we have. Check out, for example, the discussion in #10413 .

As an additional note, it also needs the reap_threads decorator.

It will make the test a bit more complicated, but I think you can avoid sleep and make this more deterministic by creating a deliberate deadlock like this:

Acquire Lock A in the test

Attempt to acquire lock A in the partially-initialized module

Spawn a thread that attempts to unpickle the object that involves an import

Release lock A

After the partially-initialized module has lock A, it adds the attribute that the pickled object uses.

I think there's still a race condition there that doesn't make it deterministic - namely that the thread which is unpickling the object needs to reach the "access the imported attribute" stage before lock A is released in order for it to be a good test, but I think if you fiddle around with the locks and threads a little bit you can get it to work.

I'll be more radical and say I don't think we need a test at all, if we can't make it simpler.

@pitrou I think removing the test because it's not guaranteed to test the thing it's intended to test would be a mistake. This still successfully tests what it's intending to test with high probability, which means that a regression would be very unlikely to ever hit master. Unlike other tests which cause erroneous failures, this causes erroneous successes, so its flakiness is unlikely to cause any erroneously broken buildbots.

If left in as a "flaky" test, I'd probably view it similar to a hypothesis test, which is that it's a probabilistic test that never has false negatives - if the test succeeds you are only partially certain that you have no bugs, but if it fails you know you have a bug.

Elsewhere you mention adding a comment about why PyImport_GetModule was dropped to prevent regressions - if it's worth a comment (which is not guaranteed to work), it's probably worth at least trying to enforce this with a test. Possibly with a comment that says why the test isn't guaranteed to detect regressions.

Well, there are corner cases that we fix without writing tests for, because they are too tricky to reproduce. I think this one is a bit borderline: it's possible to reproduce, but nevertheless it needs a complex machinery.

I certainly agree that if it's likely to have a negative impact on the test suite for reasons other than that it's not guaranteed to fail in the case of a regression the test should probably be considered optional.

I do think it's worth trying to get a test for this in the test suite, even if just as an exercising in providing a good example of how to test a race condition.

pablogsal · 2019-02-07T13:43:20Z

Lib/test/test_threaded_pickle.py

+            results = []
+            forget("slowimport")
+            self.assertNotIn("slowimport", sys.modules)
+            importlib.invalidate_caches()


Can this affect any of the importlib tests?

Any advice on how I can address this? FWIW I pretty much copied this test from test_threaded_import.py.

I haven't tested without that line, so it may not be necessary. I'll see if the test works (i.e. fails without the patch) reliably without it.

Actually, I don't know why it would affect the importlib tests (I'm the one who added that function IIRC).

pablogsal · 2019-02-07T13:44:02Z

Modules/_pickle.c

@@ -6623,13 +6623,9 @@ _pickle_Unpickler_find_class_impl(UnpicklerObject *self,
        }
    }

-    module = PyImport_GetModule(module_name);
+    module = PyImport_Import(module_name);


Could you benchmark how slower this is compared with just using PyImport_GetModule?

git master:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)" 100000 loops, best of 5: 2.55 usec per loop

This PR:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)" 50000 loops, best of 5: 4.84 usec per loop

So this is a sizable performance drop, but on a micro-benchmark. I don't think this will be significant in real-world cases. Usually, pickle performance becomes important when large pieces of data are serialized, and by construction you don't get many STACK_GLOBAL opcodes in it.

cc'ing @pierreglaser @ogrisel out of curiosity.

+1. Also, STACK_GLOBAL should tend to progressively vanish in the pickle string, in favor of BINGET as functions/classes get progressively added to the memo (if used).

bedevere-bot · 2019-02-07T13:44:27Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

pablogsal · 2019-02-07T13:47:01Z

One last nitpick, though is that I think it would be better to put it in with the other pickle tests so that people looking to add or change tests will be able to find it more easily (for example there are three or four different test modules where strftime is tested and so I make duplicate tests or miss where things are being tested all the time). That said, if you had a good reason for putting it in a separate file, I'm not doctrinaire about the "one module one test file" thing (though I don't actually know if there's an official policy on this from the CPython core dev team).

I agree with @pganssle in that this test should be placed with the rest of the pickle tests.

pitrou · 2019-02-07T15:35:57Z

Lib/test/test_threaded_pickle.py

+slow_import_module = """
+import time
+
+time.sleep(%(delay)s)


I'll be more radical and say I don't think we need a test at all, if we can't make it simpler.

pitrou · 2019-02-07T15:40:22Z

Modules/_pickle.c

@@ -6623,13 +6623,9 @@ _pickle_Unpickler_find_class_impl(UnpicklerObject *self,
        }
    }

-    module = PyImport_GetModule(module_name);
+    module = PyImport_Import(module_name);


git master:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)" 100000 loops, best of 5: 2.55 usec per loop

This PR:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)" 50000 loops, best of 5: 4.84 usec per loop

So this is a sizable performance drop, but on a micro-benchmark. I don't think this will be significant in real-world cases. Usually, pickle performance becomes important when large pieces of data are serialized, and by construction you don't get many STACK_GLOBAL opcodes in it.

cc'ing @pierreglaser @ogrisel out of curiosity.

Modules/_pickle.c

pganssle · 2019-02-07T18:19:22Z

I haven't tried adapting this to actually use pickle, but I can consistently get a failure due to partially initialized modules on the first try with this configuration. It does use time.sleep, but it only needs to do the import once, which may reduce the side effects.

tjb900 · 2019-02-08T03:09:23Z

This other case of PyImport_GetModule also appears to be making the same, unsafe, optimisation. Should I fix this too, or leave that for a different PR?

Objects/typeobject.c:

    /* Try to fetch cached copy of copyreg from sys.modules first in an
       attempt to avoid the import overhead. Previously this was implemented
       by storing a reference to the cached module in a static variable, but

tjb900 · 2019-02-08T08:25:58Z

I have made the requested changes; please review again.

Happy to backport to 3.6 and 3.7 also.

bedevere-bot · 2019-02-08T08:26:00Z

Thanks for making the requested changes!

@pablogsal: please review the changes made to this pull request.

pitrou · 2019-02-08T17:02:40Z

This other case of PyImport_GetModule also appears to be making the same, unsafe, optimisation. Should I fix this too, or leave that for a different PR?

Please make that a separate issue on bugs.python.org. We'll have to discuss its performance implications separately.

pitrou · 2019-02-08T17:05:29Z

Lib/test/pickletester.py

+        t1.join()
+        t2.join()
+
+        self.assertEqual(len(results), 2)


Perhaps also assert something about the results?

pitrou · 2019-02-08T17:05:47Z

Lib/test/pickletester.py

@@ -1174,6 +1178,63 @@ def test_truncated_data(self):
        for p in badpickles:
            self.check_unpickling_error(self.truncated_errors, p)

+    @reap_threads
+    def test_unpickle_module_race(self):


This is a neat test :-)

pganssle · 2019-02-18T13:20:39Z

@pablogsal Any thoughts on the new tests for this one?

pitrou

LGTM.

pitrou · 2019-02-18T15:11:21Z

@pablogsal, I think your comments have been addressed. Can you confirm?

pablogsal · 2019-02-18T15:29:23Z

This addresses all my concerns!

I like test_unpickle_module_race a lot :)

miss-islington · 2019-02-18T15:30:55Z

Thanks @tjb900 for the PR, and @pitrou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

…ving from sys.modules (pythonGH-9047) Fix C implementation of pickle.loads to use importlib's locking mechanisms, and thereby avoid using partially-loaded modules. (cherry picked from commit 4371c0a) Co-authored-by: tjb900 <[email protected]>

bedevere-bot · 2019-02-18T15:31:10Z

GH-11921 is a backport of this pull request to the 3.7 branch.

…ving from sys.modules (GH-9047) (GH-11921) Fix C implementation of pickle.loads to use importlib's locking mechanisms, and thereby avoid using partially-loaded modules. (cherry picked from commit 4371c0a) Co-authored-by: tjb900 <[email protected]>

the-knights-who-say-ni added the CLA not signed label Sep 3, 2018

bedevere-bot added the awaiting review label Sep 3, 2018

change unpickling to use import rather than retrieving from sys.modules

e47c5e1

tjb900 force-pushed the pickle_import_race_fix branch from c71f99f to e47c5e1 Compare September 5, 2018 03:06

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Sep 5, 2018

pganssle reviewed Feb 5, 2019

View reviewed changes

tjb900 added 2 commits February 7, 2019 09:08

pep7 fixup

595bdbb

add unit test for bpo-34572

f72802a

pablogsal requested review from ericsnowcurrently, pablogsal and pitrou February 7, 2019 13:33

pablogsal requested changes Feb 7, 2019

View reviewed changes

bedevere-bot added awaiting changes and removed awaiting review labels Feb 7, 2019

pablogsal self-assigned this Feb 7, 2019

pitrou reviewed Feb 7, 2019

View reviewed changes

tjb900 added 3 commits February 8, 2019 12:48

add comment to explain why GetModule wasn't used

fa5dbeb

rewrite test case to avoid sleeps

64a7f95

move test to AbstractUnpickleTester

202d4e2

bedevere-bot added awaiting change review and removed awaiting changes labels Feb 8, 2019

pitrou reviewed Feb 8, 2019

View reviewed changes

better assertion

a0425ef

pitrou added the needs backport to 3.7 label Feb 18, 2019

pitrou approved these changes Feb 18, 2019

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting change review labels Feb 18, 2019

pablogsal approved these changes Feb 18, 2019

View reviewed changes

pitrou merged commit 4371c0a into python:master Feb 18, 2019

bedevere-bot removed the awaiting merge label Feb 18, 2019

bedevere-bot removed the needs backport to 3.7 label Feb 18, 2019

Uh oh!

bpo-34572: change _pickle unpickling to use import rather than retrieving from sys.modules #9047

bpo-34572: change _pickle unpickling to use import rather than retrieving from sys.modules #9047

Uh oh!

Conversation

tjb900 commented Sep 3, 2018 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

the-knights-who-say-ni commented Sep 3, 2018

Uh oh!

tjb900 commented Jan 7, 2019

Uh oh!

pganssle commented Feb 5, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tjb900 Feb 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tjb900 commented Feb 5, 2019

Uh oh!

pganssle commented Feb 5, 2019 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tjb900 commented Feb 6, 2019

Uh oh!

ncoghlan commented Feb 6, 2019

Uh oh!

pganssle commented Feb 6, 2019

Uh oh!

pganssle commented Feb 7, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pganssle Feb 7, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bedevere-bot commented Feb 7, 2019

Uh oh!

pablogsal commented Feb 7, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pganssle commented Feb 7, 2019

Uh oh!

tjb900 commented Feb 8, 2019

tjb900 commented Sep 3, 2018 •

edited by bedevere-bot

Loading

tjb900 Feb 5, 2019 •

edited

Loading

pganssle commented Feb 5, 2019 •

edited by bedevere-bot

Loading

pganssle commented Feb 7, 2019 •

edited

Loading

pganssle Feb 7, 2019 •

edited

Loading