-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
"import numpy" leaks - reported by valgrind #18271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@bstaletic Yes, NumPy leaks memory on import, but only once. Can you explain if this is a problem for you? I do check numpy with valgrind, but there are certain things during Python and NumPy import that simply are designed in a way that they must "leak" (exactly once). |
It's not exactly a problem, as pybind11 can keep suppressing the valgrind-reported errors. I assumed you weren't aware of these and thus opened the bug report.
To be clear, if you say this is fine and expected, I'm fine with that. |
There were always some leaks that seemed fine, but now I notice that all of these are in Cython code generated for numpy random (and not in NumPy proper). To be honest, I stopped checking import time leaks, but I suppose there are fewer nowadays (at least few "definite" ones that valgrind notices, there are definitely more leaks :)). I think you are right, we actually should see if Cython generates incorrect reference counting or so. That last one could be a serious refcounting bug somewhere. I had run valgrind over NumPy 1.20 a while ago, but there were a few changes after that, and the refcount check was a bit longer, I guess. Should plan to do more full runs again soon before 1.20.1, just to be sure nothing happened in the mean time. Can you give more information on that leak (e.g. what code is run when it is triggered?). Tracking an incorrectly refcounted tuple seems very hard otherwise (unless I get lucky and the NumPy test suit finds it as well). |
@bstaletic one small thing, if you this inside one of the many tests. Just in case it helps my little pytest plugin: https://pypistats.org/packages/pytest-valgrind could help with finding which test is failing exactly. (The github repo is in a slightly newer state, but it shouldn't matter much.) |
Thanks for looking into this, @seberg! To add to @bstaletic's story: we managed to run Valgrind on pybind11 and clean pybind11 of all reported leaks, a month ago, and suppressed the leaks reported on Here's how I reproduced the new leak that @bstaletic reported: Suppressing the error with our current suppression file, this one remains:
|
Oh, I should specify one more thing, perhaps: we were holding back on having more general rules (e.g., just the stack up to Apart from that, we're perfectly happy. The main goal is to know what to suppress such that we can have Valgrind run in CI for pybind11. |
Honestly, I don't use any suppression files at all, so :). The only reason we don't have a CI run in NumPy is that there are some tests that always have to leak, and running most of the test suit with Its great that you guys do these checks! I am a little stumped, working of master, but:
Although let me try without the interactive shell... Maybe that does something weird. |
Yeah, same running from the shell directly; no reported leaks with just import (and version printing). Maybe the cython version matters?
|
We have actually stumbled across that plugin when we were first setting valgrind up with pybind11. It looked interesting, but so far we decided not to rely on it, because of the inability to report errors that happen at start up or shut down. It would have had certainly come in handy a few times by now, but habits, I guess. I'm used to just commenting out a bunch of tests until I pin point where the error is coming from, which is the caveman method. |
I did that first, when cleaning out NumPy: manual bisecting (and in the end, once you hit the individual test level, it is often necessary in any case). Then I wrote that, thinking it is a bit better ;). |
That seems to be the same? Besides, I believe I really just installed the wheel from PyPI. I'm also using deadsnakes' |
I must admit I was intrigued by the plugin :-) Given that we're now running on CI, it should be easy enough to spot the test that changed and (probably) causes the leak, but I'd be happy to give that a try! But yes, as @bstaletic says, pybind11 has a bunch of internals so we should definitely check the first import of a pybind11 module (outside of a |
I can repro with my custom built, non-patched, CPython.
Here's the summary:
That includes the new |
Thanks @bstaletic, I can reproduce it with the wheel, did not yet try to check the 1.20.x branch, but master did not diverge a whole lot. |
Nevermind, I can reproduce the leak on the 1.20 branch, just not on master branch (I am super surprised, because I would think the cython generation is pretty much unchanged; so maybe have to triple check). Note sure what relevant change happened in between, but at least I can reproduce it now. I also doubt it is a big issue (since it is on import only probably). But I would like it if you guys wouldn't even need a numpy suppression file :). |
That would be amazing! But if you just tell us "these are the leaks you need to live with and you don't need to report similar ones in the future", we're also totally OK with that :-) |
@YannickJadoul, OK: I am very certain, there is no need to worry about them. But I suspect they will go away again (whether due to a cython upgrade or otherwise). For now, I have figured out that if I compile NumPy with a debug version (which sets some debug paths also in NumPy itself), it doesn't show up. But if I compile with plain For today, I will stop looking into this. I hope there is some small, simple fix in Cython that would make this go away, so will look into it a bit more in the next days and hopefully find it. As this appears in module init, I am not worried that it is problematic. |
Thanks for confirming. We'll stick to the current suppression files, then, and hopefully in some future, we try without and find a beautiful world without suppression files ;-) |
Thanks for all your effort, @seberg!! |
OK, I had another look. The reason that I couldn't reproduce it at first is that it is related to cython's Now I find it a bit strange that is only one tuple is lost per file, and I do not really see where there might be a |
I have convinced myself there is nothing actionable in NumPy, so opened a cython issue: cython/cython#3994 just in case. My hope is that with some new Python or Cython version, this just goes away. I feel this has to do with module cleanup, and I am not even certain Python itself is cleaned up in that regard yet. |
@seberg Thanks for investigating!
CPython definitely isn't clean, but there are efforts to clean it up in some future version. Maybe one day... |
Thanks again, @seberg! |
As pointed out on the cython issue, you can get rid of these "not really leaks" by using cython with |
Thank you for the update, @seberg! I think you're pretty right when stating in cython/cython#3994 ;-)
Great to have pinpointed the actual issue, though! |
Reproducing code example:
Error message:
The output is fairly long, so I made a gist: https://gist.github.com/bstaletic/061ea8912ed5bc4e238363a120f70a1c
The suppression file is in the gist as well.
NumPy/Python version information:
Python is 3.9.1, debug version, with assertions
NumPy is 1.20.0, but these leaks were present on 1.19.x as well. I haven't checked previous versions, because that's when pybind11 incorporated valgrind in its CI.
The text was updated successfully, but these errors were encountered: