-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Avoid memory leaks when destroying TLS values on MacOS X #28661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Move the panic logging code to a function separate from `on_panic` and simplify the code to decide whether the backtrace should be logged.
Move the panic handling logic from the `unwind` module to `panicking` and use a panic counter to distinguish between normal state, panics and double panics.
The double-panic `abort` is run after the logging code, to provide feedback in case of a double-panic. This means that if the panic logging fails with a panic, the `abort` might never be reached. This should not normally occur, but if the `on_panic` function detects more than 2 panics, aborting *before* logging makes panic handling somewhat more robust, as it avoids an infinite recursion, which would eventually crash the process, but also make the problem harder to debug. This handles the FIXME about what to do if the thread printing panics.
The `destroy_value` function should only be used within the `imp` module, it should not be exported.
The MacOS X TLS implementation does not try tro handle the case in which new thread termination callbacks are registered (using `_tlv_atexit`) while the thread is already terminating. This can cause memory leaks and double-frees, as pointed out in Disallowing the registration of new destructors fixes this issue. As additional bonus, it should also make it possible to simplify `run_dtors` in the Linux implementation of TLS.
Since LOCAL_STDERR is a TLS variable, it is possible that it has already been destroyed when the `on_panic` function is invoked. Instead of re-panicking on its access, protect the panic logging code checking the TLS variable status (as the code is already implicitly doing with `thread_info::current_thread`).
r? @brson (rust_highfive has picked a reviewer for you, use r? to override) |
Can this separate out the commits from #28631? It looks like they're not functionally needed and it seems like unrelated functionality perhaps? Would at least make review easier! |
if intrinsics::needs_drop::<T>() && self.dtor_running.get() { | ||
return None | ||
if intrinsics::needs_drop::<T>() { | ||
if DTOR_RUNNING.with(|running| { running.get() }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this change may be a little too-backwards-incompatible to work out well. I worry that this is "fixing" the problem by papering over some other underlying cause. One example of the breakage is the travis logs indicate a failing test which otherwise passes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change I posted takes a pretty extreme stance: if we are destroying TLS, we are not guaranteed that any TLS is usable.
Conversely the failing test checks that never-used TLS variable can be initialised during the destruction of TLS.
Should we allow never-used TLS variables to be initialised while TLS is being destroyed?
This could be fixed by using the same approach in MacOS X as in Linux (i.e. subscribing a single destructor manager, then keeping a list of dtors to be run).
I'm not sure whether this guarantee is very convenient, as only the variables that have never been used are known to be useable. The other TLS variables available during the teardown of TLS depend on the order in which they are destroyed, which looks very fragile to me.
Oh I see what the required piece is for the dependence on the other PR! While certainly a good change to check whether it's destroyed, I'm a little confused what triggered a panic in the first place? The example from #28129 I'd expect to not panic... |
This makes the issue with #28129 somewhat more obvious: http://is.gd/ZO6da8 In the original code, it was only used in As you can see, it is very hard to write code using TLS variables inside This is why I was wondering if instead of guaranteeing that never-used TLS variable can be used during TLS teardown a more robust approach could be to have a |
I might even go as far saying that the (possibly somewhat inconvenient) robust/safe way to constrain the spec is: no TLS variable is accessible through The destruction of TLS variable would still be possible ( Uhm... I realised that my comments might look more like material for a discussion. |
Whether or not this should have an RFC depends on how much of an invasive change this turns out to be. New methods probably want to go through an RFC, but minor tweaks to existing ones are fine to leave around. Thanks for the explanation about where the panic comes from, makes sense! I would personally be wary of a I'm worried about blanket disallowing reinitilization during destruction mainly because there doesn't seem to be a technical reason to do so other than "weird stuff happens otherwise", and I'd prefer to have a better understand of what's happening under the hood before moving forward. My intention in opening the issue was to just document what was happening and track behavior like the corruption on OSX. I don't have a strong opinion one way or another how it should be handled, and I'm personally ok with weird behavior so long as there's no unsafety (followed up by documenting how to avoid weird behavior). |
Closing due to inactivity, but feel free to submit an RFC or reopen this as a new PR! |
Depends on #28631.
Fixes the memory fault in #28129 (and makes the problem cause explicit: stdout is being initialised during the thread destruction).
Before this is merged I would like to investigate how the mac-specific hack in
destroy_value
is related to this (in particular if it is still needed).Some guidance about what is the best way to use TLS variables in a robust way (in particular in drop implementations) would be very appreciated.
From the reference manual I infer that
state
should always be checked if the function can be called from a site in which the TLS destructor has already run.At least in library code this might become very common, so I guess we might either have an idiomatic way to do it (and provide an example in the docs) or even have a
safe_with
method which provides anOption
, so that the not-available (already destroyed/being destroyed) case always needs to be explicitly handled.