-
Notifications
You must be signed in to change notification settings - Fork 14.6k
Revert "[mlir] Fix DistinctAttributeUniquer deleting attribute storage when crash reproduction is enabled" #133000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "[mlir] Fix DistinctAttributeUniquer deleting attribute storage when crash reproduction is enabled" #133000
Conversation
…e when c…" This reverts commit 0aa5ba4.
@llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-mlir-core Author: Emilio Cota (cota) ChangesReverts llvm/llvm-project#128566. See as well the discussion in llvm/llvm-project#132935. Full diff: https://github.com/llvm/llvm-project/pull/133000.diff 6 Files Affected:
diff --git a/mlir/include/mlir/IR/MLIRContext.h b/mlir/include/mlir/IR/MLIRContext.h
index ad89d631c8a5f..ef8dab87f131a 100644
--- a/mlir/include/mlir/IR/MLIRContext.h
+++ b/mlir/include/mlir/IR/MLIRContext.h
@@ -153,14 +153,6 @@ class MLIRContext {
disableMultithreading(!enable);
}
- /// Set the flag specifying if thread-local storage should be used by storage
- /// allocators in this context. Note that disabling mutlithreading implies
- /// thread-local storage is also disabled.
- void disableThreadLocalStorage(bool disable = true);
- void enableThreadLocalStorage(bool enable = true) {
- disableThreadLocalStorage(!enable);
- }
-
/// Set a new thread pool to be used in this context. This method requires
/// that multithreading is disabled for this context prior to the call. This
/// allows to share a thread pool across multiple contexts, as well as
diff --git a/mlir/lib/IR/AttributeDetail.h b/mlir/lib/IR/AttributeDetail.h
index 8fed18140433c..26d40ac3a38f6 100644
--- a/mlir/lib/IR/AttributeDetail.h
+++ b/mlir/lib/IR/AttributeDetail.h
@@ -24,7 +24,6 @@
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/PointerIntPair.h"
#include "llvm/Support/TrailingObjects.h"
-#include <mutex>
namespace mlir {
namespace detail {
@@ -402,8 +401,7 @@ class DistinctAttributeUniquer {
/// is freed after the destruction of the distinct attribute allocator.
class DistinctAttributeAllocator {
public:
- DistinctAttributeAllocator(bool threadingIsEnabled)
- : threadingIsEnabled(threadingIsEnabled), useThreadLocalAllocator(true) {};
+ DistinctAttributeAllocator() = default;
DistinctAttributeAllocator(DistinctAttributeAllocator &&) = delete;
DistinctAttributeAllocator(const DistinctAttributeAllocator &) = delete;
@@ -413,49 +411,12 @@ class DistinctAttributeAllocator {
/// Allocates a distinct attribute storage using a thread local bump pointer
/// allocator to enable synchronization free parallel allocations.
DistinctAttrStorage *allocate(Attribute referencedAttr) {
- if (!useThreadLocalAllocator && threadingIsEnabled) {
- std::scoped_lock<std::mutex> lock(allocatorMutex);
- return allocateImpl(referencedAttr);
- }
- return allocateImpl(referencedAttr);
- }
-
- /// Sets a flag that stores if multithreading is enabled. The flag is used to
- /// decide if locking is needed when using a non thread-safe allocator.
- void disableMultiThreading(bool disable = true) {
- threadingIsEnabled = !disable;
- }
-
- /// Sets a flag to disable using thread local bump pointer allocators and use
- /// a single thread-safe allocator. Use this to persist allocated storage
- /// beyond the lifetime of a child thread calling this function while ensuring
- /// thread-safe allocation.
- void disableThreadLocalStorage(bool disable = true) {
- useThreadLocalAllocator = !disable;
- }
-
-private:
- DistinctAttrStorage *allocateImpl(Attribute referencedAttr) {
- return new (getAllocatorInUse().Allocate<DistinctAttrStorage>())
+ return new (allocatorCache.get().Allocate<DistinctAttrStorage>())
DistinctAttrStorage(referencedAttr);
}
- /// If threading is disabled on the owning MLIR context, a normal non
- /// thread-local, non-thread safe bump pointer allocator is used instead to
- /// prevent use-after-free errors whenever attribute storage created on a
- /// crash recover thread is accessed after the thread joins.
- llvm::BumpPtrAllocator &getAllocatorInUse() {
- if (useThreadLocalAllocator)
- return allocatorCache.get();
- return allocator;
- }
-
+private:
ThreadLocalCache<llvm::BumpPtrAllocator> allocatorCache;
- llvm::BumpPtrAllocator allocator;
- std::mutex allocatorMutex;
-
- bool threadingIsEnabled : 1;
- bool useThreadLocalAllocator : 1;
};
} // namespace detail
} // namespace mlir
diff --git a/mlir/lib/IR/MLIRContext.cpp b/mlir/lib/IR/MLIRContext.cpp
index ab6a5ac8b76e8..87782e84dd6e4 100644
--- a/mlir/lib/IR/MLIRContext.cpp
+++ b/mlir/lib/IR/MLIRContext.cpp
@@ -268,8 +268,7 @@ class MLIRContextImpl {
public:
MLIRContextImpl(bool threadingIsEnabled)
- : threadingIsEnabled(threadingIsEnabled),
- distinctAttributeAllocator(threadingIsEnabled) {
+ : threadingIsEnabled(threadingIsEnabled) {
if (threadingIsEnabled) {
ownedThreadPool = std::make_unique<llvm::DefaultThreadPool>();
threadPool = ownedThreadPool.get();
@@ -597,7 +596,6 @@ void MLIRContext::disableMultithreading(bool disable) {
// Update the threading mode for each of the uniquers.
impl->affineUniquer.disableMultithreading(disable);
impl->attributeUniquer.disableMultithreading(disable);
- impl->distinctAttributeAllocator.disableMultiThreading(disable);
impl->typeUniquer.disableMultithreading(disable);
// Destroy thread pool (stop all threads) if it is no longer needed, or create
@@ -719,10 +717,6 @@ bool MLIRContext::isOperationRegistered(StringRef name) {
return RegisteredOperationName::lookup(name, this).has_value();
}
-void MLIRContext::disableThreadLocalStorage(bool disable) {
- getImpl().distinctAttributeAllocator.disableThreadLocalStorage(disable);
-}
-
void Dialect::addType(TypeID typeID, AbstractType &&typeInfo) {
auto &impl = context->getImpl();
assert(impl.multiThreadedExecutionContext == 0 &&
diff --git a/mlir/lib/Pass/PassCrashRecovery.cpp b/mlir/lib/Pass/PassCrashRecovery.cpp
index 7ea2a57693217..8c6d865cb31dd 100644
--- a/mlir/lib/Pass/PassCrashRecovery.cpp
+++ b/mlir/lib/Pass/PassCrashRecovery.cpp
@@ -414,15 +414,6 @@ struct FileReproducerStream : public mlir::ReproducerStream {
LogicalResult PassManager::runWithCrashRecovery(Operation *op,
AnalysisManager am) {
- // Notify the context to disable the use of thread-local storage while the
- // pass manager is running in a crash recovery context thread. Re-enable the
- // thread local storage upon function exit. This is required to persist any
- // attribute storage allocated during passes beyond the lifetime of the
- // recovery context thread.
- MLIRContext *ctx = getContext();
- ctx->disableThreadLocalStorage();
- auto guard =
- llvm::make_scope_exit([ctx]() { ctx->enableThreadLocalStorage(); });
crashReproGenerator->initialize(getPasses(), op, verifyPasses);
// Safely invoke the passes within a recovery context.
diff --git a/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir b/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir
deleted file mode 100644
index b36ed367adb76..0000000000000
--- a/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir
+++ /dev/null
@@ -1,22 +0,0 @@
-// Test that the enable-debug-info-scope-on-llvm-func pass can create its
-// distinct attributes when running in the crash reproducer thread.
-
-// RUN: mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. \
-// RUN: --pass-pipeline="builtin.module(ensure-debug-info-scope-on-llvm-func)" \
-// RUN: --mlir-print-debuginfo %s | FileCheck %s
-
-// RUN: mlir-opt --mlir-pass-pipeline-crash-reproducer=. \
-// RUN: --pass-pipeline="builtin.module(ensure-debug-info-scope-on-llvm-func)" \
-// RUN: --mlir-print-debuginfo %s | FileCheck %s
-
-module {
- llvm.func @func_no_debug() {
- llvm.return loc(unknown)
- } loc(unknown)
-} loc(unknown)
-
-// CHECK-LABEL: llvm.func @func_no_debug()
-// CHECK: llvm.return loc(#loc
-// CHECK: loc(#loc[[LOC:[0-9]+]])
-// CHECK: #di_compile_unit = #llvm.di_compile_unit<id = distinct[{{.*}}]<>,
-// CHECK: #di_subprogram = #llvm.di_subprogram<id = distinct[{{.*}}]<>
diff --git a/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir b/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir
deleted file mode 100644
index af6dead31e263..0000000000000
--- a/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir
+++ /dev/null
@@ -1,18 +0,0 @@
-// This test verifies that when running with crash reproduction enabled, distinct
-// attribute storage is not allocated in thread-local storage. Since crash
-// reproduction runs the pass manager in a separate thread, using thread-local
-// storage for distinct attributes causes use-after-free errors once the thread
-// that runs the pass manager joins.
-
-// RUN: mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. %s -test-distinct-attrs | FileCheck %s
-// RUN: mlir-opt --mlir-pass-pipeline-crash-reproducer=. %s -test-distinct-attrs | FileCheck %s
-
-// CHECK: #[[DIST0:.*]] = distinct[0]<42 : i32>
-// CHECK: #[[DIST1:.*]] = distinct[1]<42 : i32>
-#distinct = distinct[0]<42 : i32>
-
-// CHECK: @foo_1
-func.func @foo_1() {
- // CHECK: "test.op"() {distinct.input = #[[DIST0]], distinct.output = #[[DIST1]]}
- "test.op"() {distinct.input = #distinct} : () -> ()
-}
|
@llvm/pr-subscribers-mlir Author: Emilio Cota (cota) ChangesReverts llvm/llvm-project#128566. See as well the discussion in llvm/llvm-project#132935. Full diff: https://github.com/llvm/llvm-project/pull/133000.diff 6 Files Affected:
diff --git a/mlir/include/mlir/IR/MLIRContext.h b/mlir/include/mlir/IR/MLIRContext.h
index ad89d631c8a5f..ef8dab87f131a 100644
--- a/mlir/include/mlir/IR/MLIRContext.h
+++ b/mlir/include/mlir/IR/MLIRContext.h
@@ -153,14 +153,6 @@ class MLIRContext {
disableMultithreading(!enable);
}
- /// Set the flag specifying if thread-local storage should be used by storage
- /// allocators in this context. Note that disabling mutlithreading implies
- /// thread-local storage is also disabled.
- void disableThreadLocalStorage(bool disable = true);
- void enableThreadLocalStorage(bool enable = true) {
- disableThreadLocalStorage(!enable);
- }
-
/// Set a new thread pool to be used in this context. This method requires
/// that multithreading is disabled for this context prior to the call. This
/// allows to share a thread pool across multiple contexts, as well as
diff --git a/mlir/lib/IR/AttributeDetail.h b/mlir/lib/IR/AttributeDetail.h
index 8fed18140433c..26d40ac3a38f6 100644
--- a/mlir/lib/IR/AttributeDetail.h
+++ b/mlir/lib/IR/AttributeDetail.h
@@ -24,7 +24,6 @@
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/PointerIntPair.h"
#include "llvm/Support/TrailingObjects.h"
-#include <mutex>
namespace mlir {
namespace detail {
@@ -402,8 +401,7 @@ class DistinctAttributeUniquer {
/// is freed after the destruction of the distinct attribute allocator.
class DistinctAttributeAllocator {
public:
- DistinctAttributeAllocator(bool threadingIsEnabled)
- : threadingIsEnabled(threadingIsEnabled), useThreadLocalAllocator(true) {};
+ DistinctAttributeAllocator() = default;
DistinctAttributeAllocator(DistinctAttributeAllocator &&) = delete;
DistinctAttributeAllocator(const DistinctAttributeAllocator &) = delete;
@@ -413,49 +411,12 @@ class DistinctAttributeAllocator {
/// Allocates a distinct attribute storage using a thread local bump pointer
/// allocator to enable synchronization free parallel allocations.
DistinctAttrStorage *allocate(Attribute referencedAttr) {
- if (!useThreadLocalAllocator && threadingIsEnabled) {
- std::scoped_lock<std::mutex> lock(allocatorMutex);
- return allocateImpl(referencedAttr);
- }
- return allocateImpl(referencedAttr);
- }
-
- /// Sets a flag that stores if multithreading is enabled. The flag is used to
- /// decide if locking is needed when using a non thread-safe allocator.
- void disableMultiThreading(bool disable = true) {
- threadingIsEnabled = !disable;
- }
-
- /// Sets a flag to disable using thread local bump pointer allocators and use
- /// a single thread-safe allocator. Use this to persist allocated storage
- /// beyond the lifetime of a child thread calling this function while ensuring
- /// thread-safe allocation.
- void disableThreadLocalStorage(bool disable = true) {
- useThreadLocalAllocator = !disable;
- }
-
-private:
- DistinctAttrStorage *allocateImpl(Attribute referencedAttr) {
- return new (getAllocatorInUse().Allocate<DistinctAttrStorage>())
+ return new (allocatorCache.get().Allocate<DistinctAttrStorage>())
DistinctAttrStorage(referencedAttr);
}
- /// If threading is disabled on the owning MLIR context, a normal non
- /// thread-local, non-thread safe bump pointer allocator is used instead to
- /// prevent use-after-free errors whenever attribute storage created on a
- /// crash recover thread is accessed after the thread joins.
- llvm::BumpPtrAllocator &getAllocatorInUse() {
- if (useThreadLocalAllocator)
- return allocatorCache.get();
- return allocator;
- }
-
+private:
ThreadLocalCache<llvm::BumpPtrAllocator> allocatorCache;
- llvm::BumpPtrAllocator allocator;
- std::mutex allocatorMutex;
-
- bool threadingIsEnabled : 1;
- bool useThreadLocalAllocator : 1;
};
} // namespace detail
} // namespace mlir
diff --git a/mlir/lib/IR/MLIRContext.cpp b/mlir/lib/IR/MLIRContext.cpp
index ab6a5ac8b76e8..87782e84dd6e4 100644
--- a/mlir/lib/IR/MLIRContext.cpp
+++ b/mlir/lib/IR/MLIRContext.cpp
@@ -268,8 +268,7 @@ class MLIRContextImpl {
public:
MLIRContextImpl(bool threadingIsEnabled)
- : threadingIsEnabled(threadingIsEnabled),
- distinctAttributeAllocator(threadingIsEnabled) {
+ : threadingIsEnabled(threadingIsEnabled) {
if (threadingIsEnabled) {
ownedThreadPool = std::make_unique<llvm::DefaultThreadPool>();
threadPool = ownedThreadPool.get();
@@ -597,7 +596,6 @@ void MLIRContext::disableMultithreading(bool disable) {
// Update the threading mode for each of the uniquers.
impl->affineUniquer.disableMultithreading(disable);
impl->attributeUniquer.disableMultithreading(disable);
- impl->distinctAttributeAllocator.disableMultiThreading(disable);
impl->typeUniquer.disableMultithreading(disable);
// Destroy thread pool (stop all threads) if it is no longer needed, or create
@@ -719,10 +717,6 @@ bool MLIRContext::isOperationRegistered(StringRef name) {
return RegisteredOperationName::lookup(name, this).has_value();
}
-void MLIRContext::disableThreadLocalStorage(bool disable) {
- getImpl().distinctAttributeAllocator.disableThreadLocalStorage(disable);
-}
-
void Dialect::addType(TypeID typeID, AbstractType &&typeInfo) {
auto &impl = context->getImpl();
assert(impl.multiThreadedExecutionContext == 0 &&
diff --git a/mlir/lib/Pass/PassCrashRecovery.cpp b/mlir/lib/Pass/PassCrashRecovery.cpp
index 7ea2a57693217..8c6d865cb31dd 100644
--- a/mlir/lib/Pass/PassCrashRecovery.cpp
+++ b/mlir/lib/Pass/PassCrashRecovery.cpp
@@ -414,15 +414,6 @@ struct FileReproducerStream : public mlir::ReproducerStream {
LogicalResult PassManager::runWithCrashRecovery(Operation *op,
AnalysisManager am) {
- // Notify the context to disable the use of thread-local storage while the
- // pass manager is running in a crash recovery context thread. Re-enable the
- // thread local storage upon function exit. This is required to persist any
- // attribute storage allocated during passes beyond the lifetime of the
- // recovery context thread.
- MLIRContext *ctx = getContext();
- ctx->disableThreadLocalStorage();
- auto guard =
- llvm::make_scope_exit([ctx]() { ctx->enableThreadLocalStorage(); });
crashReproGenerator->initialize(getPasses(), op, verifyPasses);
// Safely invoke the passes within a recovery context.
diff --git a/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir b/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir
deleted file mode 100644
index b36ed367adb76..0000000000000
--- a/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir
+++ /dev/null
@@ -1,22 +0,0 @@
-// Test that the enable-debug-info-scope-on-llvm-func pass can create its
-// distinct attributes when running in the crash reproducer thread.
-
-// RUN: mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. \
-// RUN: --pass-pipeline="builtin.module(ensure-debug-info-scope-on-llvm-func)" \
-// RUN: --mlir-print-debuginfo %s | FileCheck %s
-
-// RUN: mlir-opt --mlir-pass-pipeline-crash-reproducer=. \
-// RUN: --pass-pipeline="builtin.module(ensure-debug-info-scope-on-llvm-func)" \
-// RUN: --mlir-print-debuginfo %s | FileCheck %s
-
-module {
- llvm.func @func_no_debug() {
- llvm.return loc(unknown)
- } loc(unknown)
-} loc(unknown)
-
-// CHECK-LABEL: llvm.func @func_no_debug()
-// CHECK: llvm.return loc(#loc
-// CHECK: loc(#loc[[LOC:[0-9]+]])
-// CHECK: #di_compile_unit = #llvm.di_compile_unit<id = distinct[{{.*}}]<>,
-// CHECK: #di_subprogram = #llvm.di_subprogram<id = distinct[{{.*}}]<>
diff --git a/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir b/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir
deleted file mode 100644
index af6dead31e263..0000000000000
--- a/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir
+++ /dev/null
@@ -1,18 +0,0 @@
-// This test verifies that when running with crash reproduction enabled, distinct
-// attribute storage is not allocated in thread-local storage. Since crash
-// reproduction runs the pass manager in a separate thread, using thread-local
-// storage for distinct attributes causes use-after-free errors once the thread
-// that runs the pass manager joins.
-
-// RUN: mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. %s -test-distinct-attrs | FileCheck %s
-// RUN: mlir-opt --mlir-pass-pipeline-crash-reproducer=. %s -test-distinct-attrs | FileCheck %s
-
-// CHECK: #[[DIST0:.*]] = distinct[0]<42 : i32>
-// CHECK: #[[DIST1:.*]] = distinct[1]<42 : i32>
-#distinct = distinct[0]<42 : i32>
-
-// CHECK: @foo_1
-func.func @foo_1() {
- // CHECK: "test.op"() {distinct.input = #[[DIST0]], distinct.output = #[[DIST1]]}
- "test.op"() {distinct.input = #distinct} : () -> ()
-}
|
This PR fixes a use-after-free error that happens when `DistinctAttr` instances are created within a `PassManager` running with crash recovery enabled. The root cause is that `DistinctAttr` storage is allocated in a thread_local allocator, which is destroyed when the crash recovery thread joins, invalidating the storage. Moreover, even without crash reproduction disabling multithreading on the context will destroy the context's thread pool, and in turn delete the threadlocal storage. This means a call to `ctx->disableMulthithreading()` breaks the IR. This PR replaces the thread local allocator with a synchronised allocator that's shared between threads. This persists the lifetime of allocated DistinctAttr storage instances to the lifetime of the context. ### Problem Details: The `DistinctAttributeAllocator` uses a `ThreadLocalCache<BumpPtrAllocator>` for lock-free allocation of `DistinctAttr` storage in a multithreaded context. The issue occurs when a `PassManager` is run with crash recovery (`runWithCrashRecovery`), the pass pipeline is executed on a temporary thread spawned by `llvm::CrashRecoveryContext`. Any `DistinctAttr`s created during this execution have their storage allocated in the thread_local cache of this temporary thread. When the thread joins, the thread_local storage is destroyed, freeing the `DistinctAttr`s' memory. If this attribute is accessed later, e.g. when printing, it results in a use-after-free. As mentioned previously, this is also seen after creating some `DistinctAttr`s and then calling `ctx->disableMulthithreading()`. ### Solution `DistinctAttrStorageAllocator` uses a synchronised, shared allocator instead of one wrapped in a `ThreadLocalCache`. The former is what stores the allocator in transient thread_local storage. ### Testing: A C++ unit test has been added to validate this fix. (I was previously reproducing this failure with `mlir-opt` but I can no longer do so and I am unsure why.) ----- Note: This is a 2nd attempt at my previous PR #128566 that was reverted in #133000. I believe I've addressed the TSAN and race condition concerns.
…rage (#148666) This PR fixes a use-after-free error that happens when `DistinctAttr` instances are created within a `PassManager` running with crash recovery enabled. The root cause is that `DistinctAttr` storage is allocated in a thread_local allocator, which is destroyed when the crash recovery thread joins, invalidating the storage. Moreover, even without crash reproduction disabling multithreading on the context will destroy the context's thread pool, and in turn delete the threadlocal storage. This means a call to `ctx->disableMulthithreading()` breaks the IR. This PR replaces the thread local allocator with a synchronised allocator that's shared between threads. This persists the lifetime of allocated DistinctAttr storage instances to the lifetime of the context. ### Problem Details: The `DistinctAttributeAllocator` uses a `ThreadLocalCache<BumpPtrAllocator>` for lock-free allocation of `DistinctAttr` storage in a multithreaded context. The issue occurs when a `PassManager` is run with crash recovery (`runWithCrashRecovery`), the pass pipeline is executed on a temporary thread spawned by `llvm::CrashRecoveryContext`. Any `DistinctAttr`s created during this execution have their storage allocated in the thread_local cache of this temporary thread. When the thread joins, the thread_local storage is destroyed, freeing the `DistinctAttr`s' memory. If this attribute is accessed later, e.g. when printing, it results in a use-after-free. As mentioned previously, this is also seen after creating some `DistinctAttr`s and then calling `ctx->disableMulthithreading()`. ### Solution `DistinctAttrStorageAllocator` uses a synchronised, shared allocator instead of one wrapped in a `ThreadLocalCache`. The former is what stores the allocator in transient thread_local storage. ### Testing: A C++ unit test has been added to validate this fix. (I was previously reproducing this failure with `mlir-opt` but I can no longer do so and I am unsure why.) ----- Note: This is a 2nd attempt at my previous PR llvm/llvm-project#128566 that was reverted in llvm/llvm-project#133000. I believe I've addressed the TSAN and race condition concerns.
Reverts #128566. See as well the discussion in #132935.