Skip to content

[ASan] Prevent ASan/LSan deadlock by preloading modules before error reporting #131756

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 16, 2025

Conversation

Camsyn
Copy link
Contributor

@Camsyn Camsyn commented Mar 18, 2025

Description

This PR resolves a deadlock between AddressSanitizer (ASan) and LeakSanitizer (LSan)
that occurs when both sanitizers attempt to acquire locks in conflicting orders across
threads. The fix ensures safe lock acquisition ordering by preloading module information
before error reporting.


Issue Details

Reproducer

// Thread 1: ASan error path 
int arr[1] = {0};
std::thread t([&]() {  
  arr[1] = 1; // Triggers ASan OOB error  
});  

// Thread 2: LSan check path  
__lsan_do_leak_check();   

Lock Order Conflict:

  • Thread 1 (ASan error reporting):

    1. Acquires ASan thread registry lock (B)
    2. Attempts to acquire libdl lock (A) via dl_iterate_phdr
  • Thread 2 (LSan leak check):

    1. Acquires libdl lock (A) via dl_iterate_phdr
    2. Attempts to acquire ASan thread registry lock (B)

This creates a circular wait condition (A -> B -> A) meeting all four Coffman deadlock criteria.


Fix Strategy

The root cause lies in ASan's error reporting path needing dl_iterate_phdr (requiring lock A)
while already holding its thread registry lock (B). The solution:

  1. Preload Modules Early: Force module list initialization before acquiring ASan's thread lock
  2. Avoid Nested Locking: Ensure symbolization (via dl_iterate_phdr) completes before error reporting locks

Key code change:

// Before acquiring ASan's thread registry lock:  
Symbolizer::GetOrInit()->GetRefreshedListOfModules();  

This guarantees module information is cached before lock acquisition, eliminating
the need for dl_iterate_phdr calls during error reporting.


Testing

Added asan_lsan_deadlock.cpp test case:

  • Reproduces deadlock reliably without fix under idle system conditions
  • Uses watchdog thread to detect hangs
  • Verifies ASan error reports correctly without deadlock

Note: Due to the inherent non-determinism of thread scheduling and lock acquisition timing,
this test may not reliably reproduce the deadlock on busy systems (e.g., during parallel
ninja check-asan runs).


Impact

  • Fixes rare but severe deadlocks in mixed ASan+LSan environments
  • Maintains thread safety guarantees for both sanitizers
  • No user-visible behavior changes except deadlock elimination

Relevant Buggy Code

  • Code in ASan's asan_report.cpp
  explicit ScopedInErrorReport(bool fatal = false)
      : halt_on_error_(fatal || flags()->halt_on_error) {
    // Acquire lock B
    asanThreadRegistry().Lock();
  }
  ~ScopedInErrorReport() {
    ...
    // Try to acquire lock A under holding lock B via the following path
    // #4  0x000071a353d83e93 in __GI___dl_iterate_phdr (
    //     callback=0x5d1a07a39580 <__sanitizer::dl_iterate_phdr_cb(dl_phdr_info*, unsigned long, void*)>, 
    //     data=0x6da3510fd3f0) at ./elf/dl-iteratephdr.c:39
    // #5  0x00005d1a07a39574 in __sanitizer::ListOfModules::init (this=0x71a353ebc080)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:784
    // #6  0x00005d1a07a429e3 in __sanitizer::Symbolizer::RefreshModules (this=0x71a353ebc058)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:188
    // #7  __sanitizer::Symbolizer::FindModuleForAddress (this=this@entry=0x71a353ebc058, 
    //     address=address@entry=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:214
    // #8  0x00005d1a07a4291b in __sanitizer::Symbolizer::SymbolizePC (this=0x71a353ebc058, addr=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:88
    // #9  0x00005d1a07a40df7 in __sanitizer::(anonymous namespace)::StackTraceTextPrinter::ProcessAddressFrames (
    //     this=this@entry=0x6da3510fd520, pc=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:37
    // #10 0x00005d1a07a40d27 in __sanitizer::StackTrace::PrintTo (this=this@entry=0x6da3510fd5e8, 
    //     output=output@entry=0x6da3510fd588)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:110
    // #11 0x00005d1a07a410a1 in __sanitizer::StackTrace::Print (this=0x6da3510fd5e8)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:133
    // #12 0x00005d1a0798758d in __asan::ErrorGeneric::Print (
    //     this=0x5d1a07aa4e08 <__asan::ScopedInErrorReport::current_error_+8>)
    //     at llvm-project/compiler-rt/lib/asan/asan_errors.cpp:617    
    current_error_.Print();
    ... 
  }
  • Code in LSan's lsan_common_linux.cpp
void LockStuffAndStopTheWorld(StopTheWorldCallback callback,
                              CheckForLeaksParam *argument) {
  // Acquire lock A
  dl_iterate_phdr(LockStuffAndStopTheWorldCallback, &param);
}

static int LockStuffAndStopTheWorldCallback(struct dl_phdr_info *info,
                                            size_t size, void *data) {
  // Try to acquire lock B under holding lock A via the following path
  // #3  0x000055555562b34a in __sanitizer::ThreadRegistry::Lock (this=<optimized out>)
  //     at llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_thread_registry.h:99
  // #4  __lsan::LockThreads () at llvm-project/compiler-rt/lib/asan/asan_thread.cpp:484
  // #5  0x0000555555652629 in __lsan::ScopedStopTheWorldLock::ScopedStopTheWorldLock (this=<optimized out>)
  //     at llvm-project/compiler-rt/lib/lsan/lsan_common.h:164
  // #6  __lsan::LockStuffAndStopTheWorldCallback (info=<optimized out>, size=<optimized out>, data=0x0, 
  //     data@entry=0x7fffffffd158) at llvm-project/compiler-rt/lib/lsan/lsan_common_linux.cpp:120
  ScopedStopTheWorldLock lock;
  DoStopTheWorldParam *param = reinterpret_cast<DoStopTheWorldParam *>(data);
  StopTheWorld(param->callback, param->argument);
  return 1;
}

…reporting


This fixes a circular lock dependency between ASan's thread registry 
lock (B) and libdl's dl_iterate_phdr lock (A) that occurs when:

1. ASan error reporting (holding B) calls dl_iterate_phdr (needs A)
2. LSan leak check (holding A) acquires ASan thread registry lock (B)

The fix proactively initializes module information before acquiring 
ASan's thread registry lock during error reporting. This eliminates 
the need to call `dl_iterate_phdr` while already holding lock B, 
breaking the dangerous A→B/B→A cross-thread locking pattern.

Add test case (compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp)
that reproduces the deadlock without this patch.
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (Camsyn)

Changes

Description

This PR resolves a deadlock between AddressSanitizer (ASan) and LeakSanitizer (LSan)
that occurs when both sanitizers attempt to acquire locks in conflicting orders across
threads. The fix ensures safe lock acquisition ordering by preloading module information
before error reporting.


Issue Details

Reproducer

// Thread 1: ASan error path 
int arr[1] = {0};
std::thread t([&amp;]() {  
  arr[1] = 1; // Triggers ASan OOB error  
});  

// Thread 2: LSan check path  
__lsan_do_leak_check();   

Lock Order Conflict:

  • Thread 1 (ASan error reporting):

    1. Acquires ASan thread registry lock (B)
    2. Attempts to acquire libdl lock (A) via dl_iterate_phdr
  • Thread 2 (LSan leak check):

    1. Acquires libdl lock (A) via dl_iterate_phdr
    2. Attempts to acquire ASan thread registry lock (B)

This creates a circular wait condition (A -> B -> A) meeting all four Coffman deadlock criteria.


Fix Strategy

The root cause lies in ASan's error reporting path needing dl_iterate_phdr (requiring lock A)
while already holding its thread registry lock (B). The solution:

  1. Preload Modules Early: Force module list initialization before acquiring ASan's thread lock
  2. Avoid Nested Locking: Ensure symbolization (via dl_iterate_phdr) completes before error reporting locks

Key code change:

// Before acquiring ASan's thread registry lock:  
Symbolizer::GetOrInit()-&gt;GetRefreshedListOfModules();  

This guarantees module information is cached before lock acquisition, eliminating
the need for dl_iterate_phdr calls during error reporting.


Testing

Added asan_lsan_deadlock.cpp test case:

  • Reproduces deadlock reliably without fix under idle system conditions
  • Uses watchdog thread to detect hangs
  • Verifies ASan error reports correctly without deadlock

Note: Due to the inherent non-determinism of thread scheduling and lock acquisition timing,
this test may not reliably reproduce the deadlock on busy systems (e.g., during parallel
ninja check-asan runs).


Impact

  • Fixes rare but severe deadlocks in mixed ASan+LSan environments
  • Maintains thread safety guarantees for both sanitizers
  • No user-visible behavior changes except deadlock elimination

Relevant Buggy Code

  • Code in ASan's asan_report.cpp
  explicit ScopedInErrorReport(bool fatal = false)
      : halt_on_error_(fatal || flags()-&gt;halt_on_error) {
    // Acquire lock B
    asanThreadRegistry().Lock();
  }
  ~ScopedInErrorReport() {
    ...
    // Try to acquire lock A under holding lock B via the following path
    // #<!-- -->4  0x000071a353d83e93 in __GI___dl_iterate_phdr (
    //     callback=0x5d1a07a39580 &lt;__sanitizer::dl_iterate_phdr_cb(dl_phdr_info*, unsigned long, void*)&gt;, 
    //     data=0x6da3510fd3f0) at ./elf/dl-iteratephdr.c:39
    // #<!-- -->5  0x00005d1a07a39574 in __sanitizer::ListOfModules::init (this=0x71a353ebc080)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:784
    // #<!-- -->6  0x00005d1a07a429e3 in __sanitizer::Symbolizer::RefreshModules (this=0x71a353ebc058)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:188
    // #<!-- -->7  __sanitizer::Symbolizer::FindModuleForAddress (this=this@<!-- -->entry=0x71a353ebc058, 
    //     address=address@<!-- -->entry=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:214
    // #<!-- -->8  0x00005d1a07a4291b in __sanitizer::Symbolizer::SymbolizePC (this=0x71a353ebc058, addr=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:88
    // #<!-- -->9  0x00005d1a07a40df7 in __sanitizer::(anonymous namespace)::StackTraceTextPrinter::ProcessAddressFrames (
    //     this=this@<!-- -->entry=0x6da3510fd520, pc=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:37
    // #<!-- -->10 0x00005d1a07a40d27 in __sanitizer::StackTrace::PrintTo (this=this@<!-- -->entry=0x6da3510fd5e8, 
    //     output=output@<!-- -->entry=0x6da3510fd588)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:110
    // #<!-- -->11 0x00005d1a07a410a1 in __sanitizer::StackTrace::Print (this=0x6da3510fd5e8)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:133
    // #<!-- -->12 0x00005d1a0798758d in __asan::ErrorGeneric::Print (
    //     this=0x5d1a07aa4e08 &lt;__asan::ScopedInErrorReport::current_error_+8&gt;)
    //     at llvm-project/compiler-rt/lib/asan/asan_errors.cpp:617    
    current_error_.Print();
    ... 
  }
  • Code in LSan's lsan_common_linux.cpp
void LockStuffAndStopTheWorld(StopTheWorldCallback callback,
                              CheckForLeaksParam *argument) {
  // Acquire lock A
  dl_iterate_phdr(LockStuffAndStopTheWorldCallback, &amp;param);
}

static int LockStuffAndStopTheWorldCallback(struct dl_phdr_info *info,
                                            size_t size, void *data) {
  // Try to acquire lock B under holding lock A via the following path
  // #<!-- -->3  0x000055555562b34a in __sanitizer::ThreadRegistry::Lock (this=&lt;optimized out&gt;)
  //     at llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_thread_registry.h:99
  // #<!-- -->4  __lsan::LockThreads () at llvm-project/compiler-rt/lib/asan/asan_thread.cpp:484
  // #<!-- -->5  0x0000555555652629 in __lsan::ScopedStopTheWorldLock::ScopedStopTheWorldLock (this=&lt;optimized out&gt;)
  //     at llvm-project/compiler-rt/lib/lsan/lsan_common.h:164
  // #<!-- -->6  __lsan::LockStuffAndStopTheWorldCallback (info=&lt;optimized out&gt;, size=&lt;optimized out&gt;, data=0x0, 
  //     data@<!-- -->entry=0x7fffffffd158) at llvm-project/compiler-rt/lib/lsan/lsan_common_linux.cpp:120
  ScopedStopTheWorldLock lock;
  DoStopTheWorldParam *param = reinterpret_cast&lt;DoStopTheWorldParam *&gt;(data);
  StopTheWorld(param-&gt;callback, param-&gt;argument);
  return 1;
}

Full diff: https://github.com/llvm/llvm-project/pull/131756.diff

2 Files Affected:

  • (modified) compiler-rt/lib/asan/asan_report.cpp (+21)
  • (added) compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp (+73)
diff --git a/compiler-rt/lib/asan/asan_report.cpp b/compiler-rt/lib/asan/asan_report.cpp
index 45aa607dcda07..8aeb938b2ee3d 100644
--- a/compiler-rt/lib/asan/asan_report.cpp
+++ b/compiler-rt/lib/asan/asan_report.cpp
@@ -126,6 +126,27 @@ class ScopedInErrorReport {
  public:
   explicit ScopedInErrorReport(bool fatal = false)
       : halt_on_error_(fatal || flags()->halt_on_error) {
+   /*
+    * Deadlock Prevention Between ASan and LSan
+    *
+    * Background:
+    * - The `dl_iterate_phdr` function requires holding libdl's internal lock (Lock A).
+    * - LSan acquires the ASan thread registry lock (Lock B) *after* calling `dl_iterate_phdr`.
+    *
+    * Problem Scenario:
+    * When ASan attempts to call `dl_iterate_phdr` while holding Lock B (e.g., during
+    * error reporting via `ErrorDescription::Print`), a circular lock dependency may occur:
+    *   1. Thread 1: Holds Lock B → Requests Lock A (via dl_iterate_phdr)
+    *   2. Thread 2: Holds Lock A → Requests Lock B (via LSan operations)
+    *
+    * Solution:
+    * Proactively load all required modules before acquiring Lock B. This ensures:
+    * 1. Any `dl_iterate_phdr` calls during module loading complete before locking
+    * 2. Subsequent error reporting avoids nested lock acquisition patterns
+    * 3. Eliminates the lock order inversion risk between libdl and ASan's thread registry
+    */
+    Symbolizer::GetOrInit()->GetRefreshedListOfModules();
+
     // Make sure the registry and sanitizer report mutexes are locked while
     // we're printing an error report.
     // We can lock them only here to avoid self-deadlock in case of
diff --git a/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp b/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp
new file mode 100644
index 0000000000000..6d4c91eb49241
--- /dev/null
+++ b/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp
@@ -0,0 +1,73 @@
+// Test for potential deadlock in LeakSanitizer+AddressSanitizer.
+// REQUIRES: leak-detection
+//
+// RUN: %clangxx_asan -O0 %s -o %t
+// RUN: %env_asan_opts=detect_leaks=1 not %run %t 2>&1 | FileCheck %s
+
+/*
+ * Purpose: Verify deadlock prevention between ASan error reporting and LSan leak checking.
+ * 
+ * Test Design:
+ * 1. Creates contention scenario between:
+ *    - ASan's error reporting (requires lock B -> lock A ordering)
+ *    - LSan's leak check (requires lock A -> lock B ordering)
+ * 2. Thread timing:
+ *    - Main thread: Holds 'in' mutex -> Triggers LSan check (lock A then B) 
+ *    - Worker thread: Triggers ASan OOB error (lock B then A via symbolization)
+ * 
+ * Deadlock Condition (if unfixed):
+ * Circular lock dependency forms when:
+ * [Main Thread] LSan: lock A -> requests lock B
+ * [Worker Thread] ASan: lock B -> requests lock A
+ * 
+ * Success Criteria: 
+ * With proper lock ordering enforcement, watchdog should NOT trigger - test exits normally.
+ * If deadlock occurs, watchdog terminates via _exit(1) after 10s timeout.
+ */
+
+#include <mutex>
+#include <sanitizer/lsan_interface.h>
+#include <stdio.h>
+#include <thread>
+#include <unistd.h>
+
+std::mutex in;
+
+void Watchdog() {
+  // Safety mechanism: Turn infinite deadlock into finite test failure
+  usleep(10000000);
+  // CHECK-NOT: Timeout! Deadlock detected.
+  puts("Timeout! Deadlock detected.");
+  fflush(stdout);
+  _exit(1);
+}
+
+int main(int argc, char **argv) {
+  int arr[1] = {0};
+  in.lock();
+
+  std::thread w(Watchdog);
+  w.detach();
+
+  std::thread t([&]() {
+    in.unlock();
+    /* 
+     * Provoke ASan error: ASan's error reporting acquires: 
+     * 1. ASan's thread registry lock (B) during the reporting 
+     * 2. dl_iterate_phdr lock (A) during symbolization
+     */
+    // CHECK: SUMMARY: AddressSanitizer: stack-buffer-overflow
+    arr[argc] = 1; // Deliberate OOB access
+  });
+
+  in.lock();
+  /* 
+   * Critical section: LSan's check acquires: 
+   * 1. dl_iterate_phdr lock (A)
+   * 2. ASan's thread registry lock (B)
+   * before Stop The World.
+   */
+  __lsan_do_leak_check();
+  t.join();
+  return 0;
+}
\ No newline at end of file

@dtcxzyw dtcxzyw requested a review from vitalybuka March 20, 2025 08:58
Copy link

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

Copy link

github-actions bot commented Mar 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@dtcxzyw dtcxzyw requested a review from fmayer March 20, 2025 09:00
Copy link
Collaborator

@vitalybuka vitalybuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!
On a first look LGTM with nits.

Please also make clang format happy.

I'll take a better look later.

@Camsyn
Copy link
Contributor Author

Camsyn commented Apr 2, 2025

Proposed Fixes for ASan-LSan Deadlock Issue

After analyzing the ASan/LSan code and commit history, I confirm that concurrent error reporting from different threads should not be considered (it has already been handled by ScopedErrorReportLock). Below are the potential solutions (to my knowledge) :


1. Maintain Current Patch (Incomplete)

  • Approach: Keep the existing patch to preload modules in ASan, avoiding holding Lock A (libdl) under Lock B (ASan thread lock).
  • Risk: If other threads modify the module list via dlopen/dlclose during this window, the patch may regress to the original deadlock scenario. While rare (due to the low probability of overlapping LSan checks and ASan reports + module updates), this is not a complete fix.
    • Supplemental Idea: Add synchronization to make dlopen/dlclose wait until error reports complete before updating module lists.
    • Rationale: Error reports depend on module list, and the "buggy" module triggering the report should logically exist stably before the report. Should the module list be frozen during error reporting?

2. Revert LSan Lock Order (Incomplete)

  • Approach: Move LSan's acquisition of Lock B (thread lock) before dl_iterate_phdr (Lock A). Historically, LSan acquired Lock B before dl_iterate_phdr, but commit f1b6bd4 moved it inside dl_iterate_phdr to handle potential malloc calls during iteration.
  • Key Question: Since ASan does not hold the allocator lock during error reporting, could reverting Lock B acquisition to pre-dl_iterate_phdr resolve the deadlock?
  • Risk: Thread creation within dl_iterate_phdr (now holding Lock B) might reintroduce deadlocks similar to those addressed in f1b6bd4.

3. Align ASan with LSan (Complete)

  • Approach: Make ASan acquire Lock A (via dl_iterate_phdr) before error reporting, mirroring LSan's lock order.
  • Challenge: Requires modifying numerous ReportXXXX functions to explicitly handle Lock A, which may introduce code bloat and maintenance overhead.
  • Tradeoff: Ensures consistent lock hierarchy (A→B) but complicates ASan's reporting logic.

4. Leverage ScopedErrorReportLock for LSan (Complete)

  • Approach: Have LSan perform dl_iterate_phdr while holding ScopedErrorReportLock, preventing concurrent ASan error reports.
  • Consideration: While LSan leak checks do not inherently require error reporting locks, this would enforce mutual exclusion between leak scans and ASan reports.
  • Open Question: Is it appropriate to tie leak-check synchronization to error reporting locks?

Key Tradeoffs

Solution Completeness Risk Code Impact
1 Partial Regression risk Low
1 (with supplement) Full Code changes in sanitizer_common Moderate
2 Partial New deadlock scenarios Moderate
3 Full High maintenance cost High
4 Full Logical coupling concerns Low

Which fix should we choose or are there any other suggestions?

@Camsyn Camsyn requested review from vitalybuka and fmayer April 2, 2025 09:04
Copy link
Contributor

@fmayer fmayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this detailed analysis.

I am fine with merging this patch which makes things strictly better and then taking it from there.

@fmayer
Copy link
Contributor

fmayer commented Apr 11, 2025

@vitalybuka does this LGTY?

@Camsyn do you want me to merge if it looks good to Vitaly?

@Camsyn
Copy link
Contributor Author

Camsyn commented Apr 12, 2025

@vitalybuka does this LGTY?

@Camsyn do you want me to merge if it looks good to Vitaly?

Sure, happy for you to merge if it looks good to both of you.

Also, I’d love to hear your thoughts on the other potential fixes I mentioned above — do you think they’re worth pursuing for a more complete resolution of the deadlock issue? If so, I’d be happy to open a separate PR for those.

@fmayer fmayer merged commit bd4d351 into llvm:main Apr 16, 2025
10 checks passed
Copy link

@Camsyn Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@citymarina
Copy link
Contributor

@Camsyn
Copy link
Contributor Author

Camsyn commented Apr 16, 2025

The newly added testcase only failed on AddressSanitizer-x86_64-darwin.

The test failed due to a timeout, and it seems that in that environment, the deadlock problem of the test case has not been resolved, and the built-in WatchDog thread did not kill the test process normally.

I'm looking for a Darwin machine to determine what is happening.

@mysterymath
Copy link
Contributor

This is causing a build failure on the Fuchsia CI bots:

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8717476228399426513/overview

There could be something wrong with the CMake plumbing; unsure why this symbol is undefined. Do you have an intuition for what's going wrong?

[2435/2474](3) Linking CXX shared library /b/s/w/ir/x/w/llvm_build/lib/clang/21/lib/x86_64-unknown-fuchsia/libclang_rt.asan.so
FAILED: /b/s/w/ir/x/w/llvm_build/lib/clang/21/lib/x86_64-unknown-fuchsia/libclang_rt.asan.so 
: && /b/s/w/ir/x/w/llvm_build/./bin/clang++ --target=x86_64-unknown-fuchsia --sysroot=/b/s/w/ir/x/w/sdk/arch/x64/sysroot -fPIC --target=x86_64-unknown-fuchsia -I/b/s/w/ir/x/w/sdk/pkg/sync/include -I/b/s/w/ir/x/w/sdk/pkg/fdio/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/b/s/w/ir/x/w/llvm_build/runtimes/runtimes-x86_64-unknown-fuchsia-bins=../../../llvm-llvm-project -ffile-prefix-map=/b/s/w/ir/x/w/llvm-llvm-project/= -no-canonical-prefixes -Wall -Wno-unused-parameter -O2 -g -DNDEBUG  -L/b/s/w/ir/x/w/sdk/arch/x64/lib  -fuse-ld=lld   -nodefaultlibs -Wl,-z,text -Wl,-z,defs -nostdlib++ -Wl,--version-script,/b/s/w/ir/x/w/llvm_build/runtimes/runtimes-x86_64-unknown-fuchsia-bins/compiler-rt/lib/asan/clang_rt.asan-dynamic-x86_64.vers -shared -Wl,-soname,libclang_rt.asan.so -o /b/s/w/ir/x/w/llvm_build/lib/clang/21/lib/x86_64-unknown-fuchsia/libclang_rt.asan.so compiler-rt/lib/interception/CMakeFiles/RTInterception.x86_64.dir/interception_linux.cpp.obj compiler-rt/lib/interception/CMakeFiles/RTInterception.x86_64.dir/interception_mac.cpp.obj compiler-rt/lib/interception/CMakeFiles/RTInterception.x86_64.dir/interception_win.cpp.obj compiler-rt/lib/interception/CMakeFiles/RTInterception.x86_64.dir/interception_type_test.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_allocator.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_common.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_deadlock_detector1.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_deadlock_detector2.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_errno.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_file.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_flags.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_flag_parser.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_fuchsia.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_haiku.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_libc.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_libignore.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_linux.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_linux_s390.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_mac.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_mutex.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_netbsd.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_platform_limits_freebsd.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_platform_limits_linux.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_platform_limits_netbsd.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_platform_limits_posix.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_platform_limits_solaris.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_posix.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_printf.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_procmaps_common.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_procmaps_bsd.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_procmaps_fuchsia.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_procmaps_haiku.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_procmaps_linux.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_procmaps_mac.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_procmaps_solaris.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_range.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_solaris.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_stoptheworld_fuchsia.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_stoptheworld_mac.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_stoptheworld_win.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_suppressions.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_tls_get_addr.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_thread_arg_retval.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_thread_registry.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_type_traits.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_win.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_win_interception.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommon.x86_64.dir/sanitizer_termination.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonLibc.x86_64.dir/sanitizer_common_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonLibc.x86_64.dir/sanitizer_allocator_checks.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonLibc.x86_64.dir/sanitizer_dl.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonLibc.x86_64.dir/sanitizer_linux_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonLibc.x86_64.dir/sanitizer_mac_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonLibc.x86_64.dir/sanitizer_posix_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonLibc.x86_64.dir/sanitizer_stoptheworld_linux_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonLibc.x86_64.dir/sanitizer_stoptheworld_netbsd_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonCoverage.x86_64.dir/sancov_flags.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonCoverage.x86_64.dir/sanitizer_coverage_fuchsia.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonCoverage.x86_64.dir/sanitizer_coverage_libcdep_new.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonCoverage.x86_64.dir/sanitizer_coverage_win_sections.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_allocator_report.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_chained_origin_depot.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stack_store.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stackdepot.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stacktrace.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stacktrace_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stacktrace_printer.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stacktrace_sparc.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer_libbacktrace.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer_mac.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer_markup.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer_markup_fuchsia.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer_posix_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer_report.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer_report_fuchsia.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_symbolizer_win.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_thread_history.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_unwind_linux_libcdep.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_unwind_fuchsia.cpp.obj compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_unwind_win.cpp.obj compiler-rt/lib/lsan/CMakeFiles/RTLSanCommon.x86_64.dir/lsan_common.cpp.obj compiler-rt/lib/lsan/CMakeFiles/RTLSanCommon.x86_64.dir/lsan_common_fuchsia.cpp.obj compiler-rt/lib/lsan/CMakeFiles/RTLSanCommon.x86_64.dir/lsan_common_linux.cpp.obj compiler-rt/lib/lsan/CMakeFiles/RTLSanCommon.x86_64.dir/lsan_common_mac.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.x86_64.dir/ubsan_diag.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.x86_64.dir/ubsan_init.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.x86_64.dir/ubsan_flags.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.x86_64.dir/ubsan_handlers.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.x86_64.dir/ubsan_monitor.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan.x86_64.dir/ubsan_value.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_allocator.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_activation.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_debugging.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_descriptions.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_errors.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_fake_stack.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_flags.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_fuchsia.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_globals.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_globals_win.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_interceptors.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_interceptors_memintrinsics.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_linux.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_mac.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_malloc_linux.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_malloc_mac.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_malloc_win.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_memory_profile.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_poisoning.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_posix.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_premap_shadow.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_report.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_rtl.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_shadow_setup.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_stack.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_stats.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_suppressions.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_thread.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_win.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_interceptors_vfork.S.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_new_delete.cpp.obj compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic_version_script_dummy.x86_64.dir/dummy.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan_cxx.x86_64.dir/ubsan_handlers_cxx.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan_cxx.x86_64.dir/ubsan_type_hash.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan_cxx.x86_64.dir/ubsan_type_hash_itanium.cpp.obj compiler-rt/lib/ubsan/CMakeFiles/RTUbsan_cxx.x86_64.dir/ubsan_type_hash_win.cpp.obj -L/b/s/w/ir/x/w/llvm_build/lib/x86_64-unknown-fuchsia -lunwind  -lc  -lzircon  -ldl  -lrt  -lm  -lpthread  /b/s/w/ir/x/w/llvm_build/lib/clang/21/lib/x86_64-unknown-fuchsia/libclang_rt.builtins.a  /b/s/w/ir/x/w/llvm_build/lib/x86_64-unknown-fuchsia/libc++abi.so.1.0  -lc  /b/s/w/ir/x/w/llvm_build/lib/x86_64-unknown-fuchsia/libunwind.so.1.0 && :
ld.lld: error: undefined symbol: __sanitizer::Symbolizer::GetRefreshedListOfModules()
>>> referenced by asan_report.cpp:152 (../../../llvm-llvm-project/compiler-rt/lib/asan/asan_report.cpp:152)
>>>               compiler-rt/lib/asan/CMakeFiles/RTAsan_dynamic.x86_64.dir/asan_report.cpp.obj:(__asan::ScopedInErrorReport::ScopedInErrorReport(bool))
clang++: error: ld.lld command failed with exit code 1 (use -v to see invocation)

@vitalybuka
Copy link
Collaborator

I am going to try to fix/disable the test on those platforms

vitalybuka added a commit that referenced this pull request Apr 16, 2025
vitalybuka added a commit that referenced this pull request Apr 16, 2025
@vitalybuka
Copy link
Collaborator

Proposed Fixes for ASan-LSan Deadlock Issue

@Camsyn Will this work?

--- a/compiler-rt/lib/lsan/lsan_common_linux.cpp
+++ b/compiler-rt/lib/lsan/lsan_common_linux.cpp
@@ -117,7 +117,6 @@ void HandleLeaks() {
 
 static int LockStuffAndStopTheWorldCallback(struct dl_phdr_info *info,
                                             size_t size, void *data) {
-  ScopedStopTheWorldLock lock;
   DoStopTheWorldParam *param = reinterpret_cast<DoStopTheWorldParam *>(data);
   StopTheWorld(param->callback, param->argument);
   return 1;
@@ -133,6 +132,7 @@ static int LockStuffAndStopTheWorldCallback(struct dl_phdr_info *info,
 // callback in the parent thread.
 void LockStuffAndStopTheWorld(StopTheWorldCallback callback,
                               CheckForLeaksParam *argument) {
+  ScopedStopTheWorldLock lock;
   DoStopTheWorldParam param = {callback, argument};
   dl_iterate_phdr(LockStuffAndStopTheWorldCallback, &param);
 }

@vitalybuka
Copy link
Collaborator

Proposed Fixes for ASan-LSan Deadlock Issue

@Camsyn Will this work?

--- a/compiler-rt/lib/lsan/lsan_common_linux.cpp
+++ b/compiler-rt/lib/lsan/lsan_common_linux.cpp
@@ -117,7 +117,6 @@ void HandleLeaks() {
 
 static int LockStuffAndStopTheWorldCallback(struct dl_phdr_info *info,
                                             size_t size, void *data) {
-  ScopedStopTheWorldLock lock;
   DoStopTheWorldParam *param = reinterpret_cast<DoStopTheWorldParam *>(data);
   StopTheWorld(param->callback, param->argument);
   return 1;
@@ -133,6 +132,7 @@ static int LockStuffAndStopTheWorldCallback(struct dl_phdr_info *info,
 // callback in the parent thread.
 void LockStuffAndStopTheWorld(StopTheWorldCallback callback,
                               CheckForLeaksParam *argument) {
+  ScopedStopTheWorldLock lock;
   DoStopTheWorldParam param = {callback, argument};
   dl_iterate_phdr(LockStuffAndStopTheWorldCallback, &param);
 }

Then we will break libdl_deadlock.cpp :(
We can split Threads and Allocator lock, but the may get into the same issue later.

Lets keep as is for now.

@Camsyn
Copy link
Contributor Author

Camsyn commented Apr 17, 2025

. Revert LSan Lock Order (Incomplete)

  • Approach: Move LSan's acquisition of Lock B (thread lock) before dl_iterate_phdr (Lock A). Historically, LSan acquired Lock B before dl_iterate_phdr, but commit f1b6bd4 moved it inside dl_iterate_phdr to handle potential malloc calls during iteration.
  • Key Question: Since ASan does not hold the allocator lock during error reporting, could reverting Lock B acquisition to pre-dl_iterate_phdr resolve the deadlock?
  • Risk: Thread creation within dl_iterate_phdr (now holding Lock B) might reintroduce deadlocks similar to those addressed in f1b6bd4.

Commit f1b6bd moved the alloca lock (as well as the thread lock) into dl_iterate_phdr to avoid deadlock with other threads performing alloc operations in dl_iterate_phdr.

As for moving out the thread lock, I have analyzed this fix schema; see the second point of the above analysis for details :(. If LSan only advances the thread lock to dl_iterate_phdr, I am concerned that this will cause a new deadlock scenario (with another thread performing thread operations in the dl_iterate_phdr).

var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…reporting (llvm#131756)

### Description  
This PR resolves a deadlock between AddressSanitizer (ASan) and
LeakSanitizer (LSan)
that occurs when both sanitizers attempt to acquire locks in conflicting
orders across
threads. The fix ensures safe lock acquisition ordering by preloading
module information
before error reporting.  

---  

### Issue Details  
**Reproducer**
```cpp  
// Thread 1: ASan error path 
int arr[1] = {0};
std::thread t([&]() {  
  arr[1] = 1; // Triggers ASan OOB error  
});  

// Thread 2: LSan check path  
__lsan_do_leak_check();   
```  

**Lock Order Conflict**:  

- Thread 1 (ASan error reporting):  
  1. Acquires ASan thread registry lock (B)  
  1. Attempts to acquire libdl lock (A) via `dl_iterate_phdr`

- Thread 2 (LSan leak check):  
  1. Acquires libdl lock (A) via `dl_iterate_phdr`
  1. Attempts to acquire ASan thread registry lock (B)  
  

This creates a circular wait condition (A -> B -> A) meeting all four
Coffman deadlock criteria.

---  

### Fix Strategy  
The root cause lies in ASan's error reporting path needing
`dl_iterate_phdr` (requiring lock A)
while already holding its thread registry lock (B). The solution:  

1. **Preload Modules Early**: Force module list initialization _before_
acquiring ASan's thread lock
2. **Avoid Nested Locking**: Ensure symbolization (via dl_iterate_phdr)
completes before error reporting locks

Key code change:  
```cpp  
// Before acquiring ASan's thread registry lock:  
Symbolizer::GetOrInit()->GetRefreshedListOfModules();  
```  

This guarantees module information is cached before lock acquisition,
eliminating
the need for `dl_iterate_phdr` calls during error reporting.  

---  

### Testing  
Added **asan_lsan_deadlock.cpp** test case:  
- Reproduces deadlock reliably without fix **under idle system
conditions**
   - Uses watchdog thread to detect hangs  
   - Verifies ASan error reports correctly without deadlock  

**Note**: Due to the inherent non-determinism of thread scheduling and
lock acquisition timing,
this test may not reliably reproduce the deadlock on busy systems (e.g.,
during parallel
   `ninja check-asan` runs).



---  

### Impact  
- Fixes rare but severe deadlocks in mixed ASan+LSan environments  
- Maintains thread safety guarantees for both sanitizers  
- No user-visible behavior changes except deadlock elimination  

---

### Relevant Buggy Code
- Code in ASan's asan_report.cpp
```cpp
  explicit ScopedInErrorReport(bool fatal = false)
      : halt_on_error_(fatal || flags()->halt_on_error) {
    // Acquire lock B
    asanThreadRegistry().Lock();
  }
  ~ScopedInErrorReport() {
    ...
    // Try to acquire lock A under holding lock B via the following path
    // #4  0x000071a353d83e93 in __GI___dl_iterate_phdr (
    //     callback=0x5d1a07a39580 <__sanitizer::dl_iterate_phdr_cb(dl_phdr_info*, unsigned long, void*)>, 
    //     data=0x6da3510fd3f0) at ./elf/dl-iteratephdr.c:39
    // #5  0x00005d1a07a39574 in __sanitizer::ListOfModules::init (this=0x71a353ebc080)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:784
    // llvm#6  0x00005d1a07a429e3 in __sanitizer::Symbolizer::RefreshModules (this=0x71a353ebc058)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:188
    // llvm#7  __sanitizer::Symbolizer::FindModuleForAddress (this=this@entry=0x71a353ebc058, 
    //     address=address@entry=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:214
    // llvm#8  0x00005d1a07a4291b in __sanitizer::Symbolizer::SymbolizePC (this=0x71a353ebc058, addr=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:88
    // llvm#9  0x00005d1a07a40df7 in __sanitizer::(anonymous namespace)::StackTraceTextPrinter::ProcessAddressFrames (
    //     this=this@entry=0x6da3510fd520, pc=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:37
    // llvm#10 0x00005d1a07a40d27 in __sanitizer::StackTrace::PrintTo (this=this@entry=0x6da3510fd5e8, 
    //     output=output@entry=0x6da3510fd588)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:110
    // llvm#11 0x00005d1a07a410a1 in __sanitizer::StackTrace::Print (this=0x6da3510fd5e8)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:133
    // llvm#12 0x00005d1a0798758d in __asan::ErrorGeneric::Print (
    //     this=0x5d1a07aa4e08 <__asan::ScopedInErrorReport::current_error_+8>)
    //     at llvm-project/compiler-rt/lib/asan/asan_errors.cpp:617    
    current_error_.Print();
    ... 
  }
```

- Code in LSan's lsan_common_linux.cpp
```cpp
void LockStuffAndStopTheWorld(StopTheWorldCallback callback,
                              CheckForLeaksParam *argument) {
  // Acquire lock A
  dl_iterate_phdr(LockStuffAndStopTheWorldCallback, &param);
}

static int LockStuffAndStopTheWorldCallback(struct dl_phdr_info *info,
                                            size_t size, void *data) {
  // Try to acquire lock B under holding lock A via the following path
  // #3  0x000055555562b34a in __sanitizer::ThreadRegistry::Lock (this=<optimized out>)
  //     at llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_thread_registry.h:99
  // #4  __lsan::LockThreads () at llvm-project/compiler-rt/lib/asan/asan_thread.cpp:484
  // #5  0x0000555555652629 in __lsan::ScopedStopTheWorldLock::ScopedStopTheWorldLock (this=<optimized out>)
  //     at llvm-project/compiler-rt/lib/lsan/lsan_common.h:164
  // llvm#6  __lsan::LockStuffAndStopTheWorldCallback (info=<optimized out>, size=<optimized out>, data=0x0, 
  //     data@entry=0x7fffffffd158) at llvm-project/compiler-rt/lib/lsan/lsan_common_linux.cpp:120
  ScopedStopTheWorldLock lock;
  DoStopTheWorldParam *param = reinterpret_cast<DoStopTheWorldParam *>(data);
  StopTheWorld(param->callback, param->argument);
  return 1;
}
```
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
@mysterymath
Copy link
Contributor

Independently from Darwin, I've noticed this is a source of flakiness on our Linux x64 builders. It failed the 5 most recent runs, passed the proceeding 4, but failed the proceeding 2.

Failure message:

/b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp:61:12: error: CHECK: expected string not found in input
 // CHECK: SUMMARY: AddressSanitizer: stack-buffer-overflow
           ^
<stdin>:1:1: note: scanning from here
=================================================================
^
<stdin>:2:10: note: possible intended match here
==3675852==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7b55e78de034 at pc 0x5602bfdd6f00 bp 0x7b55e5afdcb0 sp 0x7b55e5afdca8
         ^

Input file: <stdin>
Check file: /b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ================================================================= 
check:61'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            2: ==3675852==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7b55e78de034 at pc 0x5602bfdd6f00 bp 0x7b55e5afdcb0 sp 0x7b55e5afdca8 
check:61'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:61'1              ?                                                                                                                                    possible intended match
            3: WRITE of size 4 at 0x7b55e78de034 thread T2 
check:61'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            4: Timeout! Deadlock detected. 
check:61'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

Example failure:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8717213898721101697/overview

Builder history:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/luci.fuchsia.toolchain.ci/clang-linux-x64

@Camsyn
Copy link
Contributor Author

Camsyn commented Apr 19, 2025

I'm also noticing flakiness of this test on some of my x64 linux builders as well:

Deadlock Model

A->B conflicts with B->A.

Before Fix

bool module_fresh_ = false;
Mutex A, B;
void Thread1() {
  {  
     Lock L1(B);
     if (!module_fresh_) {
        Lock L2(A);
     }
  }
}
void Thread2() {
 {
    Lock L1(A);
    {
        Lock L2(B);
    }
 }
}

After Fix

bool module_fresh_ = false;
Mutex A, B;
void Thread1() {
  {
     Lock L0(A);
  }
  module_fresh_ = true;
  {
     Lock L1(B);
     if (!module_fresh_) {
        Lock L2(A);
     }
  }
}
void Thread2() {
  {
     Lock L1(A);
     {
        Lock L2(B);
     }
  }
}

I used TSan to compile the simplified model (after fix) and repeated it multiple times. TSan did not report the risk of deadlock.

In my opinion, there might be THREE reasons:

  1. The flakiness hits the flaw of the incomplete fix (dlopen/dlclose between the module preloading and the ASan error printing).
  2. There is another deadlock path.
  3. Or, although unlikely, is the current timeout of 10s too small for the LLVM buildbot? (LSan set the timeout to 40s in its deadlock-related testcase)

I managed to replicate the build locally, but unfortunately couldn't reproduce the flakiness (run make check-asan 10 times and execute xxx/asan_lsan_deadlock.cpp.tmp 100 times). In my environment, this flackiness only appears if the patch is commented out :(.

I will still actively look for the cause, and it would be better if there could be more information about the test failure.

Perhaps we need to temporarily disable the test case before fixing the flakiness? @vitalybuka

@davidmrdavid
Copy link
Contributor

Just an FYI - on msvc's ASan, which is based off the llvm codebase, we're seeing the fuzzer test stack-overflow-with-asan.test failing after this change.

That test expects ASan to report stack overflow error, but after this change, the program crashes prior to being able to report the stack overflow. This is because we're adding more work (and thus more stack frames) in the ScopedInErrorReport constructor when we're already low on stack frame space, so we blow past the limit and exit abruptly.

I haven't had the chance to test this with clang on windows, but I wanted to flag this behavior in case it's reproducible with clang. In any case, it would be great to find a fix for this deadlock that meshes will with trying to report a stack overflow.

@Camsyn
Copy link
Contributor Author

Camsyn commented Apr 23, 2025

Just an FYI - on msvc's ASan, which is based off the llvm codebase, we're seeing the fuzzer test stack-overflow-with-asan.test failing after this change.

That test expects ASan to report stack overflow error, but after this change, the program crashes prior to being able to report the stack overflow. This is because we're adding more work (and thus more stack frames) in the ScopedInErrorReport constructor when we're already low on stack frame space, so we blow past the limit and exit abruptly.

I haven't had the chance to test this with clang on windows, but I wanted to flag this behavior in case it's reproducible with clang. In any case, it would be great to find a fix for this deadlock that meshes will with trying to report a stack overflow.

In fact, this deadlock only occurs when LSan is enabled and dl_iterate_phdr is used for Stop-the-World, i.e., under the condition #if CAN_SANITIZE_LEAKS && (SANITIZER_LINUX || SANITIZER_NETBSD). I think adding this restriction to the patch could resolve the issue (i.e., disabling this patch on Windows or some other systems). I will give a new PR about this tomorrow.

Previously, I didn’t do this because I thought the patch was quite general and shouldn’t cause any significant negative side effects. And indeed, I haven’t encountered any issues with it in my Linux environment :(.

Camsyn added a commit to Camsyn/llvm-project that referenced this pull request Apr 24, 2025
PR llvm#131756 introduced a patch to fix a deadlock between LSan and ASan.

The relevant deadlock only occurs when LSan is enabled and
`dl_iterate_phdr` is used for Stop-the-World, i.e., under the condition
`CAN_SANITIZE_LEAKS && (SANITIZER_LINUX || SANITIZER_NETBSD)`.

Therefore, this commit also sets the effective condition of this patch
to the above conditions, avoiding unnecessary problems in other
environments.
vitalybuka pushed a commit that referenced this pull request Apr 24, 2025
PR #131756 introduced a patch to fix a deadlock between LSan and ASan.

The relevant deadlock only occurs when LSan is enabled and
`dl_iterate_phdr` is used for Stop-the-World, i.e., under the condition
`CAN_SANITIZE_LEAKS && (SANITIZER_LINUX || SANITIZER_NETBSD)`.

Therefore, this commit also sets the effective condition of this patch
to the above condition, avoiding unnecessary problems in other
environments, e.g., stack overflow on MSVC/Windows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants