Fix stacktrace issues on Windows #122

Merged
nagisa merged 5 commits into nagisa:main on Nov 6, 2024

Timbals (Contributor) commented Sep 24, 2024

There are two issues with capturing stack traces on Windows, both related to resolving their symbols.
Note that this affects stack traces captured by both Rust and Tracy.

Both Rust and Tracy use the dbghelp.dll symbol helper to resolve symbols.
The first issue is that dbghelp.dll is single-threaded, so all of its functions must be externally synchronized.
When the standard library and the backtrace-rs crate are both used, Rust synchronizes their access to dbghelp.dll through a named Windows mutex.
The first commit in this PR makes Tracy use the same named mutex.
Without proper synchronization, stack traces can end up partially corrupted.
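
A minimal sketch of what "using the same named mutex" can look like from Rust. It assumes, as in backtrace-rs, that the name is `Local\RustBacktraceMutex` followed by the current process id as eight upper-case hex digits. `dbghelp_mutex_name`, the `kernel32` declarations, and `dbghelp_lock::Guard` are illustrative only, not Tracy's or the standard library's actual code; the Windows-only part sits behind `#[cfg(windows)]`.

```rust
use std::ffi::CString;

/// Name of the process-local mutex that guards dbghelp.dll.
/// Assumption: eight upper-case hex digits of the process id, as in backtrace-rs.
fn dbghelp_mutex_name(pid: u32) -> CString {
    // The formatted name never contains an interior NUL, so `new` cannot fail.
    CString::new(format!("Local\\RustBacktraceMutex{:08X}", pid)).expect("no interior NUL")
}

#[cfg(windows)]
mod dbghelp_lock {
    use std::ffi::{c_char, c_void, CStr};

    type Handle = *mut c_void;
    const INFINITE: u32 = 0xFFFF_FFFF;
    const WAIT_OBJECT_0: u32 = 0x0;
    const WAIT_ABANDONED: u32 = 0x80;

    #[link(name = "kernel32")]
    extern "system" {
        fn CreateMutexA(attributes: *mut c_void, initial_owner: i32, name: *const c_char) -> Handle;
        fn WaitForSingleObject(handle: Handle, milliseconds: u32) -> u32;
        fn ReleaseMutex(handle: Handle) -> i32;
        fn CloseHandle(handle: Handle) -> i32;
    }

    /// Owns the shared dbghelp lock; releases it when dropped.
    pub struct Guard(Handle);

    impl Guard {
        /// Opens the named mutex (creating it if needed) and blocks until this thread owns it.
        pub fn acquire(name: &CStr) -> Option<Guard> {
            unsafe {
                let handle = CreateMutexA(std::ptr::null_mut(), 0, name.as_ptr());
                if handle.is_null() {
                    return None;
                }
                match WaitForSingleObject(handle, INFINITE) {
                    // An abandoned mutex is still acquired by the waiting thread.
                    WAIT_OBJECT_0 | WAIT_ABANDONED => Some(Guard(handle)),
                    _ => {
                        CloseHandle(handle);
                        None
                    }
                }
            }
        }
    }

    impl Drop for Guard {
        fn drop(&mut self) {
            unsafe {
                ReleaseMutex(self.0);
                CloseHandle(self.0);
            }
        }
    }
}
```

On Windows, a caller would hold `dbghelp_lock::Guard::acquire(&dbghelp_mutex_name(std::process::id()))` around each batch of dbghelp calls. Under the naming assumption above, the standard library, backtrace-rs, and Tracy then all serialize on one kernel object.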

Example corrupted stack trace

Note the `<unknown>` frames at the top, probably caused by a race condition:

stack backtrace:
   0:     0x7ff7a900767d - <unknown>
   1:     0x7ff7a9014bc9 - <unknown>
   2:     0x7ff7a9006011 - <unknown>
   3:     0x7ff7a90093c7 - <unknown>
   4:     0x7ff7a9008fb9 - <unknown>
   5:     0x7ff7a9009a72 - <unknown>
   6:     0x7ff7a900990f - <unknown>
   7:     0x7ff7a9007d6f - <unknown>
   8:     0x7ff7a9009526 - <unknown>
   9:     0x7ff7a903dbc4 - <unknown>
  10:     0x7ff7a90012ee - tracy_client_stacktrace::main
                               at C:\Dev\tracy-client-stacktrace-issue\src\main.rs:5
  11:     0x7ff7a900138b - core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c\library\core\src\ops\function.rs:250
  12:     0x7ff7a900142e - core::hint::black_box
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c\library\core\src\hint.rs:389
  13:     0x7ff7a900142e - std::sys::backtrace::__rust_begin_short_backtrace<void (*)(),tuple$<> >
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c\library\std\src\sys\backtrace.rs:152
  14:     0x7ff7a9001401 - std::rt::lang_start::closure$0<tuple$<> >
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c\library\std\src\rt.rs:162
  15:     0x7ff7a9004279 - std::rt::lang_start_internal::closure$2
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library\std\src\rt.rs:141
  16:     0x7ff7a9004279 - std::panicking::try::do_call
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library\std\src\panicking.rs:557
  17:     0x7ff7a9004279 - std::panicking::try
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library\std\src\panicking.rs:521
  18:     0x7ff7a9004279 - std::panic::catch_unwind
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library\std\src\panic.rs:350
  19:     0x7ff7a9004279 - std::rt::lang_start_internal
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library\std\src\rt.rs:141
  20:     0x7ff7a90013da - std::rt::lang_start<tuple$<> >
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c\library\std\src\rt.rs:161
  21:     0x7ff7a9001309 - main
  22:     0x7ff7a903b918 - invoke_main
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
  23:     0x7ff7a903b918 - __scrt_common_main_seh
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
  24:     0x7ffaad70257d - BaseThreadInitThunk
  25:     0x7ffaae1eaf28 - RtlUserThreadStart

The second issue is that Rust and Tracy initialize the symbol helper in different ways.
Since Rust 1.78.0, the compiler no longer embeds absolute paths to .pdb files in the binary (rust-lang/rust#121297). This can make symbol resolution fail, because by default only the current working directory is searched for .pdb files.
A corresponding PR to backtrace-rs added the executable's location to the search path (rust-lang/backtrace-rs#584), but when Tracy initializes dbghelp.dll first, the modules are loaded before the search path is modified.
The second commit in this PR fixes that by capturing and resolving a backtrace with the standard library before Tracy performs its initialization.
This fixes #101.
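
A rough sketch of the search-path idea behind rust-lang/backtrace-rs#584, assuming the path is a `;`-separated list like the one dbghelp's `SymSetSearchPathW` accepts. `search_path_with_exe_dir` is a hypothetical helper, not backtrace-rs code. The point the PR makes is about ordering: whatever path this produces only helps if it is installed before dbghelp.dll loads the modules.

```rust
use std::path::Path;

/// Hypothetical helper: appends the directory of `exe` to a `;`-separated
/// symbol search path, so a `.pdb` sitting next to the binary can be found
/// even though the binary no longer records the .pdb's absolute path.
fn search_path_with_exe_dir(existing: &str, exe: &Path) -> String {
    let dir = match exe.parent().and_then(|d| d.to_str()) {
        Some(d) if !d.is_empty() => d,
        // No usable parent directory: leave the search path unchanged.
        _ => return existing.to_string(),
    };
    if existing.split(';').any(|entry| entry == dir) {
        // Already searched; avoid a duplicate entry.
        existing.to_string()
    } else if existing.is_empty() {
        dir.to_string()
    } else {
        format!("{};{}", existing, dir)
    }
}
```

In a real program, `exe` would come from `std::env::current_exe()`, and the resulting string would have to be handed to dbghelp before any module symbols are loaded.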

nagisa (Owner) left a comment

Thank you for looking into this and getting to the bottom of the problem!

Perhaps the one thing I'm somewhat iffy about is the dependency on `windows` here. I think this and `windows-sys` are outsized dependencies for the `tracy-client-sys` crate. Given the limited use of the APIs, could this perhaps be changed to use `windows-targets` instead? Some of my thoughts on the `windows{,-sys}` dependency are here. You can also find some inspiration for the changes at that same link.

It is fine if you don't have the time or capacity for such a change -- I'll just get around to it myself at some point in the future before I cut the next release.

```rust
// Initialize the `dbghelp.dll` symbol helper by capturing and resolving a backtrace using the standard library.
// Since symbol resolution is lazy, the backtrace is written to `sink`, which forces symbol resolution.
// Refer to the module documentation on why the standard library should do the initialization instead of Tracy.
write!(sink(), "{:?}", std::backtrace::Backtrace::force_capture()).unwrap();
```

nagisa (Owner):

Could we avoid the `unwrap` here? Although writing a backtrace to `sink()` very likely cannot fail at all, ever, just in case there is some perverse corner case, I would rather users deal with broken stack traces than have the program fail completely.

Timbals (Contributor, Author):

The result is ignored now. This is only called to trigger a symbol resolve in the standard library, so the result doesn't matter.
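
A minimal sketch of the revised call with the `Result` discarded via `let _ =`, wrapped in a hypothetical `init_after_std_symbolizer` helper that also encodes the ordering requirement from the PR description: the standard library resolves a backtrace before Tracy initializes dbghelp.dll. The `init_tracy` callback stands in for Tracy's own startup and is an assumption of this sketch.

```rust
use std::backtrace::Backtrace;
use std::io::{sink, Write};
use std::sync::Once;

/// Hypothetical wrapper: lets the standard library initialize dbghelp.dll
/// (by capturing *and resolving* a backtrace) before running `init_tracy`,
/// and does both at most once per process.
fn init_after_std_symbolizer(init_tracy: impl FnOnce()) {
    static STD_FIRST: Once = Once::new();
    STD_FIRST.call_once(|| {
        // `{:?}` forces the lazy symbol resolution; `sink()` discards the text.
        // The `Result` is ignored on purpose: a failure here should at worst
        // degrade stack traces, never panic.
        let _ = write!(sink(), "{:?}", Backtrace::force_capture());
        init_tracy();
    });
}
```

`Once` makes later callers wait until the first capture-and-resolve plus initialization has finished, so threads racing to start the profiler cannot reorder the two steps.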

```rust
write!(sink(), "{:?}", std::backtrace::Backtrace::force_capture()).unwrap();

// The name is the same one that the standard library and `backtrace-rs` use
let mut name = [0; 33];
```

nagisa (Owner):

Suggested change
```diff
-let mut name = [0; 33];
+let mut name: [u8; 33] = *b"Local\\RustBacktraceMutexFFFFFFFF\0";
```

makes it a little more obvious that 33 is exactly the required size (and therefore sufficient).
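
To see why 33 bytes is exact: `Local\` is 6 bytes, `RustBacktraceMutex` is 18, the id takes 8 hex digits, and the trailing NUL adds 1 (6 + 18 + 8 + 1 = 33). The sketch below fills the placeholder digits in place without allocating, in the spirit of the backtrace-rs code mentioned in the reply; `mutex_name_no_alloc` is hypothetical, and encoding the process id in those digits is an assumption.

```rust
/// Hypothetical no-allocation variant: writes `pid` as eight upper-case hex
/// digits over the `FFFFFFFF` placeholder of the suggested template.
fn mutex_name_no_alloc(pid: u32) -> [u8; 33] {
    const HEX: &[u8; 16] = b"0123456789ABCDEF";
    let mut name: [u8; 33] = *b"Local\\RustBacktraceMutexFFFFFFFF\0";
    // Bytes 24..32 hold the digits (most significant first); byte 32 is the NUL.
    for (i, byte) in name[24..32].iter_mut().enumerate() {
        let shift = (4 * (7 - i)) as u32;
        *byte = HEX[((pid >> shift) & 0xF) as usize];
    }
    name
}
```

The result matches `format!("Local\\RustBacktraceMutex{:08X}\0", pid)` byte for byte; the `format!` version is simpler when one allocation at startup does not matter.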

Timbals (Contributor, Author):

It uses `format!` directly now. I initially adapted this from the implementation in backtrace-rs, which seems to avoid allocations, but that is not necessary here.

```rust
// Initialization of the `dbghelp.dll` symbol helper should have already happened
// through the standard library backtrace above.
// Therefore, the shared named mutex should already have existed.
assert_eq!(GetLastError(), ERROR_ALREADY_EXISTS);
```

nagisa (Owner):

I'd `debug_assert_eq!` this for similar reasons as above (in case libstd stops creating this mutex, e.g. if it turned out that dbghelp no longer needed any locking to work).

Timbals (Contributor, Author):

I changed the error handling, and it will no longer panic should the standard library change its symbol resolution logic.
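
One way to make the `ERROR_ALREADY_EXISTS` check non-fatal, sketched under the assumption that the caller already holds a non-null handle from `CreateMutexA` and read `GetLastError()` immediately afterwards. `MutexOrigin` and `classify_mutex` are hypothetical; the idea is that both outcomes still leave a usable lock, so neither needs to panic.

```rust
/// Win32 error code reported by `GetLastError` when a named object already existed.
const ERROR_ALREADY_EXISTS: u32 = 183;

/// Hypothetical classification of where the shared dbghelp mutex came from.
#[derive(Debug, PartialEq)]
enum MutexOrigin {
    /// The standard library (or backtrace-rs) created it first, as expected.
    SharedWithStd,
    /// This call created it, e.g. if std changed how it sets up dbghelp.dll.
    CreatedHere,
}

/// Classifies the `GetLastError()` value read right after a successful `CreateMutexA`.
fn classify_mutex(last_error: u32) -> MutexOrigin {
    if last_error == ERROR_ALREADY_EXISTS {
        MutexOrigin::SharedWithStd
    } else {
        MutexOrigin::CreatedHere
    }
}
```

A caller could log `CreatedHere` in debug builds or simply carry on; in either case the handle still serializes Tracy's own dbghelp calls.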

Only a small subset of the Windows APIs is necessary, so we define that subset ourselves and avoid a heavy dependency.
nagisa merged commit 9f4c7d7 into nagisa:main on Nov 6, 2024
nagisa (Owner) commented Nov 6, 2024

Sorry for the delay! Will release 0.24.2 with this shortly.

Linked issue: Rust-Based Stack Traces Corrupt When Tracy Enabled