Description
Background
LLVM-based compilers targeting Wasm, for example those for C/C++, Rust, Zig, and TinyGo (virtually all of the viable languages), emit DWARF information into .debug_* custom sections. The following are the sections contained in a TinyGo binary:
$ wasm-objdump main.wasm -h
main.go.wasm: file format wasm 0x1
Sections:
Type start=0x0000000b end=0x00000158 (size=0x0000014d) count: 42
Import start=0x0000015b end=0x000003df (size=0x00000284) count: 18
Function start=0x000003e2 end=0x000004e8 (size=0x00000106) count: 260
Table start=0x000004ea end=0x000004ef (size=0x00000005) count: 1
Memory start=0x000004f1 end=0x000004f4 (size=0x00000003) count: 1
Global start=0x000004f6 end=0x000004fe (size=0x00000008) count: 1
Export start=0x00000501 end=0x000007ac (size=0x000002ab) count: 31
Code start=0x000007b0 end=0x0001b258 (size=0x0001aaa8) count: 260
Data start=0x0001b25c end=0x00020862 (size=0x00005606) count: 2
Custom start=0x00020866 end=0x00034e9a (size=0x00014634) ".debug_info"
Custom start=0x00034e9d end=0x00035f66 (size=0x000010c9) ".debug_pubtypes"
Custom start=0x00035f6a end=0x000431a7 (size=0x0000d23d) ".debug_loc"
Custom start=0x000431aa end=0x00044f60 (size=0x00001db6) ".debug_ranges"
Custom start=0x00044f62 end=0x00044fa1 (size=0x0000003f) ".debug_aranges"
Custom start=0x00044fa4 end=0x00046ef6 (size=0x00001f52) ".debug_abbrev"
Custom start=0x00046efa end=0x00059503 (size=0x00012609) ".debug_line"
Custom start=0x00059507 end=0x0006510b (size=0x0000bc04) ".debug_str"
Custom start=0x0006510f end=0x0006bf6b (size=0x00006e5c) ".debug_pubnames"
Custom start=0x0006bf6e end=0x0006e6e8 (size=0x0000277a) "name"
Custom start=0x0006e6eb end=0x0006e778 (size=0x0000008d) "producers"
By reading the debug sections, we can associate each Wasm instruction in a function with a specific line of the source code from which the binary was compiled.
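As a rough illustration of the first step, the sketch below walks the sections of a Wasm binary and collects the custom sections whose names start with ".debug_". It also records where the code section's contents begin. The names debugSections and sampleModule are hypothetical and not part of this repo, and the sample module is a hand-built fixture rather than real compiler output.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

var errMalformed = errors.New("malformed wasm binary")

// debugSections (hypothetical helper) returns the payload of every custom
// section whose name starts with ".debug_", plus the file offset at which the
// contents of the code section begin (0 if there is no code section).
func debugSections(bin []byte) (map[string][]byte, uint64, error) {
	if len(bin) < 8 || !bytes.Equal(bin[:4], []byte("\x00asm")) {
		return nil, 0, errMalformed
	}
	sections := map[string][]byte{}
	var codeStart uint64
	pos, end := uint64(8), uint64(len(bin)) // skip magic and version
	for pos < end {
		id := bin[pos]
		pos++
		// Section sizes are unsigned LEB128, the same encoding as Go's uvarint.
		size, n := binary.Uvarint(bin[pos:])
		if n <= 0 {
			return nil, 0, errMalformed
		}
		pos += uint64(n)
		if size > end-pos {
			return nil, 0, errMalformed
		}
		body := bin[pos : pos+size]
		switch id {
		case 0: // custom section: length-prefixed name followed by the payload
			nameLen, m := binary.Uvarint(body)
			if m <= 0 || nameLen > uint64(len(body)-m) {
				return nil, 0, errMalformed
			}
			nameEnd := uint64(m) + nameLen
			name := string(body[uint64(m):nameEnd])
			if strings.HasPrefix(name, ".debug_") {
				sections[name] = body[nameEnd:]
			}
		case 10: // code section
			codeStart = pos
		}
		pos += size
	}
	return sections, codeStart, nil
}

// sampleModule builds a tiny fixture: an empty code section followed by a
// ".debug_line" custom section with a two-byte placeholder payload.
func sampleModule() []byte {
	bin := []byte("\x00asm\x01\x00\x00\x00") // magic and version
	bin = append(bin, 0x0a, 0x01, 0x00)       // code section: id 10, size 1, zero bodies
	bin = append(bin, 0x00, 0x0e, 0x0b)       // custom section: id 0, size 14, name length 11
	bin = append(bin, ".debug_line"...)
	return append(bin, 0xaa, 0xbb)
}

func main() {
	secs, codeStart, err := debugSections(sampleModule())
	if err != nil {
		panic(err)
	}
	fmt.Printf("code section contents start at offset %d\n", codeStart)
	for name, payload := range secs {
		fmt.Printf("%s: %d bytes\n", name, len(payload))
	}
}
```

The collected payloads are what a DWARF parser would then consume. The code section offset is kept because Wasm DWARF addresses are relative to the code section.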
Why?
Some of the de facto standard Wasm tools already support the DWARF format. For example, Google Chrome[3] allows users to debug Wasm programs in the browser. Another example is Wasmtime: when you run the panic example in this repo with WASMTIME_BACKTRACE_DETAILS=1, you can see the backtrace with source code information:
$ WASMTIME_BACKTRACE_DETAILS=1 wasmtime run examples/wasm/trap.wasm --invoke cause_panic
panic: causing panic!!!!!!!!!!
Error: failed to run main module `examples/wasm/trap.wasm`
Caused by:
0: failed to invoke `cause_panic`
1: wasm trap: unreachable
wasm backtrace:
0: 0x92a - runtime.abort
at /usr/local/lib/tinygo/src/runtime/runtime_tinygowasm.go:63:6
- runtime._panic
at /usr/local/lib/tinygo/src/runtime/panic.go:13:7
1: 0x9ba - main.three
at /home/mathetake/gasm/examples/wasm/trap.go:19:7
2: 0x9b0 - main.two
at /home/mathetake/gasm/examples/wasm/trap.go:15:7
3: 0x9a6 - main.one
at /home/mathetake/gasm/examples/wasm/trap.go:11:5
4: 0x99c - cause_panic
at /home/mathetake/gasm/examples/wasm/trap.go:7:5
On the other hand, at the time of this writing, our backtrace does not use DWARF; it just parses the "name" custom section and attaches a function name to each frame:
panic: causing panic!!!!!!!!!!
wasm runtime error: unreachable
wasm backtrace:
0: runtime._panic
1: main.three
2: main.two
3: main.one
4: cause_panic
DWARF support will be even more useful when users run non-TinyGo Wasm binaries: function names are usually mangled by compilers (luckily TinyGo does not do this), so they are basically not human-readable. For example, a Rust binary's backtrace built only from the "name" custom section would look like this:
0: 0x42deb - __rust_start_panic
1: 0x42c0c - rust_panic
2: 0x42882 - _ZN3std9panicking20rust_panic_with_hook17h072472ae3822b936E
3: 0x32914 - _ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17hed88036b12f483dfE
4: 0x34891 - _ZN3std10sys_common9backtrace26__rust_end_short_backtrace17h9133fcc3e85035deE
5: 0x32810 - _ZN3std9panicking11begin_panic17he6f6e918174263cfE
6: 0x39eb - _ZN77_$LT$http_headers..HttpHeaders$u20$as$u20$proxy_wasm..traits..HttpContext$GT$6on_log17hde90e85ea16e616eE
7: 0x2ae53 - _ZN10proxy_wasm10dispatcher10Dispatcher6on_log17hc6cd4fb35c538b86E
8: 0x2d3dd - _ZN10proxy_wasm10dispatcher12proxy_on_log28_$u7b$$u7b$closure$u7d$$u7d$17h3f864ec735f41e70E
9: 0x311bd - _ZN3std6thread5local17LocalKey$LT$T$GT$8try_with17hc87d8e9cf2d2494cE
With DWARF information, we don't need to parse the "name" custom section, so we won't suffer from these mangled symbols; instead, we can emit each frame with a human-readable function name plus source code information.
How?
The Wasm DWARF format[1] is almost the same as the standard DWARF specification (version 5?)[2]. The difference is that an address should be interpreted as an offset from the beginning of the code section, rather than from the beginning of the binary as in non-Wasm formats.
So it should be simple to write a parser by drawing on other DWARF implementations.
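One possible shape for the lookup, sketched with Go's standard debug/dwarf package rather than a hand-written parser: convert an instruction's offset in the binary to a code-section-relative address, then ask the line table for the matching source position. The function names codeRelative and lookupLine are hypothetical. The sketch assumes the *dwarf.Data was already built from the .debug_* payloads (for example with dwarf.New), and that codeStart is the file offset where the code section's contents begin. The numbers in main are illustrative only.

```go
package main

import (
	"debug/dwarf"
	"fmt"
)

// codeRelative (hypothetical helper) converts an offset from the start of the
// binary into the code-section-relative address that Wasm DWARF uses.
// It reports false when the offset lies before the code section.
func codeRelative(fileOffset, codeStart uint64) (uint64, bool) {
	if fileOffset < codeStart {
		return 0, false
	}
	return fileOffset - codeStart, true
}

// lookupLine (hypothetical helper) resolves a code-section-relative address
// to "file:line:column" using the line table of the enclosing compile unit.
func lookupLine(d *dwarf.Data, pc uint64) (string, error) {
	// Find the compilation unit whose address ranges cover pc.
	cu, err := d.Reader().SeekPC(pc)
	if err != nil {
		return "", err
	}
	lr, err := d.LineReader(cu)
	if err != nil {
		return "", err
	}
	if lr == nil { // the unit has no line table
		return "", dwarf.ErrUnknownPC
	}
	var le dwarf.LineEntry
	if err := lr.SeekPC(pc, &le); err != nil {
		return "", err
	}
	if le.File == nil {
		return "", dwarf.ErrUnknownPC
	}
	return fmt.Sprintf("%s:%d:%d", le.File.Name, le.Line, le.Column), nil
}

func main() {
	// Illustrative values: a code section starting at 0x7b0 and an
	// instruction at offset 0x92a from the start of the binary.
	pc, ok := codeRelative(0x92a, 0x7b0)
	if !ok {
		panic("offset is before the code section")
	}
	fmt.Printf("DWARF address: %#x\n", pc)
	// With real debug data: pos, err := lookupLine(dwarfData, pc)
}
```

Whether the standard package handles every section a given compiler emits is not verified here; a dedicated parser may still be needed.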
Links
[1] https://yurydelendik.github.io/webassembly-dwarf/
[2] https://dwarfstd.org/doc/DWARF5.pdf
[3] https://twitter.com/ChromeDevTools/status/1192803818024710145