Implement DWARF parser for better backtraces #58

Closed
@mathetake

Background

LLVM-based compilers for Wasm, for example those for C/C++, Rust, Zig and TinyGo (virtually all of the languages that are viable for Wasm today),
emit DWARF information into .debug_* custom sections. The following are the sections contained in a TinyGo binary:

$ wasm-objdump main.wasm -h

main.go.wasm:	file format wasm 0x1

Sections:

     Type start=0x0000000b end=0x00000158 (size=0x0000014d) count: 42
   Import start=0x0000015b end=0x000003df (size=0x00000284) count: 18
 Function start=0x000003e2 end=0x000004e8 (size=0x00000106) count: 260
    Table start=0x000004ea end=0x000004ef (size=0x00000005) count: 1
   Memory start=0x000004f1 end=0x000004f4 (size=0x00000003) count: 1
   Global start=0x000004f6 end=0x000004fe (size=0x00000008) count: 1
   Export start=0x00000501 end=0x000007ac (size=0x000002ab) count: 31
     Code start=0x000007b0 end=0x0001b258 (size=0x0001aaa8) count: 260
     Data start=0x0001b25c end=0x00020862 (size=0x00005606) count: 2
   Custom start=0x00020866 end=0x00034e9a (size=0x00014634) ".debug_info"
   Custom start=0x00034e9d end=0x00035f66 (size=0x000010c9) ".debug_pubtypes"
   Custom start=0x00035f6a end=0x000431a7 (size=0x0000d23d) ".debug_loc"
   Custom start=0x000431aa end=0x00044f60 (size=0x00001db6) ".debug_ranges"
   Custom start=0x00044f62 end=0x00044fa1 (size=0x0000003f) ".debug_aranges"
   Custom start=0x00044fa4 end=0x00046ef6 (size=0x00001f52) ".debug_abbrev"
   Custom start=0x00046efa end=0x00059503 (size=0x00012609) ".debug_line"
   Custom start=0x00059507 end=0x0006510b (size=0x0000bc04) ".debug_str"
   Custom start=0x0006510f end=0x0006bf6b (size=0x00006e5c) ".debug_pubnames"
   Custom start=0x0006bf6e end=0x0006e6e8 (size=0x0000277a) "name"
   Custom start=0x0006e6eb end=0x0006e778 (size=0x0000008d) "producers"
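To illustrate what `wasm-objdump -h` reports, here is a minimal Go sketch (not gasm's actual code) that walks the top-level sections of a Wasm binary and extracts the names of custom sections such as ".debug_info". It assumes a well-formed binary and only checks the 4-byte magic; the `section` type and `readSections` function are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// section is a hypothetical representation of one top-level Wasm section.
type section struct {
	ID   byte   // 0 = custom, 10 = code, 11 = data, ...
	Name string // only set for custom sections, e.g. ".debug_info"
	Data []byte // payload (for custom sections, the bytes after the name)
}

// readSections walks the sections of a Wasm binary. Section sizes and name
// lengths are unsigned LEB128, the same encoding binary.ReadUvarint decodes.
func readSections(bin []byte) ([]section, error) {
	if len(bin) < 8 || !bytes.Equal(bin[:4], []byte("\x00asm")) {
		return nil, errors.New("not a wasm binary")
	}
	r := bytes.NewReader(bin[8:]) // skip the magic and the 4-byte version
	var out []section
	for r.Len() > 0 {
		id, _ := r.ReadByte() // cannot fail: r.Len() > 0
		size, err := binary.ReadUvarint(r)
		if err != nil {
			return nil, err
		}
		if size > uint64(r.Len()) {
			return nil, errors.New("section size exceeds input")
		}
		payload := make([]byte, size)
		if _, err := io.ReadFull(r, payload); err != nil {
			return nil, err
		}
		s := section{ID: id, Data: payload}
		if id == 0 { // custom section: the payload starts with its name
			pr := bytes.NewReader(payload)
			n, err := binary.ReadUvarint(pr)
			if err != nil || n > uint64(pr.Len()) {
				return nil, errors.New("malformed custom section name")
			}
			start := len(payload) - pr.Len()
			s.Name = string(payload[start : start+int(n)])
			s.Data = payload[start+int(n):]
		}
		out = append(out, s)
	}
	return out, nil
}

func main() {
	// A tiny hand-made module: an empty type section plus one custom section.
	bin := []byte("\x00asm\x01\x00\x00\x00")
	bin = append(bin, 1, 1, 0)   // type section: size 1, count 0
	bin = append(bin, 0, 15, 11) // custom section: size 15, name length 11
	bin = append(bin, ".debug_info"...)
	bin = append(bin, "xyz"...)
	secs, err := readSections(bin)
	if err != nil {
		panic(err)
	}
	for _, s := range secs {
		fmt.Printf("id=%d name=%q size=%d\n", s.ID, s.Name, len(s.Data))
	}
}
```

Collecting every section whose name starts with ".debug_" from this list would give the raw input a DWARF parser needs.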

By reading the debug sections, we can map each Wasm instruction in a function to the specific line of the source code from which the binary was compiled.
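Conceptually, the line-number program in .debug_line expands into a table of rows sorted by address, and an instruction's source line is given by the last row whose address is less than or equal to the instruction's address. The Go sketch below shows only that lookup step: decoding .debug_line is omitted, the `lineRow` type, `lookupLine` function and sample rows are hypothetical, and end-of-sequence rows are ignored for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// lineRow is a simplified, hypothetical row of a decoded DWARF line table.
// In Wasm, Address is an offset into the Code section.
type lineRow struct {
	Address uint64
	File    string
	Line    int
	Column  int
}

// lookupLine returns the row that covers addr, i.e. the last row whose
// Address is <= addr. rows must be sorted by Address.
func lookupLine(rows []lineRow, addr uint64) (lineRow, bool) {
	// i is the index of the first row that starts after addr.
	i := sort.Search(len(rows), func(i int) bool { return rows[i].Address > addr })
	if i == 0 {
		return lineRow{}, false // addr precedes every row
	}
	return rows[i-1], true
}

func main() {
	// Hypothetical rows; real ones would come from decoding .debug_line.
	rows := []lineRow{
		{0x10, "trap.go", 7, 5},
		{0x20, "trap.go", 11, 5},
		{0x30, "trap.go", 15, 7},
	}
	if r, ok := lookupLine(rows, 0x25); ok {
		fmt.Printf("%s:%d:%d\n", r.File, r.Line, r.Column) // trap.go:11:5
	}
}
```

Binary search keeps each lookup at O(log n), which matters for large binaries whose line tables have many thousands of rows.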

Why?

Several de facto standard Wasm tools already support the DWARF format. For example, Google Chrome[3] lets users debug Wasm programs in the browser. Another example is Wasmtime: when you run the panic example in this repo with WASMTIME_BACKTRACE_DETAILS=1, you can see a backtrace with source code information:

$ WASMTIME_BACKTRACE_DETAILS=1 wasmtime run examples/wasm/trap.wasm --invoke cause_panic
panic: causing panic!!!!!!!!!!
Error: failed to run main module `examples/wasm/trap.wasm`

Caused by:
    0: failed to invoke `cause_panic`
    1: wasm trap: unreachable
       wasm backtrace:
           0:  0x92a - runtime.abort
                           at /usr/local/lib/tinygo/src/runtime/runtime_tinygowasm.go:63:6
                     - runtime._panic
                           at /usr/local/lib/tinygo/src/runtime/panic.go:13:7
           1:  0x9ba - main.three
                           at /home/mathetake/gasm/examples/wasm/trap.go:19:7
           2:  0x9b0 - main.two
                           at /home/mathetake/gasm/examples/wasm/trap.go:15:7
           3:  0x9a6 - main.one
                           at /home/mathetake/gasm/examples/wasm/trap.go:11:5
           4:  0x99c - cause_panic
                           at /home/mathetake/gasm/examples/wasm/trap.go:7:5

On the other hand, at the time of this writing, our backtrace does not use DWARF; it only parses the "name" custom section and attaches each function's name:

panic: causing panic!!!!!!!!!!
wasm runtime error: unreachable
wasm backtrace:
	0: runtime._panic
	1: main.three
	2: main.two
	3: main.one
	4: cause_panic
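For comparison, the current approach only needs the function-names subsection (id 1) of the "name" custom section, which is a vector of (function index, name) pairs. A minimal Go sketch of decoding it might look like the following; `parseFunctionNames` and `readName` are hypothetical names rather than gasm's actual code, and the other subsections (module and local names) are simply skipped.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// readName reads a Wasm "name": a LEB128 byte length followed by UTF-8 bytes.
func readName(r *bytes.Reader) (string, error) {
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return "", err
	}
	if n > uint64(r.Len()) {
		return "", errors.New("truncated name")
	}
	b := make([]byte, n)
	if _, err := io.ReadFull(r, b); err != nil {
		return "", err
	}
	return string(b), nil
}

// parseFunctionNames decodes the payload of the "name" custom section (the
// bytes after the section name) into a map of function index -> name.
func parseFunctionNames(payload []byte) (map[uint32]string, error) {
	r := bytes.NewReader(payload)
	names := map[uint32]string{}
	for r.Len() > 0 {
		id, _ := r.ReadByte()
		size, err := binary.ReadUvarint(r)
		if err != nil {
			return nil, err
		}
		if size > uint64(r.Len()) {
			return nil, errors.New("truncated subsection")
		}
		sub := make([]byte, size)
		if _, err := io.ReadFull(r, sub); err != nil {
			return nil, err
		}
		if id != 1 { // 0 = module name, 2 = local names: not needed here
			continue
		}
		sr := bytes.NewReader(sub)
		count, err := binary.ReadUvarint(sr)
		if err != nil {
			return nil, err
		}
		for i := uint64(0); i < count; i++ {
			idx, err := binary.ReadUvarint(sr)
			if err != nil {
				return nil, err
			}
			name, err := readName(sr)
			if err != nil {
				return nil, err
			}
			names[uint32(idx)] = name
		}
	}
	return names, nil
}

func main() {
	payload := []byte{0, 4, 3, 'a', 'b', 'c'} // module-name subsection, skipped
	payload = append(payload, 1, 24, 2)      // function names: size 24, 2 entries
	payload = append(payload, 0, 11)
	payload = append(payload, "cause_panic"...)
	payload = append(payload, 1, 8)
	payload = append(payload, "main.one"...)
	names, err := parseFunctionNames(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(names[0], names[1])
}
```

Note that this only ever yields the symbol names the compiler chose to emit, which is why mangled names leak straight into the backtrace.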

DWARF support will be even more useful when users run non-TinyGo Wasm binaries: compilers usually mangle function names (luckily TinyGo does not!), so the names are barely human-readable. For example, a Rust binary's backtrace built from the "name" custom section looks like this:

  0:  0x42deb - __rust_start_panic
  1:  0x42c0c - rust_panic
  2:  0x42882 - _ZN3std9panicking20rust_panic_with_hook17h072472ae3822b936E
  3:  0x32914 - _ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17hed88036b12f483dfE
  4:  0x34891 - _ZN3std10sys_common9backtrace26__rust_end_short_backtrace17h9133fcc3e85035deE
  5:  0x32810 - _ZN3std9panicking11begin_panic17he6f6e918174263cfE
  6:  0x39eb - _ZN77_$LT$http_headers..HttpHeaders$u20$as$u20$proxy_wasm..traits..HttpContext$GT$6on_log17hde90e85ea16e616eE
  7:  0x2ae53 - _ZN10proxy_wasm10dispatcher10Dispatcher6on_log17hc6cd4fb35c538b86E
  8:  0x2d3dd - _ZN10proxy_wasm10dispatcher12proxy_on_log28_$u7b$$u7b$closure$u7d$$u7d$17h3f864ec735f41e70E
  9:  0x311bd - _ZN3std6thread5local17LocalKey$LT$T$GT$8try_with17hc87d8e9cf2d2494cE

With DWARF information, we don't need to parse the "name" custom section, so we won't suffer from these mangled symbols; instead, we can emit each frame with a human-readable function name plus source code information.
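With DWARF, a frame's function name would come from the DW_AT_name attribute of the DW_TAG_subprogram entry whose address range contains the frame's address, and its location from the line table. A rough Go sketch of that lookup and of a Wasmtime-style frame format follows; the `subprogram` type, the `findFunction` and `formatFrame` helpers and the sample values are all hypothetical, and inlined functions (DW_TAG_inlined_subroutine) are not handled.

```go
package main

import "fmt"

// subprogram holds the parts of a DW_TAG_subprogram entry needed here.
// LowPC/HighPC are Code-section offsets and HighPC is exclusive. (In DWARF 4+,
// DW_AT_high_pc may be encoded as an offset from DW_AT_low_pc; it is assumed
// to be resolved to an absolute value already.)
type subprogram struct {
	Name   string
	LowPC  uint64
	HighPC uint64
}

// findFunction returns the name of the subprogram whose range contains pc.
func findFunction(funcs []subprogram, pc uint64) (string, bool) {
	for _, f := range funcs {
		if f.LowPC <= pc && pc < f.HighPC {
			return f.Name, true
		}
	}
	return "", false
}

// formatFrame renders one backtrace frame together with its source location.
func formatFrame(i int, pc uint64, fn, file string, line, col int) string {
	return fmt.Sprintf("%d: %#x - %s\n\tat %s:%d:%d", i, pc, fn, file, line, col)
}

func main() {
	// Hypothetical subprogram ranges for illustration only.
	funcs := []subprogram{{"main.one", 0x100, 0x180}, {"main.two", 0x180, 0x200}}
	if fn, ok := findFunction(funcs, 0x1a0); ok {
		fmt.Println(formatFrame(0, 0x1a0, fn, "trap.go", 15, 7))
	}
}
```

A linear scan is enough for a sketch; a real implementation would probably sort the ranges once and binary-search them, as with the line table.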

How?

The Wasm DWARF format[1] is almost the same as the standard DWARF specification (version 5, if I'm not mistaken[2]). The main difference is that a code address is interpreted as an offset from the beginning of the Code section, rather than as an address within the loaded program as in non-Wasm formats.
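Concretely, before an instruction can be looked up in the line table, its byte offset within the whole module has to be rebased onto the Code section. A minimal sketch, in which `codeStart` is assumed to be the offset where the Code section begins in the sense defined by [1] (whether that means the section header or the start of its contents should be checked against [1]); `toDWARFAddress` is a hypothetical helper:

```go
package main

import "fmt"

// toDWARFAddress converts a byte offset within the whole Wasm module into the
// Code-section-relative address used by Wasm DWARF. codeStart is assumed to be
// where the Code section begins, as defined by the Wasm DWARF convention.
func toDWARFAddress(moduleOffset, codeStart uint64) (uint64, bool) {
	if moduleOffset < codeStart {
		return 0, false // the offset is not inside the Code section
	}
	return moduleOffset - codeStart, true
}

func main() {
	// Hypothetical values for illustration only.
	addr, ok := toDWARFAddress(0x900, 0x800)
	fmt.Printf("%#x %v\n", addr, ok) // 0x100 true
}
```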

So it should be fairly simple to write a parser, drawing on insights from other DWARF implementations.
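One implementation worth looking at (an assumption on my part, not something decided in this issue) is Go's standard library package debug/dwarf: `dwarf.New` takes the raw section bytes, so the .debug_* custom sections could in principle be handed to it directly, with the DWARF 5 sections registered through `(*dwarf.Data).AddSection`. The sketch below only wires the sections together and lists subprogram names; `newDWARFData` and `subprogramNames` are hypothetical helpers, and whether debug/dwarf copes with everything LLVM emits for Wasm is untested here.

```go
package main

import (
	"debug/dwarf"
	"fmt"
)

// newDWARFData builds a *dwarf.Data from a module's custom sections, keyed by
// section name (e.g. ".debug_info"). Missing sections are passed as nil.
func newDWARFData(sections map[string][]byte) (*dwarf.Data, error) {
	d, err := dwarf.New(
		sections[".debug_abbrev"],
		sections[".debug_aranges"],
		sections[".debug_frame"],
		sections[".debug_info"],
		sections[".debug_line"],
		sections[".debug_pubnames"],
		sections[".debug_ranges"],
		sections[".debug_str"],
	)
	if err != nil {
		return nil, fmt.Errorf("parse DWARF: %w", err)
	}
	// DWARF 5 moved some data into new sections; register them if present.
	for _, name := range []string{".debug_addr", ".debug_line_str", ".debug_str_offsets", ".debug_rnglists"} {
		if b, ok := sections[name]; ok {
			if err := d.AddSection(name, b); err != nil {
				return nil, fmt.Errorf("add %s: %w", name, err)
			}
		}
	}
	return d, nil
}

// subprogramNames lists the DW_AT_name of every DW_TAG_subprogram entry.
func subprogramNames(d *dwarf.Data) ([]string, error) {
	var names []string
	r := d.Reader()
	for {
		e, err := r.Next()
		if err != nil {
			return nil, err
		}
		if e == nil {
			return names, nil // reached the end of .debug_info
		}
		if e.Tag == dwarf.TagSubprogram {
			if name, ok := e.Val(dwarf.AttrName).(string); ok {
				names = append(names, name)
			}
		}
	}
}

func main() {
	// Without a .debug_info section, debug/dwarf reports an error.
	if _, err := newDWARFData(map[string][]byte{}); err != nil {
		fmt.Println("error:", err)
	}
}
```

Line information would then come from `(*dwarf.Data).LineReader` for each compile unit, whose addresses would need the Code-section rebasing described above.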

Links

[1] https://yurydelendik.github.io/webassembly-dwarf/
[2] https://dwarfstd.org/doc/DWARF5.pdf (PDF)
[3] https://twitter.com/ChromeDevTools/status/1192803818024710145
