
Binaryen as a qemu backend #1494


Closed
tbodt opened this issue Apr 4, 2018 · 41 comments

@tbodt

tbodt commented Apr 4, 2018

(Continuing conversation from WebAssembly/design#796)

I compiled qemu with --enable-profiler, added a few missing profiler logs, set an uninitialized variable to zero, and tested booting a FreeDOS image with 1 CPU (so no multithreading). Booting to the C:> prompt took 10 seconds. Running info jit at the monitor prompt gives this:

Translation buffer state:
gen code size       31204435/33549267
TB count            36102
TB avg target size  17 max=152 bytes
TB avg host size    668 bytes (expansion ratio: 37.7)
cross page TB count 35 (0%)
direct jump count   14863 (41%) (2 jumps=10899 30%)
TB hash buckets     4146/8192 (50.61% head buckets used)
TB hash occupancy   17.63% avg chain occ. Histogram: [0,10)%|█▁▅▁ ▂▁▁ ▁|[90,100]%
TB hash avg chain   1.008 buckets. Histogram: 1|█▁|2

Statistics:
TB flush count      13
TB invalidate count 324441
TLB flush count     484
JIT cycles          4737605000 (1.974 s at 2.4 GHz)
translated TBs      337178 (aborted=0 0.0%)
avg ops/TB          93.9 max=362
deleted ops/TB      2.09
avg temps/TB        41.03 max=46
avg host code/TB    1158.3
avg search data/TB  43.2
ops                 31648062
in bytes            10142744
out bytes           390555017
search bytes        14581774
cycles/op           149.7
cycles/in byte      467.1
cycles/out byte     12.1
cycles/search byte     324.9
  gen_interm time   20.5%
  gen_code time     79.5%
optim./code time    23.0%
liveness/code time  9.1%
cpu_restore count   106
  avg cycles        160.4

What's particularly interesting is the cycles/op number: 149.7. That's the total number of cycles spent compiling (measured with rdtsc) divided by the number of qemu IR operations generated. I doubt Binaryen would be that fast. About 20% of the boot time (1.974 s of the 10 s) is spent compiling, so a slowdown in that area would be noticeable.
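The ratios in the report follow directly from the raw counters, which also makes the "20%" figure concrete. A small C sketch that recomputes them; the 2.4 GHz TSC rate is the one the report itself assumes, and the 10 s boot time comes from the text above, so the results are only as accurate as those two numbers.

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the info jit output above. */
#define JIT_CYCLES   4737605000.0 /* "JIT cycles" (rdtsc ticks) */
#define TCG_OPS      31648062.0   /* "ops" */
#define GUEST_BYTES  10142744.0   /* "in bytes" */
#define HOST_BYTES   390555017.0  /* "out bytes" */
#define TSC_HZ       2.4e9        /* TSC rate assumed by the report */
#define BOOT_SECONDS 10.0         /* wall-clock time to the C:> prompt */

/* Average JIT cost per unit of work, in TSC cycles. */
static double cycles_per(double units) {
    assert(units > 0.0);
    return JIT_CYCLES / units;
}

/* Seconds spent translating, assuming a constant-rate TSC. */
static double jit_seconds(void) {
    return JIT_CYCLES / TSC_HZ;
}

/* Fraction of the whole boot spent in the JIT. */
static double jit_fraction_of_boot(void) {
    return jit_seconds() / BOOT_SECONDS;
}

static void print_jit_summary(void) {
    printf("cycles/op       %.1f\n", cycles_per(TCG_OPS));              /* 149.7 */
    printf("cycles/in byte  %.1f\n", cycles_per(GUEST_BYTES));          /* 467.1 */
    printf("cycles/out byte %.1f\n", cycles_per(HOST_BYTES));           /* 12.1 */
    printf("JIT share       %.1f%%\n", 100.0 * jit_fraction_of_boot()); /* 19.7 */
}
```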

@kripken

Member

kripken commented Apr 5, 2018

Binaryen would add to that, yeah. But, being realistic, Binaryen is going to be faster than the WebAssembly VM compiling it to machine code afterwards anyhow. WebAssembly is an intermediate format, so in an environment like qemu slower compilation is unavoidable.

In a workload dominated by compilation WebAssembly just might not be good enough, period. On the other hand, for code that is compiled once and run many times, increasing the compile time even significantly shouldn't be too bad - that's why I'm very interested in this. But, what's your use case?

@tbodt
Author

tbodt commented Apr 5, 2018

My use case is having a nice development environment running on an iOS device. Qemu + WebAssembly seemed like the most promising way to make this fast.

@kripken
Member

kripken commented Apr 5, 2018

I see. My guess is that could be fast enough (but we'd need to measure to be sure, of course).

Happy to help try this out - I can help writing out the Binaryen integration bits, if you set up a repo. I just don't know where to get started on the qemu side.

Btw, another option here is to interpret the wasm (using the Binaryen interpreter, or another), to avoid compilation entirely. Then compilation can be done after the function is used enough times, as a JIT optimization, etc. I'm optimistic we can find ways to make this work efficiently.
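A minimal sketch of that tiering idea (the same scheme atrosinenko later applies to TBs): keep a per-block execution counter, interpret the block until the counter reaches a threshold, then compile it once and use the compiled version from then on. The struct, the threshold of 50 and the function names are hypothetical, and the actual compile step is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block profiling state (not a QEMU or Binaryen type). */
typedef struct {
    unsigned exec_count; /* interpreted executions so far */
    bool compiled;       /* true once a compiled version exists */
} BlockProfile;

/* Hypothetical tuning knob: compile after this many interpreted runs. */
#define HOT_THRESHOLD 50u

typedef enum { RUN_INTERPRETED, RUN_COMPILED } RunMode;

/* Decide how to run a block this time, promoting it once it is hot. */
static RunMode choose_mode(BlockProfile *p) {
    assert(p != NULL);
    if (p->compiled)
        return RUN_COMPILED;
    p->exec_count++;
    if (p->exec_count >= HOT_THRESHOLD) {
        /* A real implementation would emit wasm for the block here and
           instantiate it; the sketch only records that it happened. */
        p->compiled = true;
    }
    return RUN_INTERPRETED;
}
```

Cold blocks never pay the compilation cost at all, which matters in a workload where, as the profile above shows, compilation is already a large share of the time.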

@davidar

davidar commented Aug 17, 2018

Did this go anywhere? I'd also be interested in seeing this happen.

@tbodt
Author

tbodt commented Aug 17, 2018

@davidar No, I never found the time to work on it. It's still on my list of unfinished projects that I might work on later though.

@davidar

davidar commented Aug 18, 2018

@atrosinenko You seem to have quite some experience building QEMU with Emscripten, I'm curious if you have any thoughts on this issue?

@atrosinenko

@davidar I'm now planning to rewrite my port of QEMU on top of v3.0, implementing a WASM JIT instead of the Asm.js one, which was implemented more as a proof of concept, and falling back to plain TCI compiled to Asm.js; see the qemujs-v2 branch. But I fear Binaryen's Apache 2.0 license is not compatible with QEMU, which is AFAIK GPLv2 as a whole (different parts sometimes have different license options).

For Asm.js I was compiling a basic block after N executions (by hooking qemu_tb_exec; you can see the v2.4.1...emscripten branch diff), and I hope it will be much faster once the machine code -> TCI bytecode -> asm.js -> parse JS -> ... -> machine code flow is eliminated.

@kripken
Member

kripken commented Aug 18, 2018

Regarding the license, that is something we should fix if it's a problem.

@jfbastien, I know the WebAssembly/ projects, like LLVM, were thinking about doing Apache 2 plus exceptions. Looks like those exceptions might help here (QEMU is GPL2). Do you know what the status is there?

@atrosinenko

> Regarding the license, that is something we should fix if it's a problem.

But isn't changing a license a big problem when there are already 64 contributors (or do you require copyright transfer)? (I'm not a lawyer; maybe Apache 2.0 is not a problem...)

Do you mean upstreaming Binaryen support into QEMU? If so, it may be worth reading the QEMU contributor guidelines (Contribute/SubmitAPatch) beforehand, so as not to have to fix formal requirements afterwards. I'm not sure whether they would accept such a port, though. For example, AFAIK multithreading in browsers is currently disabled for security reasons, so I have some kludges to make QEMU single-threaded at the cost of stability. On the other hand, if the original issue was about native QEMU linked against the system JS/WASM engine, then only the CPU thread would be "single-threaded", so why not... Anyway, it would be great to upstream as much as possible.

Meanwhile, is there some way in WASM to compile one basic block at a time into its own WASM module and then, when QEMU decides to link blocks directly, somehow link these modules so that execution passes from one piece of AOT-compiled code to another efficiently?

@tbodt
Author

tbodt commented Aug 20, 2018

@atrosinenko Multithreading was recently re-enabled (in Chrome at least) because other Spectre mitigations were put in place.

@tbodt
Author

tbodt commented Aug 20, 2018

As for compiling one basic block at a time, you can either:

  • Put all the basic blocks in a function, run Binaryen's relooper on the function, and reload the module
  • Wait for tail call support, make each basic block its own function that ends with a tail call, and reload the module

Either way, you have to reload the module with all your code in it, which requires the browser to throw out all of its existing compiled machine code and start over. It seems like WASM wasn't designed with this use case in mind.
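For the first option, all blocks share one function body. A hypothetical C sketch of the most general shape that can take: a loop around a switch on a "next block" label, where every branch between blocks becomes an assignment to the label. A relooper tries to recover proper if/loop nesting instead, so read this as roughly the fallback for control flow it cannot structure, not as what Binaryen emits. The guest fragment (summing 1..n in four blocks) and all names are made up for illustration.

```c
#include <assert.h>

/* Labels of the hypothetical guest basic blocks. */
enum { BB_ENTRY, BB_TEST, BB_BODY, BB_EXIT };

/* All blocks live in one function; `label` selects the next block,
   which is what a branch between blocks turns into. */
static int run_blocks(int n) {
    int label = BB_ENTRY;
    int i = 0, sum = 0;
    for (;;) {
        switch (label) {
        case BB_ENTRY: /* i = 1; sum = 0; then go to the loop test */
            i = 1;
            sum = 0;
            label = BB_TEST;
            break;
        case BB_TEST: /* conditional branch */
            label = (i > n) ? BB_EXIT : BB_BODY;
            break;
        case BB_BODY: /* sum += i; i++; jump back to the test */
            sum += i;
            i++;
            label = BB_TEST;
            break;
        case BB_EXIT:
            return sum;
        default:
            assert(!"unknown block label");
            return -1;
        }
    }
}
```

Adding a newly translated block means adding a case, which is why the whole function (and in practice the module) has to be regenerated.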

@jfbastien
Member

@binji
Member

binji commented Aug 20, 2018

@tbodt you can also compile the function in its own module, export it, and put it in a table. Then the original module can use call_indirect to call this function.
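A rough C analogy of that approach, with hypothetical names throughout: a shared table of function pointers stands in for the wasm table, each newly compiled block is installed into a free slot, and blocks name their successor by table index. In wasm, call_indirect traps on an empty slot or a signature mismatch; the sketch only models the empty-slot check, as an assert.

```c
#include <assert.h>
#include <stddef.h>

/* Common signature of every compiled block: it updates the guest state
   (one register here) and returns the table index of the next block,
   or -1 to return to the emulator. */
typedef int (*BlockFn)(int *reg);

#define TABLE_SIZE 16

/* Stand-in for a function table shared by all block modules. */
static BlockFn table[TABLE_SIZE];

/* Slots assigned at install time; blocks use them to name successors. */
static int double_slot = -1, exit_slot = -1;

/* "Load" a new block by storing it in the first free slot. */
static int table_install(BlockFn fn) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (table[i] == NULL) {
            table[i] = fn;
            return i;
        }
    }
    return -1; /* table full */
}

/* Indirect call through the table; an empty slot is a bug here. */
static int call_through_table(int index, int *reg) {
    assert(index >= 0 && index < TABLE_SIZE && table[index] != NULL);
    return table[index](reg);
}

/* Chain blocks until one hands control back to the emulator. */
static void run_from(int index, int *reg) {
    while (index >= 0)
        index = call_through_table(index, reg);
}

/* Two example blocks, as if each came from its own small module. */
static int block_double(int *reg) {
    *reg *= 2;
    return *reg < 100 ? double_slot : exit_slot;
}

static int block_exit(int *reg) {
    (void)reg;
    return -1;
}
```

The point of the design is that installing a block touches only one slot, so previously compiled blocks (and the main module) do not have to be recompiled.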

@kripken
Member

kripken commented Aug 20, 2018

@jfbastien Thanks! And to be sure, the wasm projects are going to be licensed in the exact same way as LLVM?

@atrosinenko

> But isn't changing a license a big problem when there are already 64 contributors?

Well, in practice probably 99% of the code in Binaryen was written by fewer than 10 people. And I'm pretty sure all of them are very excited about enabling use cases like Qemu (I definitely am), so I think we could dual-license it (say, adding MIT) or find some other solution. But it sounds like the LLVM license change will apply here too, in a matter of months, so we may not even need to do that.

Overall, I encourage you to experiment on this stuff :) We'll make the licensing stuff work out.

@jfbastien
Member

> @jfbastien Thanks! And to be sure, the wasm projects are going to be licensed in the exact same way as LLVM?

That was discussed a long time ago. Probably worth bringing up again at the CG meeting. @binji ?

@binji
Member

binji commented Aug 20, 2018

@jfbastien I'm definitely not up-to-speed on WebAssembly licensing stuff -- but happy to add it to the agenda if someone can drive the conversation. :-)

@kripken
Member

kripken commented Aug 20, 2018

I'm happy to drive the conversation. I thought I remembered we had a plan here and/or someone that was focused on this? But I guess we can discuss that in the CG meeting :)

@binji
Member

binji commented Aug 20, 2018

@kripken Thanks, I've added an agenda item here: WebAssembly/meetings#296. We have a pretty full meeting, so it may end up pushed to the next one.

@jfbastien
Member

@kripken can you dig up the prior discussions for this?

@kripken
Member

kripken commented Aug 20, 2018

Sure. So for Binaryen I see this issue came up in 2015, when we added the initial license, #5 - already there we mentioned needing an exception. I don't see any later discussion. So maybe that is now :)

We've also discussed some W3C specific issues in #1358 (joining the community group as a barrier to contributors) but that is orthogonal.

Outside of Binaryen, the issue came up in the design repo in WebAssembly/design#668 - there it is suggested that wasm projects follow the LLVM license change, and it mentions updates in the future (as at the time the lawyers were still figuring things out, I guess?), but I don't see any later discussion.

@dschuff
Member

dschuff commented Aug 22, 2018

From http://llvm.org/foundation/relicensing/ it looks like the license text is finalized, although they seem to be behind on publicizing the license agreement (it ought to be soon). I think it makes sense to consider giving Binaryen and WABT the same license, although neither Binaryen nor WABT has runtime libraries in the same sense that LLVM does, so it's not clear to me whether the first exception would apply (or whether that matters). I should check with the lawyers on our end. And then of course we'd need a plan to actually execute the relicensing. But assuming we can work it all out with the Binaryen contributors, I'm also interested in knowing from the OP or other interested parties whether this would actually solve their problem.

@jfbastien
Member

Latest update on LLVM relicensing: http://lists.llvm.org/pipermail/llvm-foundation/2018-July/000162.html
@dschuff I'd suggest you reach out to Danny (and say hi for me!). I had talked to him about it maybe ~3 years ago?

@atrosinenko

@dschuff

> But assuming we can work it all out with the Binaryen contributors, I'm also interested in knowing from the OP or other interested parties whether this would actually solve their problem.

As an interested party, I don't know right now whether Binaryen can help me :) I was just asked what I think about using Binaryen as a QEMU backend, and I answered with what I consider a blocker (maybe it isn't really one). Now that you say it can be fixed, I need to re-evaluate.

Do I understand correctly that Binaryen can be used as a library to construct WASM binaries at run time and even optimize them? Meanwhile, is there some "official" WASM interpreter in C or C++? Another possibility may be to implement emitting WASM on my own; mastering its binary format would take some effort, but I'm not sure I need much more than that from Binaryen.

@kripken
Member

kripken commented Aug 22, 2018

> Do I understand correctly that Binaryen can be used as a library to construct WASM binaries at run time and even optimize them?

Yes, and that's one of the goals of Binaryen - to make it easy to write not just static compilers, but also JITs. I hope to see it used in places like QEMU and Mono, for example.

Specifically, Binaryen has a very simple C API for generating its IR, and then you can tell it to emit wasm from that. Optionally you can also run Binaryen's optimization passes first, which are designed to be fast enough to run in a JIT.

@atrosinenko

Sounds very promising!

@binji
Member

binji commented Aug 22, 2018

> Meanwhile, is there some "official" WASM interpreter in C or C++?

The WebAssembly reference interpreter is written in OCaml: https://github.com/WebAssembly/spec/tree/master/interpreter

Binaryen and wabt also have interpreters.

> Another possibility may be to implement emitting WASM on my own,

This is not as difficult as you might expect; WebAssembly is a relatively simple format. See, for example, waforth, which JITs Forth code.
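To make that concrete, a minimal C sketch assuming nothing beyond the WebAssembly binary format itself: a module is an 8-byte header ("\0asm" plus version 1) followed by sections, each written as an id byte, an unsigned LEB128 byte length and the contents. The example hand-assembles the usual add(i32, i32) -> i32 module using the type (1), function (3), export (7) and code (10) sections; the Buf type and helper names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-capacity byte buffer, large enough for this tiny module. */
typedef struct { uint8_t data[256]; size_t len; } Buf;

static void put(Buf *b, uint8_t byte) {
    assert(b->len < sizeof b->data);
    b->data[b->len++] = byte;
}

static void put_bytes(Buf *b, const uint8_t *p, size_t n) {
    assert(b->len + n <= sizeof b->data);
    memcpy(b->data + b->len, p, n);
    b->len += n;
}

/* Unsigned LEB128: 7 bits per byte, high bit set on all but the last. */
static void put_uleb(Buf *b, uint32_t v) {
    do {
        uint8_t byte = v & 0x7F;
        v >>= 7;
        if (v != 0) byte |= 0x80;
        put(b, byte);
    } while (v != 0);
}

/* A section is: id byte, LEB128 content length, contents. */
static void put_section(Buf *out, uint8_t id, const Buf *contents) {
    put(out, id);
    put_uleb(out, (uint32_t)contents->len);
    put_bytes(out, contents->data, contents->len);
}

/* Build a module exporting add(i32, i32) -> i32. */
static void emit_add_module(Buf *out) {
    static const uint8_t header[]  = {0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00};
    static const uint8_t types[]   = {0x01, 0x60, 0x02, 0x7F, 0x7F, 0x01, 0x7F}; /* 1 type: (i32 i32) -> i32 */
    static const uint8_t funcs[]   = {0x01, 0x00};                               /* 1 function of type 0 */
    static const uint8_t exports[] = {0x01, 0x03, 'a', 'd', 'd', 0x00, 0x00};    /* export "add" = func 0 */
    static const uint8_t body[]    = {0x00,                   /* no extra locals */
                                      0x20, 0x00, 0x20, 0x01, /* local.get 0; local.get 1 */
                                      0x6A, 0x0B};            /* i32.add; end */
    Buf s;

    out->len = 0;
    put_bytes(out, header, sizeof header);
    s.len = 0; put_bytes(&s, types, sizeof types);     put_section(out, 1, &s);
    s.len = 0; put_bytes(&s, funcs, sizeof funcs);     put_section(out, 3, &s);
    s.len = 0; put_bytes(&s, exports, sizeof exports); put_section(out, 7, &s);

    /* Code section: function count, then each body prefixed by its size. */
    s.len = 0;
    put_uleb(&s, 1);
    put_uleb(&s, (uint32_t)sizeof body);
    put_bytes(&s, body, sizeof body);
    put_section(out, 10, &s);
}
```

The resulting 41 bytes can be handed to WebAssembly.instantiate or any other wasm engine. A JIT would generate the function body from its own IR instead of copying a fixed array; once bodies grow past 127 bytes the length prefixes become multi-byte, which is why every size goes through put_uleb.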

@caffeinum

I don't know whether this is closely related to this issue.

I am trying to build c-lightning for WASM, and the section on cross-compiling says to use qemu-user to build the project:

Two makefile targets should not be cross-compiled so we specify a native CC:

make CC=clang clean ccan/tools/configurator/configurator
make clean -C ccan/ccan/cdump/tools \
  && make CC=clang -C ccan/ccan/cdump/tools

Install the qemu-user package. This will allow you to properly configure the build for the target device environment. Build with:

BUILD=x86_64 MAKE_HOST=arm-linux-androideabi \
  make PIE=1 DEVELOPER=0 \
  CONFIGURATOR_CC="arm-linux-androideabi-clang -static"

Here is the link. https://github.com/ElementsProject/lightning/blob/master/doc/INSTALL.md#to-cross-compile-for-android

I'm not sure how to use qemu for WASM. Is it possible yet?

Additional note: if I try to build with Emscripten directly, using:

emconfigure ./configure

then this error pops up:

Alekseys-MacBook-Pro:lightning caffeinum$ emconfigure ./configure
-n Compiling ccan/tools/configurator/configurator...
error: unresolved symbol: popen
Aborting compilation due to previous errors | undefined
Traceback (most recent call last):
...
Exception: Expected the command ['/Users/caffeinum/emsdk/node/8.9.1_64bit/bin/node', '/Users/caffeinum/emsdk/emscripten/1.38.12/src/compiler.js', '/tmp/tmp7dqefN.txt', '/Users/caffeinum/emsdk/emscripten/1.38.12/src/library_pthread_stub.js'] to finish with return code 0, but it returned with code 1 instead! Output: // The Module object: Our interface to the outside world. We import
// and export values on it. There are various ways Module can be used:
// 1. Not defined. We create it here
// 2. A function parameter, function(Module) { ..generated code.. }
// 3. pre-run appended it, var Module = {}; ..generated 
ERROR:root:Configure step failed with non-zero return code: 1.  Command line: ./configure at /Users/caffeinum/lightning

I set up an issue: ElementsProject/lightning#1370

Also, using the cross-compiling approach from the instructions, https://github.com/ElementsProject/lightning/blob/master/doc/INSTALL.md#to-cross-compile-for-android, how do I set the target_host value?

Sorry if that's the wrong place!

@atrosinenko

Here is a WIP example. Right now the block layer does not work, since I had trouble compiling with Emterpretify (it is hard to specify what should be emterpreted and what should not).

@kripken
Member

kripken commented Feb 21, 2019

I get an out-of-memory crash in Chrome, and in Firefox I see an exception thrown, exception thrown: CompileError: at offset 962: unrecognized opcode: c0 0 - is that expected?

@atrosinenko

On Chrome it is expected and I don't know how to work around it generally, except for fixing Chrome. :)

In Firefox it should run Memtest (and then wait forever); OpenWRT is expected to run for a while and then crash (everything was tested without network support).

@kripken
Member

kripken commented Feb 21, 2019

I see, thanks. Ok, let's file a bug on chrome for that, yeah. First thing, though, is that an optimized build? I see the binaryen optimizer can shrink that file by 13%, which suggests maybe it's -O0?

@atrosinenko

I fear it is not because of the main WASM module, but because of the ~1000th one: the last time I debugged it, it successfully compiled 1000+ small WASM modules and then crashed. The same happens in Firefox, but with a much larger number of modules. It would probably be quite easy to avoid compiling everything if I manage to set up interpretation in Binaryen.

Compiling selected TBs only after some execution count would probably be much faster, as well.

@kripken
Member

kripken commented Feb 22, 2019

I see, thanks. Ok, I can open a bug with that - is it a stable URL?

@atrosinenko

@kripken I copied it to a separate directory: https://atrosinenko.github.io/qemujs-demo/chrome-bug/shell.html -- it will probably be deleted at some point, but it can be considered a stable URL while the bug is being fixed.

@kripken
Member

kripken commented Feb 25, 2019

@atrosinenko

atrosinenko commented Feb 25, 2019

Thanks! Now I'm working on using the Binaryen interpreter for the first N executions. It even runs Memtest in Chrome, but now it leaks memory (in the C sense of a leak) even in Firefox, so I'm trying to run it natively, but with Binaryen instead of TCI / native TCG.

@tbodt Maybe it can somehow be used on iOS semi-natively this way (with JITting through the standard system JS engine inside a native app), but I know almost nothing about iOS development.

@atrosinenko

The demo was updated: now it does not translate everything, just the TBs that are executed frequently. Meanwhile, how should the Apache 2.0 vs. GPLv2 question be handled?

@kripken
Member

kripken commented May 13, 2019

If the only problem with Apache is from Binaryen itself, then I'm pretty sure we can relicense it if we need to - not that old a project, not too many developers, and I doubt anyone would object. It would take some time and effort though.

However, the key issue might be whether the combined program makes binaryen a derivative work of qemu. It might be worth asking a lawyer here, since it's not clear if you are really mixing Apache and GPL2 code - any new code in qemu to use Binaryen would be GPL2 I assume, and otherwise you are using Binaryen in an unmodified way through a pre-existing API. Anyhow, if it's hard to get a clear legal answer, or if the answer is negative, we can look into relicensing as mentioned above.

@vshymanskyy

vshymanskyy commented Feb 21, 2020

@atrosinenko You may want to look at Wasm3, which is a fast Wasm interpreter.

@tlively
Member

tlively commented Mar 14, 2025

Closing this because there doesn't seem to be anything actionable for Binaryen to do here. If the license issues are still relevant, we should discuss them in a new issue.

@tlively tlively closed this as completed Mar 14, 2025

10 participants