Asyncify for goroutines in WebAssembly? #1101
This is unfortunately used on platforms other than WASM. I am currently working on a rebuild of the system which should work better than the current version (and probably better than Asyncify).
I see, thanks @jaddr2line! Sounds irrelevant then, closing...
It probably makes more sense to switch to Asyncify now. Asyncify is more mature, and most other platforms have switched to stack-based goroutines. Additionally, the previously mentioned rebuild of the coroutines system didn't happen.
I have investigated this, and in fact looked into the feasibility of reimplementing an Asyncify-like pass in Go (to avoid a large JavaScript dependency), but there is one very big downside: it destroys debug information. This is important; see #1994 for why. Unless Asyncify learns to retain debug information, I'm afraid it is a blocker.
Binaryen's Asyncify pass does support debug information, both the Names section as well as DWARF. We don't update all parts of DWARF yet (like variable locations in wasm locals), but we do update the lines section and so things like stack traces, as discussed in #1994, should remain 100% correct. As I mentioned when I opened this issue, I'd be very happy to help out here with Asyncify if there's interest! But I do get that adding Binaryen as a dependency has downsides. @aykevl is that what you mean by "a large JavaScript dependency"? Binaryen can also be compiled to other things, including wasm, native code, and to C. (I suspect we could also get it to compile to Go... with some effort on generalizing wasm2c.)
Oh wow, that is very good news! In that case, it would become a real possibility. Yes, a Go dependency would definitely be preferred. C would also work with CGo. I would strongly prefer to avoid a runtime dependency, and preferably a compile-time dependency as well. However, if it's possible to drop a few C files in a package with some Go wrapper code, that would be perfect. I haven't looked much into what would be needed. In short, for other ISAs there is a function to switch between goroutines (tinygo_swapTask) and a function that sets up the state for a first goroutine (tinygo_startTask). I'm a bit tired right now. I'll try to better describe this soon.
Interesting! Hopefully this is useful then. Eager to hear more details of what you have in mind, @aykevl. About integration, we can emit a single C file of all of Binaryen, either as an executable or a library. It doesn't run as fast as a normal build (no multithreading), but it's easy to distribute. For the initial testing, though, it might be easier to use a native build. There would not be any runtime dependency, just a compile-time one. (However, your existing runtime would need some code to integrate with Asyncify, that is, to call unwind/rewind at the proper times, etc. I assume that would be small.)
I am a bit confused; I don't think we need anything other than
Yes, at compile time just At runtime, just calling the intrinsics
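For reference, the runtime side of Asyncify works through functions that Binaryen's pass exports from the instrumented module: `asyncify_start_unwind`, `asyncify_stop_unwind`, `asyncify_start_rewind` and `asyncify_stop_rewind`. The two start functions take a pointer to a small data structure in linear memory. According to Binaryen's Asyncify documentation for wasm32, that structure holds two 32-bit fields: the current position in the Asyncify stack, followed by the end of that stack.

The Go sketch below only models setting up that structure in a simulated linear memory. It assumes little-endian 32-bit pointers, as on wasm32, and places the stack directly after the structure. The function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// asyncifyDataSize is the size of the wasm32 Asyncify data structure:
// a 32-bit current stack position followed by a 32-bit stack end.
const asyncifyDataSize = 8

// initAsyncifyData (hypothetical) writes the data structure at dataAddr in
// the simulated linear memory mem, placing the Asyncify stack right after it.
func initAsyncifyData(mem []byte, dataAddr, stackSize uint32) error {
	stackStart := dataAddr + asyncifyDataSize
	stackEnd := stackStart + stackSize
	if stackStart < dataAddr || stackEnd < stackStart || uint64(stackEnd) > uint64(len(mem)) {
		return fmt.Errorf("asyncify stack at %d (+%d bytes) does not fit in %d bytes of memory",
			dataAddr, stackSize, len(mem))
	}
	// wasm linear memory is little-endian.
	binary.LittleEndian.PutUint32(mem[dataAddr:], stackStart) // current position
	binary.LittleEndian.PutUint32(mem[dataAddr+4:], stackEnd) // end of stack
	return nil
}

// readAsyncifyData returns the (current position, end) pair stored at dataAddr.
func readAsyncifyData(mem []byte, dataAddr uint32) (pos, end uint32) {
	pos = binary.LittleEndian.Uint32(mem[dataAddr:])
	end = binary.LittleEndian.Uint32(mem[dataAddr+4:])
	return pos, end
}

func main() {
	mem := make([]byte, 64*1024) // one wasm page
	if err := initAsyncifyData(mem, 16, 1024); err != nil {
		panic(err)
	}
	pos, end := readAsyncifyData(mem, 16)
	fmt.Println(pos, end) // 24 1048
}
```

A real runtime would pass `dataAddr` to `asyncify_start_unwind` or `asyncify_start_rewind`. During an unwind, Asyncify advances the first field as it saves locals.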
This was a bit different from a stack-switching system, so I did not try to merge it with the
I was originally planning to place the asyncify stack on top of the C stack, since no C stack pushes/pops should be happening while the asyncify stack is in use. However, since these grow in opposite directions (C stack grows down, asyncify stack grows up), I instead placed these inside the same buffer such that they grow inwards. I am not entirely sure what to do with the GC right now. Most likely, this will need to be done by modifying the stack slots system to save the linked list when we switch goroutines. I was hoping we could use this to get conservative stack scanning to work (stack slots have some bugs), but that seems like it would require an extra stack to run the GC. The WASM assembly to handle the unwinding and rewinding seems a bit complicated. Right now I copied the direct-call-to-start technique used by the
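The inward-growing layout described above can be modeled in a few lines of Go. This is a simplified sketch, not TinyGo's code:

- addresses are plain `uint32` offsets;
- the type and method names are hypothetical;
- the only thing modeled is the shared free-space check between the two regions.

```go
package main

import (
	"errors"
	"fmt"
)

var errStackCollision = errors.New("C stack and asyncify stack collided")

// taskStack (hypothetical) models one goroutine's buffer [base, top): the C
// stack grows down from top while the asyncify stack grows up from base.
type taskStack struct {
	base, top uint32
	sp        uint32 // C stack pointer, moves toward base
	asPos     uint32 // asyncify stack position, moves toward top
}

func newTaskStack(base, size uint32) *taskStack {
	return &taskStack{base: base, top: base + size, sp: base + size, asPos: base}
}

// free is the gap left between the two regions (invariant: asPos <= sp).
func (s *taskStack) free() uint32 { return s.sp - s.asPos }

// pushC reserves n bytes on the C stack.
func (s *taskStack) pushC(n uint32) error {
	if s.free() < n {
		return errStackCollision
	}
	s.sp -= n
	return nil
}

// pushAsyncify reserves n bytes on the asyncify stack, as an unwind would.
func (s *taskStack) pushAsyncify(n uint32) error {
	if s.free() < n {
		return errStackCollision
	}
	s.asPos += n
	return nil
}

func main() {
	s := newTaskStack(0x1000, 256)
	_ = s.pushC(100)
	_ = s.pushAsyncify(100)
	fmt.Println(s.free())    // 56
	fmt.Println(s.pushC(64)) // C stack and asyncify stack collided
}
```

The appeal of this layout is that neither region needs its own fixed size limit. A single check on the gap between them covers both directions.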
@niaow, thank you for looking into this! Regarding how to include this in TinyGo, I was referring to the TinyGo build (not using TinyGo to build wasm binaries). If possible, I would very much prefer that users not need to install extra tools to use TinyGo. Requiring extra tools causes way too many headaches, see

@kripken, from a quick look I see that Binaryen is a pretty big C++ program. Is there a way to link to it as a library, via C bindings? That would make it much easier to link it statically into the TinyGo binary for releases.
Oops, I assumed Binaryen was written in JavaScript. Clearly that's not the case, which makes distribution a lot easier. @niaow, I see that you are looking into Asyncify for TinyGo, which is great! However, I wonder if it wouldn't be easier to first sort out how to add Binaryen as a TinyGo dependency. For example, I think the optimizer might be useful in and of itself to reduce code size and possibly improve speed (although I haven't tested this); right now we're just using LLVM. It would certainly be a lot simpler to review if these two things are split.
Yes, Binaryen has a C API and can be built as a library (that's the default actually in CMake).
Yeah, these two things can be split; I was mostly just trying to see how to use it inside of TinyGo's runtime. And yeah, the C API doesn't seem too bad.
Awesome!
I understand. Oh, and one final note:
I was actually thinking the same thing: Asyncify and GC stack scanning both need to have some sort of stack access, so they might work well together. My initial thought of how this could work is as follows:
So essentially the first case is a subset of the second case. The difference from the current Asyncify is that this would save everything to the C stack instead of to a custom buffer, which would make it much more similar to other instruction sets and thus easier to integrate in TinyGo (and perhaps other tools). It's quite possible that I've overlooked some major issues, but if I were to design Asyncify I would have first tried a design like this.
Asyncify doesn't use the C stack directly mainly because the C stack isn't part of wasm; it's just a toolchain convention, and some languages or implementations might not have such a thing. So Asyncify just gets a pointer to a buffer to use. But you could allocate that buffer on your C stack if that makes sense. One downside to doing so is that you'd then be copying the Asyncify buffer when you copy the C stack (which I assume you need to do with multiple executions in flight). Overall, what you said sounds right to me. In both GC and stack switching, the wasm locals need to reach linear memory so they can be scanned. One option is to have two mechanisms, one for GC and one for stack switching. They would write to different places (wherever the GC one writes to, and Asyncify will write to the buffer you point it to), and there would be some extra overhead due to spilling some locals twice. But that might be fine, and seems simple. Another option might be to only GC when nothing is running. That is, right before a GC, pause the current execution. Then all the current executions will have been serialized into Asyncify buffers, and you can scan those (conservatively), then GC, then resume execution. (This btw is what
Are you saying that Asyncify ignores the C stack pointer entirely? That seems unsafe to me: any program that uses Asyncify and also uses the C stack pointer (which would probably include Rust and definitely includes TinyGo) would corrupt the C stack when unwinding/rewinding. Maybe not when using it to implement suspending, but certainly when running multiple async tasks at the same time (unwinding one, rewinding the other). Take a look at this innocent-looking piece of code for example: https://godbolt.org/z/qGaczasj4
Wow, that might actually work for the TinyGo GC!
So there are 2 things called asyncify: a transformation pass and a library. The asyncify pass does not care about the C
Additionally, for GC: the main thing I am concerned about is GC when outside of a goroutine. During the entry point or when handling an event, we spawn goroutines (which allocate memory and can cause a GC). In order for this effective system stack to be scanned properly, we would need to wrap it in a function which can be rewound to, and provide a buffer to rewind into. At the moment it looks like everything should continue functioning if the scheduler and
It is unsafe to ignore the C stack, definitely, for C++, C, Rust, etc. But Asyncify leaves that to the runtime: Asyncify just handles pausing and resuming the wasm stack. A runtime can use Asyncify and, in a modular way, also handle the C stack if it has one, and other things, on top. Basically like @niaow's link. Nice! (In theory we could add an option for Asyncify to also handle the userspace stack, but then we'd need to be told how that stack works: which global it uses, does it go up or down, what are the size limits on it, is it contiguous or not, etc. Not everything uses the current LLVM conventions for the userspace stack.)
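One way a runtime can layer C stack handling on top of Asyncify is to keep the C stack pointer as part of each task's state. The scheduler then swaps that pointer whenever it switches tasks.

The Go sketch below simulates only that bookkeeping. `stackPointer` models the wasm global that LLVM's wasm32 convention calls `__stack_pointer`. `task` and `switchTo` are hypothetical names, not TinyGo's actual runtime API.

```go
package main

import "fmt"

// stackPointer stands in for the wasm global LLVM uses as the C stack
// pointer (__stack_pointer in LLVM's wasm32 convention).
var stackPointer uint32

// task is per-goroutine state the runtime manages itself, since Asyncify
// only saves and restores the wasm call stack.
type task struct {
	savedSP uint32 // C stack pointer at the time the task last unwound
}

// switchTo is a hypothetical runtime hook, called after cur has unwound and
// before next is rewound: it parks cur's C stack pointer and installs next's.
func switchTo(cur, next *task) {
	cur.savedSP = stackPointer
	stackPointer = next.savedSP
	// A real runtime would now start rewinding next (asyncify_start_rewind)
	// and re-enter its entry function; that part is omitted here.
}

func main() {
	a := &task{}
	b := &task{savedSP: 0x4000} // b's C stack region starts at 0x4000
	stackPointer = 0x8000       // a is running with its stack at 0x8000

	stackPointer -= 32 // a allocates a 32-byte C stack frame
	switchTo(a, b)
	stackPointer -= 16 // b allocates a 16-byte frame in its own region
	switchTo(b, a)

	fmt.Printf("%#x %#x\n", stackPointer, b.savedSP) // 0x7fe0 0x3ff0
}
```

This only works if each task has its own C stack region. Swapping the pointer alone does not help if two tasks' frames share the same memory.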
Ah, I see, that makes sense. I didn't get this from the Asyncify documentation. I might have missed this, but if I haven't, it might be worth adding to the docs (especially as it talks a lot about C++ which uses the C stack).
I think the request was fulfilled in the new release, so closing. Please reopen if needed. Thanks!
Context: @aykevl wrote:
The discussion there on new wasm capabilities is the right long-term solution, of course, but I wonder if maybe until then, Asyncify might be useful here?
With Asyncify you run a utility on a wasm file, and get a modified wasm file that can pause and resume execution at various points. It's easy to use - you don't need to think about how it works or handle corner cases, just tell it which calls can pause, and set up a little runtime code to call things. This does add overhead, about 50% in size and speed on average in the worst cases (where it ends up instrumenting almost everything), but in the best cases it's very efficient - for example, it can avoid instrumenting inner loops where possible, letting such code run at full speed.
The idea here would be to leave many of the low-level goroutine pause/resume details to Asyncify. All TinyGo would do is have some special function call (which you don't need to implement; it's a marker for Asyncify) that indicates a pause/resume point. Asyncify is run on the wasm. Then when the wasm runs, a goroutine that reaches such a point can return to the runtime code, which can then decide which other goroutine to proceed to run.
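Below is a minimal Go simulation of the pause/resume protocol described above, purely to illustrate the idea; it is not TinyGo or Binaryen code. The names are hypothetical:

- `state` stands in for the state global Asyncify adds;
- `pause` stands in for the marker call;
- `saved` stands in for a goroutine's Asyncify buffer.

`sumTo` is written the way an instrumented function conceptually behaves. When unwinding, it saves its locals and returns. When rewinding, it restores them and re-enters the pause call.

```go
package main

import "fmt"

type asyncState int

const (
	normal    asyncState = iota
	unwinding            // a pause was requested; frames are being saved
	rewinding            // a paused goroutine is being resumed
)

// state stands in for the global that the Asyncify pass adds to the module.
var state = normal

// frame holds the locals sumTo needs to resume.
type frame struct{ i, sum int }

// goroutine is one task; saved stands in for its Asyncify buffer.
type goroutine struct {
	n       int
	saved   []frame
	started bool
	done    bool
	result  int
}

// pause stands in for the marker call. Outside a rewind it starts an
// unwind; when re-entered during a rewind it ends the rewind, so execution
// continues after the call.
func pause() {
	if state == rewinding {
		state = normal // like asyncify_stop_rewind
		return
	}
	state = unwinding // like asyncify_start_unwind
}

// sumTo adds 1..g.n, pausing once per iteration.
func sumTo(g *goroutine) {
	i, sum := 1, 0
	if state == rewinding {
		// Restore the locals saved when this goroutine unwound.
		f := g.saved[len(g.saved)-1]
		g.saved = g.saved[:len(g.saved)-1]
		i, sum = f.i, f.sum
	}
	for i <= g.n {
		pause()
		if state == unwinding {
			// Save locals and return up the stack to the scheduler.
			g.saved = append(g.saved, frame{i, sum})
			return
		}
		sum += i
		i++
	}
	g.done, g.result = true, sum
}

// run is a round-robin scheduler: it starts or rewinds each unfinished
// goroutine in turn and moves on whenever one unwinds.
func run(gs []*goroutine) {
	for {
		progressed := false
		for _, g := range gs {
			if g.done {
				continue
			}
			progressed = true
			if g.started {
				state = rewinding // like asyncify_start_rewind
			}
			g.started = true
			sumTo(g)
			if state == unwinding {
				state = normal // like asyncify_stop_unwind
			}
		}
		if !progressed {
			return
		}
	}
}

func main() {
	a, b := &goroutine{n: 3}, &goroutine{n: 5}
	run([]*goroutine{a, b})
	fmt.Println(a.result, b.result) // 6 15
}
```

Real Asyncify jumps straight back to the call site during a rewind. Here the loop condition is re-checked first, which is equivalent because the restored locals are unchanged.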
If there's interest I'd be happy to collaborate on investigating this! (I wrote Asyncify, and have helped integrate it in various projects, but I don't know that much about TinyGo.)