Handle entering the wasm via a table call in Asyncify #12088

Closed · kripken wants to merge 8 commits into master from async2

Conversation

kripken (Member) commented Aug 31, 2020

Depends on #12081 (refactoring) and #12059. This removes the need
for the legacy dynCall approach in Asyncify by adding logic to record
and rewind calls that enter the wasm via the table.

Previously, with dynCalls, things were simpler, as we only ever entered
the wasm via an export (to call a function pointer, we called an export
that did the call for us). To rewind table calls, we now add them to the
stack of called things.

This is not very optimized - it just serializes the function pointer and
params as a string. We should look into optimizing it, but that would
require refactoring how we store ids for rewinding etc., so I think
it's better to get it working first and look into optimizations later.
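
Concretely, the bookkeeping amounts to something like the following (a simplified, hypothetical JavaScript sketch of the approach described above; the names and the exact encoding are illustrative, not the actual patch):

// On entry via the table, encode the function pointer and its arguments
// as a string id, e.g. "5|1,2". The id itself is the rewind record.
function recordTableCall(funcPtr, args) {
  return funcPtr + '|' + Array.prototype.join.call(args, ',');
}

// On rewind, decode the id and repeat the exact same table call so that
// Asyncify can rewind through it again. (Assumes numeric arguments, as
// wasm values arrive in JS; i64/BigInt would need extra handling.)
function rewindTableCall(id) {
  var parts = id.split('|');
  var funcPtr = +parts[0];
  var args = parts[1] ? parts[1].split(',').map(Number) : [];
  return wasmTable.get(funcPtr).apply(null, args);
}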

Fixes #12066

cc @sbc100 @Akaricchi

Base automatically changed from async1 to master August 31, 2020 17:16

Akaricchi (Contributor) commented Sep 1, 2020

Oh this is just nasty. Aside from the very idea of encoding arguments in a string being questionable at best, this further kills the already dead performance of fiber context switches...

10000000 switches in 18.223394s ~548745.198463 switches/s       ~1822 ns/switch
10000000 switches in 18.408631s ~543223.457685 switches/s       ~1840 ns/switch
10000000 switches in 18.239809s ~548251.362679 switches/s       ~1823 ns/switch
10000000 switches in 18.104195s ~552358.180668 switches/s       ~1810 ns/switch
10000000 switches in 19.524142s ~512186.406009 switches/s       ~1952 ns/switch
10000000 switches in 18.592954s ~537838.138595 switches/s       ~1859 ns/switch
10000000 switches in 17.843040s ~560442.609412 switches/s       ~1784 ns/switch
10000000 switches in 18.867718s ~530005.790910 switches/s       ~1886 ns/switch
10000000 switches in 17.893386s ~558865.709390 switches/s       ~1789 ns/switch
10000000 switches in 18.317863s ~545915.217977 switches/s       ~1831 ns/switch

These numbers are roughly 3 times worse than they are on master.

Somehow, a fully optimized Taisei build still barely manages to stay within 60 FPS most of the time on my Ryzen 5 3600, but I'm already seeing some framerate spikes with ~500+ switches per frame (which really isn't that big a number). The difference is more noticeable when fast-forwarding through a replay (essentially advancing 10 game logic frames per 1 drawn frame, which increases switches/frame tenfold).

I would prefer to just keep the legacy dyncalls for Asyncify until we find a better way to resolve this that doesn't murder performance.

Akaricchi (Contributor) commented Sep 1, 2020

I guess I'm a bit confused as to why we need the two separate cases for calling exports vs. dyncalls. Can't we just interpret the rewind_id as a function pointer (which is an index into the wasmTable, I assume) and store the arguments in the Asyncify data struct somewhere? This would fully encode the continuation in linear memory with no JS-side state, possibly allowing a suspended fiber to be resumed on a different thread.
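
In other words, something like this hypothetical sketch (the struct layout, offsets, and helper names are invented for illustration; the real Asyncify data struct is laid out differently):

// Hypothetical: persist the whole continuation in linear memory.
// Imagined layout (8-byte aligned): [funcPtr: i32][argc: i32][args: f64 x argc]
function saveContinuation(dataPtr, funcPtr, args) {
  HEAP32[dataPtr >> 2] = funcPtr;               // rewind_id == table index
  HEAP32[(dataPtr + 4) >> 2] = args.length;
  for (var i = 0; i < args.length; i++) {
    HEAPF64[((dataPtr + 8) >> 3) + i] = args[i];
  }
}

function resumeContinuation(dataPtr) {
  var funcPtr = HEAP32[dataPtr >> 2];
  var argc = HEAP32[(dataPtr + 4) >> 2];
  var args = [];
  for (var i = 0; i < argc; i++) {
    args.push(HEAPF64[((dataPtr + 8) >> 3) + i]);
  }
  // No JS-side state: any thread sharing this memory could resume here.
  return wasmTable.get(funcPtr).apply(null, args);
}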

kripken (Member, Author) commented Sep 1, 2020

Interesting @Akaricchi , that's definitely much worse than I expected... Ok, let's work to optimize this before landing.

I pushed a commit now that avoids the string operations and stores the data using a JS object. Hopefully that's much faster, but I don't have a benchmark myself locally - is there something simple I can try?
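
The change amounts to something like this (an illustrative sketch, not the exact commit; the names are invented):

// Illustrative: instead of building and parsing strings, keep the pointer
// and arguments in a plain JS object keyed by a numeric rewind id.
var tableCalls = {};
var nextTableCallId = 1;

function recordTableCall(funcPtr, args) {
  var id = nextTableCallId++;
  tableCalls[id] = { funcPtr: funcPtr, args: args };
  return id;
}

function rewindTableCall(id) {
  var call = tableCalls[id];
  delete tableCalls[id];
  return wasmTable.get(call.funcPtr).apply(null, call.args);
}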

> I guess I'm a bit confused as to why we need the two separate cases for calling exports vs. dyncalls. Can't we just interpret the rewind_id as a function pointer (which is an index into the wasmTable, I assume) and store the arguments in the Asyncify data struct somewhere? This would fully encode the continuation in linear memory with no JS-side state, possibly allowing a suspended fiber to be resumed on a different thread.

We could do that as well, but it would be more work, I think. Maybe I'm not seeing a nice way to do it, though - if you want to write up a PR that would be great, I'm not tied to this PR in any way!

Akaricchi (Contributor) left a comment

Still slower than master by ~33%, but it's a huge improvement over the previous patch.

10000000 switches in 8.648807s  ~1156228.777901 switches/s      ~864 ns/switch
10000000 switches in 8.656814s  ~1155159.387555 switches/s      ~865 ns/switch
10000000 switches in 9.078460s  ~1101508.350890 switches/s      ~907 ns/switch
10000000 switches in 8.648500s  ~1156269.864030 switches/s      ~864 ns/switch
10000000 switches in 8.772179s  ~1139967.676429 switches/s      ~877 ns/switch
10000000 switches in 8.645461s  ~1156676.288792 switches/s      ~864 ns/switch
10000000 switches in 8.712480s  ~1147778.754405 switches/s      ~871 ns/switch
10000000 switches in 8.974158s  ~1114310.631599 switches/s      ~897 ns/switch
10000000 switches in 8.803610s  ~1135897.609328 switches/s      ~880 ns/switch
10000000 switches in 8.686977s  ~1151148.466371 switches/s      ~868 ns/switch

Perhaps more performance can be squeezed out of this by maintaining a list of free ident objects instead of allocating a new one every time. Ideally we should not be using arguments either because that's another implicit allocation, but that would require knowing the call signature and specializing the makeDynCall result by arity.
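
For example (a hypothetical sketch of both ideas; all names are invented):

// Recycle bookkeeping objects through a free list, and emit one wrapper
// per arity so the `arguments` object is never materialized and call
// sites stay monomorphic for the JIT.
var identPool = [];

function acquireIdent(funcPtr, a0, a1) {
  var ident = identPool.pop() || { funcPtr: 0, a0: 0, a1: 0 };
  ident.funcPtr = funcPtr;
  ident.a0 = a0;
  ident.a1 = a1;
  return ident;
}

function releaseIdent(ident) {
  identPool.push(ident); // recycled instead of becoming garbage
}

// What a makeDynCall specialized for two-argument signatures might emit.
// (Ignores unwind/rewind subtleties: real code must keep the ident alive
// until the rewound call finally completes.)
function dynCallArity2(funcPtr, a0, a1) {
  var ident = acquireIdent(funcPtr, a0, a1);
  var result = wasmTable.get(funcPtr)(a0, a1);
  releaseIdent(ident);
  return result;
}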

Akaricchi (Contributor) commented

> Hopefully that's much faster, but I don't have a benchmark myself locally - is there something simple I can try?

You can use the microbenchmark from koishi, which is what I'm using for my measurements as well. It's dumb and not super precise, but serviceable.

Unfortunately you may need to do some one-time setup if you've never built a Meson project with Emscripten before, but it's pretty straightforward:

  • Create a file with the following contents, changing /path/to/emscripten as needed, and save it as ~/.local/share/meson/cross/emscripten (or somewhere else if you're not on Linux; see the Meson documentation on cross files for more information):
[binaries]
c=['/path/to/emscripten/emcc']
cpp=['/path/to/emscripten/em++']
ar=['/path/to/emscripten/emar']
ranlib=['/path/to/emscripten/emranlib']
pkgconfig=['/path/to/emscripten/emconfigure', 'pkg-config']
file_packager=['/usr/bin/python3', '/path/to/emscripten/tools/file_packager.py']

[properties]
root='/data/webshit/emroot/system'
needs_exe_wrapper=true

[host_machine]
system='emscripten'
cpu_family='wasm32'
cpu='wasm32'
endian='little'
  • Clone koishi and set up a build directory:
git clone https://github.com/taisei-project/koishi
cd koishi
# Replace 'emscripten' with a full path to the cross file if in non-standard location
meson setup --cross-file emscripten --buildtype release build
  • Build and run the benchmark:
cd build
ninja
node koishi_bench
  • To rebuild after an emscripten change:
ninja clean && ninja

Akaricchi (Contributor) commented

> We could do that as well, but it would be more work, I think. Maybe I'm not seeing a nice way to do it, though - if you want to write up a PR that would be great, I'm not tied to this PR in any way!

I'll see if I can find the time and energy for it, but no promises. I'm quite a bit behind schedule on my personal projects.

kripken (Member, Author) commented Sep 1, 2020

Sounds good, thanks @Akaricchi

I'll try to optimize this some more, but I suspect a different approach would be needed to really remove the new overhead. Specializing by arity as you suggested might help - not just for allocations, but also for keeping the JIT monomorphic. That would have some cost in code size, though.

Long-term I agree it would be good to write the arguments into memory etc., which would also be thread-safe. I don't think that's urgent, though, and we should make sure it doesn't add too much complexity.

curiousdannii (Contributor) commented

If you're looking at nanosecond differences in performance, switching to a Map instead of an object might help. Maps can be up to twice as fast for reads. But it's only a small part of the whole cost.
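
For instance (illustrative; the keys and values are made up):

// Numeric keys on a plain object are coerced to strings, while a Map
// keeps them as numbers and skips the object property lookup machinery,
// which can make hot reads noticeably faster in some engines.
var callsObj = {};
callsObj[42] = { funcPtr: 5 };     // the key becomes the string "42"

var callsMap = new Map();
callsMap.set(42, { funcPtr: 5 });  // the key stays the number 42
var call = callsMap.get(42);
callsMap.delete(42);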

Base automatically changed from master to main March 8, 2021 23:49

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

stale bot added the wontfix label Apr 16, 2022
stale bot closed this Jul 31, 2022
sbc100 deleted the async2 branch March 7, 2023 19:44

Successfully merging this pull request may close these issues.

test_fibers_asyncify fails without the binaryen generated dynCall functions.
3 participants