Skip to content

Restore the dynCall() and dynCall_sig() API into the build #13296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 27, 2021

Conversation

juj
Copy link
Collaborator

@juj juj commented Jan 21, 2021

Restore the dynCall() and dynCall_sig() API into the build when -s USE_LEGACY_DYNCALLS=1 is passed. Add support for dynCall() in MINIMAL_RUNTIME builds (that was not implemented before).

This makes -s USE_LEGACY_DYNCALLS=1 a user-facing option.

juj added 2 commits January 21, 2021 16:23
…E_LEGACY_DYNCALLS=1 is passed. Add support for dynCall() in MINIMAL_RUNTIME builds (that was not implemented before).
@juj
Copy link
Collaborator Author

juj commented Jan 21, 2021

One question I have is whether we should rename the setting USE_LEGACY_DYNCALLS to WASM_DYNCALLS?

@sbc100
Copy link
Collaborator

sbc100 commented Jan 21, 2021

Looks good to me. I agree we could rename this option. If we are going to pick a new name I would go with just DYNCALLS or GENERATE_DYNCALLS?

If its not too annoying would you mind splitting out the MINIMAL_RUNTIME part? Is there a reason you chose a different interface for MINIMAL_RUNTIME? Why not just do dynCall_xx = asm['dynCall_xx]` just like normal functions and just like the normal runtime?

Going forward is the plan to transition from binaryen-generated dynCalls to table-base JS dynCalls? Presumably we could keep the exact same API while making this transition? I hope we can eventually remove the dynCall generation from binaryen

@juj
Copy link
Collaborator Author

juj commented Jan 21, 2021

If we are going to pick a new name I would go with just DYNCALLS or GENERATE_DYNCALLS?

Renamed to DYNCALLS.

If its not too annoying would you mind splitting out the MINIMAL_RUNTIME part?

I would rather not, if that is ok. This PR is already quite small that I hope it is well reviewable, I would like to keep the wrangling simple, and I already started splitting up that previous PR.

Is there a reason you chose a different interface for MINIMAL_RUNTIME? Why not just do dynCall_xx = asm['dynCall_xx]` just like normal functions and just like the normal runtime?

There does not exist asm['dynCall_xx'] after symbol name minification takes place. Classic runtime uses the Module export object to lookup the signature, i.e.

Module['dynCall_v'] = dynCall_v = asm['k'];
Module['dynCall_ii'] = dynCall_ii = asm['l'];
...
function dynCall(sig, ...) {
  Module['dynCall_'+sig].apply(null, ...);
}

whereas with MINIMAL_RUNTIME we do not have the Module object for exports, so it generates a dedicated dynCalls object with

var dynCall_v, dynCalls_ii, dynCalls = {};
...

dynCalls['v'] = dynCall_v = asm['k'];
dynCalls['ii'] = dynCall_ii = asm['l'];
...
function dynCall(sig, ...) {
  dynCalls[sig].apply(null, ...);
}

It does not make sense to use that structure for classic runtime, because it already has the Module object where the dynCalls are exported.

Going forward is the plan to transition from binaryen-generated dynCalls to table-base JS dynCalls?

Yes, that is the intent. When cached, table-based calls are faster than dynCalls (+61.7% faster for 'void' signatures, +81.2% for complex 'iffddii' signatures for static dispatch). We just need a good path to migrate forward from dynCalls to table-based calls. The issue here is that the change is global affecting everyone's code, so we need a way to let all JS code authors have a way to update at independent times before we drop dynCalls.

However I do notice this does have an effect that table-based lookups penalize Firefox performance on dynamic signatures (+27.7% slower vs dynamic dispatch via dynCall() on 'iffddii' signature), although even with that slowdown, Firefox is still more than three times faster than Chrome and Safari, so it's probably ok. Safari gets ~+18% faster, .

Presumably we could keep the exact same API while making this transition?

While we do the transition, I propose we keep the same API. But in general the dynCall() API is ill suited for table-based invocations, so after this lands, I'll make another PR about that new wbind() and wbindArray() API to replace dynCall().

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I think I understand (and agree with) the direction now. Thanks for explaining.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test here could be merged with test_dyncall_specific, I think? Adding to there would be shorter. lgtm with that.

@@ -8270,6 +8270,19 @@ def test_gl_main_module(self):
self.set_setting('MAIN_MODULE')
self.do_runf(path_from_root('tests', 'core', 'test_gl_get_proc_address.c'))

def test_dyncalls(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_dyncalls(self):
@also_with_wasm_bigint
def test_dyncalls(self):

That will verify the emscripten.py changes here.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 27, 2021

The current test failure in test-other is fixed on ToT so a merge/rebase should resolve it.

@juj juj merged commit 7ff719c into emscripten-core:master Jan 27, 2021
self.do_run_in_out_file_test('tests', 'core', 'test_dyncalls.c')

self.emcc_args += ['-s', 'MINIMAL_RUNTIME=1', '-s', 'DEFAULT_LIBRARY_FUNCS_TO_INCLUDE=["$dynCall"]']
self.do_run_in_out_file_test('tests', 'core', 'test_dyncalls.c')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not sure sure what is going on with this test?

Why why where these lines added to this test rather than creating a new one? This test has a loop over 3 different modes but these 2 new do_run_in_out_file_test don't seem to use the loop variables at all.

Should the first part be it own test alongside test_dyncalls_minimal_runtime below? The second part looks identical to test_dyncalls_minimal_runtime.. unless I'm missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why why where these lines added to this test rather than creating a new one?

@kripken asked me in #13296 (review) to merge these two tests together. Good point about misplacing the tests inside the loop, I'll follow up with a PR to fix.


// 2. Access a function pointer using the convenience/legacy 'dynCall' function (32-bit ABI)
// (this form should never be used, it is suboptimal for performance, but provided for legacy compatibility)
var ret = dynCall('iii', funcPtr, [2, 3]); // Available only in WASM_BIGINT != 2 builds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe remove the comments about WASM_BIGINT == 2 (which I don't think we planning to add in the end right?)

(Sorry for the late feedback).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, sharp eye, posted #13352 to fix.

juj added a commit to juj/emscripten that referenced this pull request Jan 28, 2021
@juj juj mentioned this pull request Feb 4, 2021
juj added a commit to juj/emscripten that referenced this pull request Feb 8, 2021
juj added a commit to juj/emscripten that referenced this pull request Feb 11, 2021
juj added a commit to juj/emscripten that referenced this pull request Feb 13, 2021
juj added a commit that referenced this pull request Feb 13, 2021
* Follow up review on #13296.

* Address review. Fix Asan on MINIMAL_RUNTIME.

* Update test expectation

* flake
@sbc100
Copy link
Collaborator

sbc100 commented Mar 22, 2024

Yes, that is the intent. When cached, table-based calls are faster than dynCalls (+61.7% faster for 'void' signatures, +81.2% for complex 'iffddii' signatures for static dispatch). We just need a good path to migrate forward from dynCalls to table-based calls. The issue here is that the change is global affecting everyone's code, so we need a way to let all JS code authors have a way to update at independent times before we drop dynCalls.

However I do notice this does have an effect that table-based lookups penalize Firefox performance on dynamic signatures (+27.7% slower vs dynamic dispatch via dynCall() on 'iffddii' signature), although even with that slowdown, Firefox is still more than three times faster than Chrome and Safari, so it's probably ok. Safari gets ~+18% faster, .

@juj, I was hoping to look into removal the underlying dynCall stuff in favor of just using the wasm table.

Can you share some info about the kind of benchmark you wrote for this? Was it just a simple microbenchmark calling and empty function in a loop, or something more complex?

@sbc100
Copy link
Collaborator

sbc100 commented Mar 22, 2024

Yes, that is the intent. When cached, table-based calls are faster than dynCalls (+61.7% faster for 'void' signatures, +81.2% for complex 'iffddii' signatures for static dispatch). We just need a good path to migrate forward from dynCalls to table-based calls. The issue here is that the change is global affecting everyone's code, so we need a way to let all JS code authors have a way to update at independent times before we drop dynCalls.
However I do notice this does have an effect that table-based lookups penalize Firefox performance on dynamic signatures (+27.7% slower vs dynamic dispatch via dynCall() on 'iffddii' signature), although even with that slowdown, Firefox is still more than three times faster than Chrome and Safari, so it's probably ok. Safari gets ~+18% faster, .

@juj, I was hoping to look into removal the underlying dynCall stuff in favor of just using the wasm table.

Can you share some info about the kind of benchmark you wrote for this? Was it just a simple microbenchmark calling and empty function in a loop, or something more complex?

Also, you might have mentioned this somewhere already, but do you have users/codebases that depend heavily on the performance of dynCalls (i.e. making many dyncalls per frame?).

@juj
Copy link
Collaborator Author

juj commented Mar 26, 2024

@juj, I was hoping to look into removal the underlying dynCall stuff in favor of just using the wasm table.

Unfortunately at Unity we are not ready for this. If we'd remove dynCall support in Emscripten, we would then need to have some kind of emulation layer adopted in Unity codebase. (though this would be a bit tricky to do, since the knowledge of possible signatures come at build time)

The challenge we have at Unity is that the Module.dynCall() and dynCall_xx() APIs are in the public layer, and Unity asset store web plugin authors have been utilizing these in their shipped commercial plugins. So we cannot break their code lightly.

We have shipped the new table based mechanism API, and have prepared documentation for this change. These are in beta channels currently.

In this area btw, I find that we don't quite have documentation for makeDynCall or getWasmTableEntry in Emscripten, so the "good" APIs are currently not as discoverable as they could. I think this is something we should look to improve.

Can you share some info about the kind of benchmark you wrote for this?

Looks like the benchmark was here: #12733 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants