
Cache wasmTable.get() #14534

Closed
wants to merge 8 commits

Conversation

@kripken kripken (Member) commented Jun 24, 2021

This is an alternative to #13844, and is aimed at fixing the performance issues in
#12733 and #13796. I spent some time on this now as @juj may not be available
to work on that PR for a while.

This wraps our wasmTable instance in a simple cache of values. It
lets us avoid calling wasmTable.get on the same index more than once. This
depends on the table only being modified from JS, which is true for now at least
in non-relocatable mode.
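
To illustrate the pattern being optimized (a simplified sketch, not code from this PR; callFunctionPointer is a made-up helper name): every indirect call made from JS otherwise pays for a fresh table lookup, and the cache lets repeated calls with the same index reuse the previously fetched function object.

// Sketch only: the kind of JS hot path the cache speeds up.
function callFunctionPointer(ptr, arg) {
  // Without the cache, wasmTable.get(ptr) is re-evaluated on every call,
  // which is relatively slow in current engines.
  return wasmTable.get(ptr)(arg);
}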

This enables the caching in optimized builds not focused on size. But it does
make them larger, so perhaps this should not be on by default - thoughts?

Testing locally, this makes the benchmark from #12733 4x faster as expected.
@jpharvey perhaps you can verify if this helps your case as well?

@kripken kripken requested a review from sbc100 June 24, 2021 23:05
@@ -2002,6 +2002,11 @@ def check_memory_setting(setting):
settings.MINIFY_WASM_IMPORTS_AND_EXPORTS = 1
settings.MINIFY_WASM_IMPORTED_MODULES = 1

# Cache the wasm table for speed in builds not optimizing for size. We also
# cannot cache when using dynamic linking.
settings.CACHE_WASM_TABLE = not settings.RELOCATABLE and \

Collaborator

I think we can probably make it work for RELOCATABLE too because the table mods are all centrally performed. Happy to see this land as is though.

kleisauke referenced this pull request in hoodmane/libffi-emscripten Jun 28, 2021
@jpharvey jpharvey commented Jul 1, 2021

@kripken this seems to fix our performance problems.
Thank you

John

@sbc100 sbc100 (Collaborator) commented Jul 1, 2021

Let's get this landed!

@kripken kripken (Member, Author) commented Jul 1, 2021

Thanks @jpharvey, good to know this works as expected!

I am unsure we should land it as is, as it does increase code size. Another option might be to add a flag for it. Even better, though, would be to enable it only in the use cases that need it - but which are those? That is, where do we actually have heavy usage of indirect calls from JS?

Offhand, I can only think of C++ exceptions and setjmp (asyncify does it too, but at low frequency). Are there others?
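
For reference, the exceptions/setjmp path goes through the generated invoke_* thunks, which look roughly like the sketch below (simplified and only illustrative; the exact emitted code differs between Emscripten versions), and each such invoke performs a wasmTable.get:

// Simplified sketch of a generated invoke_* thunk, not the exact emitted code.
function invoke_vii(index, a1, a2) {
  var sp = stackSave();
  try {
    // Indirect call from JS: a table lookup happens on every invoke.
    wasmTable.get(index)(a1, a2);
  } catch (e) {
    stackRestore(sp);
    // Re-throw anything that is not a C++ exception / longjmp marker.
    if (e !== e + 0 && e !== 'longjmp') throw e;
    _setThrew(1, 0);
  }
}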

@jpharvey jpharvey commented:
@kripken This seemed to get really close but then stalled again. Is there anything I can do to help get this landed? It would be really good for us to move to a newer version of Emscripten. I think our problem is caused by C++ exceptions, so enabling it for that would work for us, or I could live with a setting.

@kripken kripken (Member, Author) commented Aug 26, 2021

Sorry for the delay, @jpharvey.

@juj, do you have thoughts on this? This adds a little size in builds not optimizing for size, but it does speed up table calls in return.

@kripken kripken (Member, Author) commented Sep 10, 2021

Gentle ping @juj.

If there are no strong opinions here, I can add a new option for this, which @jpharvey said would be good enough, and which I think is the least risky approach.

@kripken kripken (Member, Author) commented Sep 17, 2021

OK, let's move forward and land this then. How about making it on by default in -O3+, but not -O2+? That seems a little safer to me, and a compromise I can live with. It also saves adding a new user flag.

var originalWasmTable = wasmTable;
wasmTable = {
  _cache: [],
  'length': wasmTable.length,

@juj juj (Collaborator) Sep 18, 2021

wasmTable.length looks wrong here? Should this be this._cache.length instead?

(Or alternatively, if I were golfing this for code size, I'd write wasmTable._cache.length to let Closure win three bytes by replacing this with a likely single-letter variable, but maybe that feels a bit conceptually dirty for the object to access itself via an outside-scope name, and it could be confusing given the monkeypatch replacement above - I'd be fine either way.)

Collaborator

Actually, it looks like this should be 'length': originalWasmTable.length?

Collaborator

Or maybe best would be get 'length'() { return originalWasmTable.length }, then the implementation of grow could simply be:

'grow': function(by) {
  return originalWasmTable.grow(by);
}

  _cache: [],
  'length': wasmTable.length,
  'get': function(i) {
    if (i >= wasmTable._cache.length) {

Collaborator

Doing var c = this._cache; or var c = wasmTable._cache at the top (if not caring about this correctness), you can save code size on the multiple references to wasmTable._cache below. (Closure won't do this kind of outlining.)

@juj juj (Collaborator) Sep 18, 2021

Oh, there is also a bug:

if (i >= wasmTable._cache.length) {
  wasmTable._cache.length = i;
}

produces an array one index too small, so it should assign i+1 to the length? (Maybe also flip the check around so it reads clearly at a glance.)

    if (existing) {
      return existing;
    }
    return wasmTable._cache[i] = originalWasmTable.get(i);

@juj juj (Collaborator) Sep 18, 2021

Maybe

'get': function(i) {
  var c = wasmTable._cache, existing = c[i];
  if (!existing) {
    if (c.length <= i) c.length = i + 1;
    existing = c[i] = originalWasmTable.get(i);
  }
  return existing;
}

to optimize down to a single branch on the fast path, and reduce code size?

    }
    wasmTable._cache[i] = func;
    return originalWasmTable.set(i, func);
  },

Collaborator

The set function is not supposed to return anything, so the return statement is redundant?

This could be condensed to

'set': function(i, func) {
  if (wasmTable._cache.length <= i) wasmTable._cache.length = i + 1;
  originalWasmTable.set(i, wasmTable._cache[i] = func);
}

@juj juj (Collaborator) commented Sep 18, 2021

Sorry for the delay in reviewing.

I do think it would be a better solution not to polyfill over the WebAssembly table object, because 1) such polyfills are unnecessarily verbose and won't minify well under Closure, and 2) I do not think that the wasmTable object should be part of the public API that end users (JS library, external JS, EM_ASM and EM_JS code block authors) should be accessing to invoke function pointers.

The idea of the wbind() function was to abstract away direct accesses to wasmTable.get(), so the caching could naturally be placed there. That kind of abstraction would allow future refactors of the runtime internals without risking public API breakages (like #12733), and would avoid tying our hands to implementing the WebAssembly Table API for end users (does anyone actually call wasmTable.length, such that our polyfill needs to provide it?)
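
A minimal sketch of what such a cached wbind() accessor could look like (illustrative only, not the actual wbind PR; wbindCache is a hypothetical name):

// Illustrative sketch: caching placed inside a wbind()-style accessor.
var wbindCache = [];  // hypothetical cache, indexed by function pointer

function wbind(funcPtr) {
  // Only the first call for a given pointer touches wasmTable.get();
  // later calls reuse the cached JS function object.
  var func = wbindCache[funcPtr];
  if (!func) {
    func = wbindCache[funcPtr] = wasmTable.get(funcPtr);
  }
  return func;
}

// Usage: wbind(ptr)(arg1, arg2) instead of wasmTable.get(ptr)(arg1, arg2).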

Also, if the Wasm Table API changes or expands in the future, our polyfill would not go out of date, nor would it be handcuffed into growing to keep up.

Note that {{{ makeDynCaller(...) }}} is not available in external JS, EM_ASM and EM_JS code blocks, so that would not be a sufficient public API implementation.

@kripken kripken (Member, Author) commented Sep 22, 2021

@juj Would you have time to finish up the wbind work? If it's a more general solution then it sounds good to me.

If not, or if not in the near future, I think this would still be a good incremental step. It solves current user problems, and it is a pretty simple and non-invasive mechanism. While polyfilling like this has downsides, it does guarantee that all locations in the codebase get the benefit of the new optimization automatically.

(Not going through review comments yet until we decide on a direction.)

@jpharvey For now, I think you can place

// Wrap the real table in a simple JS-side cache of fetched entries.
var originalWasmTable = wasmTable;
wasmTable = {
  _cache: [],
  'length': originalWasmTable.length,
  'get': function(i) {
    // Grow the cache array so that index i exists.
    if (i >= wasmTable._cache.length) {
      wasmTable._cache.length = i + 1;
    }
    var existing = wasmTable._cache[i];
    if (existing) {
      return existing;
    }
    return wasmTable._cache[i] = originalWasmTable.get(i);
  },
  'set': function(i, func) {
    if (i >= wasmTable._cache.length) {
      wasmTable._cache.length = i + 1;
    }
    wasmTable._cache[i] = func;
    return originalWasmTable.set(i, func);
  },
  'grow': function(by) {
    var ret = originalWasmTable.grow(by);
    // Keep the cached length property in sync with the real table.
    wasmTable.length = originalWasmTable.length;
    return ret;
  }
};

in a JS file that you add using --pre-js.
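
For example, assuming the snippet above is saved as cache_wasm_table.js (the file and output names here are just placeholders):

emcc main.cpp -O2 --pre-js cache_wasm_table.js -o app.js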

@juj juj (Collaborator) commented Sep 23, 2021

> @juj Would you have time to finish up the wbind work? If it's a more general solution then it sounds good to me.

Sure, let me get that PR updated to the finish line.

@kripken kripken (Member, Author) commented Oct 18, 2021

Closing as #15286 landed, which should fix the same perf issues (but in a different way internally).

@kripken kripken closed this Oct 18, 2021
@kripken kripken deleted the cache_table branch October 18, 2021 20:01