Performance optimizations for Type #2733

Merged · 5 commits merged into WebAssembly:master from optimize-type on Apr 9, 2020

Conversation

@tlively tlively (Member) commented Apr 8, 2020

Cache type sizes in unused bits of the type ID and rewrite some `Type`
methods to avoid unnecessary calls to `expand` when the type is known
to be a basic non-tuple type. This eliminates most of the locking
traffic and reduces wall clock time by 52% and CPU time by 73%
for one real-world program on my machine.
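
As a rough sketch of the packing trick (illustrative only; the constant and helper names below are assumptions, not this PR's actual layout), the size of a type can live in high bits that are never needed to distinguish type IDs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit layout: a small size field in the high bits of a 32-bit id.
constexpr unsigned SIZE_BITS = 10;            // enough for the 1000-argument limit
constexpr unsigned ID_BITS = 32 - SIZE_BITS;  // bits left for distinguishing IDs
constexpr uint32_t ID_MASK = (1u << ID_BITS) - 1;

constexpr uint32_t pack(uint32_t id, uint32_t size) {
  return (size << ID_BITS) | (id & ID_MASK);
}
constexpr uint32_t unpackId(uint32_t packed) { return packed & ID_MASK; }
constexpr uint32_t unpackSize(uint32_t packed) { return packed >> ID_BITS; }

int main() {
  uint32_t packed = pack(/*id=*/42, /*size=*/3);
  assert(unpackId(packed) == 42);
  assert(unpackSize(packed) == 3);  // size read back with no lock and no table lookup
}
```

Reading a size stored this way never touches the interned type table, which is where the locking traffic came from.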

@tlively tlively requested a review from kripken April 8, 2020 22:09
src/wasm-type.h Outdated

   // But converting raw uint32_t is more dangerous, so make it explicit
-  constexpr explicit Type(uint32_t id) : id(id){};
+  constexpr explicit Type(uint32_t id) : id(id), _size(unknownSize){};
Member

why is the size unknown here, mustn't the id be valid? if so shouldn't we get the size from the caller too?

Member Author

I wanted to avoid taking the lock to look up the size, but idk how important that is. I'll investigate further.

Member Author

It turns out that this constructor is primarily used in the C API to convert from BinaryenType (aka uint32_t) to a Type. Right now BinaryenType values are type IDs, but we could pack the size into them as well. The downside is that it would make the conversion from BinaryenType to Type even more dangerous and error-prone, and it would generally add a maintenance burden to the C API. I think it's probably better to take the lock and look up the size in the constructor.
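
A minimal sketch of that direction, assuming a hypothetical `typeMutex` and `tupleSizes` table standing in for Binaryen's interning internals:

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-ins for the interned tuple-type table and its lock.
static std::mutex typeMutex;
static std::unordered_map<uint32_t, size_t> tupleSizes;

struct Type {
  uint32_t id;
  size_t _size;

  // Converting a raw BinaryenType (uint32_t) looks the size up once, under the
  // lock, so later size() calls never need to expand().
  explicit Type(uint32_t id) : id(id) {
    std::lock_guard<std::mutex> lock(typeMutex);
    auto it = tupleSizes.find(id);
    _size = it != tupleSizes.end() ? it->second : 1;  // non-tuple types count as 1 here
  }

  size_t size() const { return _size; }
};

int main() {
  Type t(/*BinaryenType*/ 7);    // size resolved once, at construction
  return t.size() == 1 ? 0 : 1;
}
```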

src/wasm-type.h Outdated
@@ -24,7 +24,11 @@
 namespace wasm {

 class Type {
   uint32_t id;
+  // enough for the limit of 1000 function arguments
+  static constexpr unsigned sizeBits = 10;
Member

perhaps SizeBits (capitalize?) or SIZE_BITS? we seem to use both of those, and to leave lowerCaps style names for actual variables.


size_t Type::size() const {
  if (_size == unknownSize) {
    return expand().size();
Member

this is the bad old path? i hope we can avoid it as mentioned in another comment, by requiring the size to always be known.

Member

After discussion, this seems ok for now but we should figure this out later.

-    return params[index];
+  auto numParams = sig.params.size();
+  if (index < numParams) {
+    return sig.params.expand()[index];
Member

this also seems like a bad path that we should avoid, as each expand takes a lock?

Member Author

There's no way to avoid this right now because fetching an element out of a Type requires accessing the underlying vector. Thread-local caches could help, but that would be future work.

Member

Sounds good.

@kripken kripken (Member) commented Apr 8, 2020

This is promising, but if we can't get rid of all the bad paths (and only take a lock if doing something known to be very rare), then I'm worried.

Your cache of types idea is very important for avoiding passing around copies everywhere, so we should definitely keep that. How about making the cache per-thread? Maybe we should gather info on how large the types cache commonly is; I'm hoping it's small enough for such duplication.
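
A per-thread cache along those lines could look roughly like this; `typeMutex`, `typeLists`, and `globalExpand` are assumed stand-ins for the existing locked lookup rather than Binaryen's real internals:

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

struct Type { uint32_t id; };  // minimal stand-in for Binaryen's Type

// Stand-ins for the shared interned-type table and the lock it requires.
static std::mutex typeMutex;
static std::unordered_map<uint32_t, std::vector<Type>> typeLists;

static std::vector<Type> globalExpand(uint32_t id) {
  std::lock_guard<std::mutex> lock(typeMutex);
  return typeLists[id];
}

// Each thread pays for the lock at most once per type id, at the cost of
// duplicating the (hopefully small) expanded lists in every thread.
const std::vector<Type>& cachedExpand(uint32_t id) {
  thread_local std::unordered_map<uint32_t, std::vector<Type>> cache;
  auto it = cache.find(id);
  if (it == cache.end()) {
    it = cache.emplace(id, globalExpand(id)).first;
  }
  return it->second;
}

int main() {
  typeLists[100] = {Type{1}, Type{2}};           // pretend this tuple type is interned
  return cachedExpand(100).size() == 2 ? 0 : 1;  // a second call would skip the lock
}
```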

@tlively tlively (Member, Author) commented Apr 8, 2020

Here's some more detailed performance data (times are in seconds).

Before patch

72 cores
real: 16.662
user: 707.308

1 core
real: 27.465
user: 27.218

real ratio: 1.64

Patch applied

72 cores
real: 7.966
user: 185.133

1 core
real: 26.230
user: 25.810

real ratio: 3.29

@kripken kripken (Member) left a comment

lgtm, and we should remove locking entirely in a followup for the common (non-multivalue) case.

@@ -120,7 +120,7 @@ void Type::init(const std::vector<Type>& types) {
   if (lookup()) {
     return;
   }
-  if (typeLists.size() >= (1 << (32 - sizeBits))) {
+  if (typeLists.size() >= (1 << (ID_BITS))) {
Member

extra unnecessary parens here around ID_BITS


@tlively tlively (Member, Author) commented Apr 9, 2020

@kripken I ended up removing the UNKNOWN_SIZE thing after all as I described in #2733 (comment). Still LGTY?

@kripken kripken (Member) commented Apr 9, 2020

still lgtm, sure.

@aheejin aheejin (Member) commented Apr 9, 2020

I think I floated this once in #2510 (comment); would overriding the `[]` operator and moving the "check whether this isSingle(), and if not, call expand()" logic into it be cleaner? My motivation then was to hide expand() entirely so that it is only an implementation detail, but if there are places where this check is not necessary, we can still expose expand() and use it explicitly.
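
A hedged sketch of that suggestion; the `tuple` member below stands in for the interned list that the real `expand()` would fetch under a lock:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal stand-in for Binaryen's Type, illustrating element access through
// operator[] so that expand() stays an internal detail.
struct Type {
  uint32_t id = 0;
  std::vector<Type> tuple;  // empty for basic types

  bool isSingle() const { return tuple.empty(); }

  Type operator[](size_t index) const {
    if (isSingle()) {
      assert(index == 0);  // a basic type is its own sole element: no lock needed
      return *this;
    }
    return tuple[index];   // real code would fall back to the locked expand() here
  }
};

int main() {
  Type i32;                       // basic type
  Type pair;
  pair.tuple = {Type{}, Type{}};  // stand-in for an interned two-element tuple
  return (i32[0].isSingle() && !pair.isSingle() && pair[1].isSingle()) ? 0 : 1;
}
```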

@tlively tlively (Member, Author) commented Apr 9, 2020

Hmm, yeah, I think that's worth revisiting now. One of the advantages of only supporting expand() before was that you could expand once to get the size and then perform accesses, taking the lock just once up front. Now getting the size is free anyway, so that's less important. You're also right that by special-casing the basic types in some internal logic, we can make it easier to use Type performantly.

@tlively tlively merged commit b3d79f4 into WebAssembly:master Apr 9, 2020
@tlively tlively deleted the optimize-type branch April 24, 2020 23:20