-
Notifications
You must be signed in to change notification settings - Fork 786
Performance optimizations for Type #2733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Cache type sizes in unused bits from the type ID and rewrite some Type methods to avoid unnecessary calls to `expand` when the type is known to be a basic non-tuple type. This eliminates most of the locking traffic and reduces wall clock time by 52% and CPU time by 73% percent for one real-world program on my machine.
src/wasm-type.h
Outdated
|
||
// But converting raw uint32_t is more dangerous, so make it explicit | ||
constexpr explicit Type(uint32_t id) : id(id){}; | ||
constexpr explicit Type(uint32_t id) : id(id), _size(unknownSize){}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the size unknown here, mustn't the id be valid? if so shouldn't we get the size from the caller too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to avoid taking the lock to look up the size, but idk how important that is. I'll investigate further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that this constructor is primarily used in the C API to convert from BinaryenType
aka uint32_t
to a Type. Right now BinaryenType
values are type IDs, but we could pack the size into them as well. The downside of that would be that it would make the conversion from BinaryenType
to Type
even more dangerous and error prone and generally add a maintenance burden to the C API. I think it's probably better to take the lock and look up the size in the constructor.
src/wasm-type.h
Outdated
@@ -24,7 +24,11 @@ | |||
namespace wasm { | |||
|
|||
class Type { | |||
uint32_t id; | |||
// enough for the limit of 1000 function arguments | |||
static constexpr unsigned sizeBits = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps SizeBits
(capitalize?) or SIZE_BITS
? we seem to use both of those, and to leave lowerCaps
style names for actual variables.
src/wasm/wasm-type.cpp
Outdated
|
||
size_t Type::size() const { | ||
if (_size == unknownSize) { | ||
return expand().size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the bad old path? i hope we can avoid it as mentioned in another comment, by requiring the size to always be known.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion, this seems ok for now but we should figure this out later.
return params[index]; | ||
auto numParams = sig.params.size(); | ||
if (index < numParams) { | ||
return sig.params.expand()[index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also seems like a bad path that we should avoid, as each expand takes a lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no way to avoid this right now because fetching an element out of a Type requires accessing the underlying vector. Thread-local caches could help, but that would be future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
This is promising, but if we can't get rid of all the bad paths (and only take a lock if doing something known to be very rare), then I'm worried. Your cache of types idea is very important for avoiding passing around copies everywhere, so we should definitely keep that. How about making the cache per-thread? Maybe we should gather info on how large the types cache is commonly, I'm hoping it's small enough for such duplication. |
Here's some more detailed performance data. Before patch 72 cores 1 core real ratio: 1.64 Patch applied 72 cores 1 core real ratio: 3.29 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, and we should remove locking entirely in a followup for the common (non-multivalue) case.
src/wasm/wasm-type.cpp
Outdated
@@ -120,7 +120,7 @@ void Type::init(const std::vector<Type>& types) { | |||
if (lookup()) { | |||
return; | |||
} | |||
if (typeLists.size() >= (1 << (32 - sizeBits))) { | |||
if (typeLists.size() >= (1 << (ID_BITS))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra unnecessary parens here around ID_BITS
src/wasm/wasm-type.cpp
Outdated
|
||
size_t Type::size() const { | ||
if (_size == unknownSize) { | ||
return expand().size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion, this seems ok for now but we should figure this out later.
@kripken I ended up removing the UNKNOWN_SIZE thing after all as I described in #2733 (comment). Still LGTY? |
still lgtm, sure. |
I think I floated this once in #2510 (comment); would overriding |
Hmm yeah I think that's worth revisiting now. One of the advantages of just supporting |
Cache type sizes in unused bits from the type ID and rewrite some Type
methods to avoid unnecessary calls to
expand
when the type is knownto be a basic non-tuple type. This eliminates most of the locking
traffic and reduces wall clock time by 52% and CPU time by 73% percent
for one real-world program on my machine.