-
Notifications
You must be signed in to change notification settings - Fork 792
[NFC] Speed up Unsubtyping #7734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Speed up Unsubtyping by over 2x via algorithmic and other improvements. The most expensive part of the Unsubtyping analysis is the handling of casts. For each cast source and destination pair, each type that remains a subtype of the source and was originally a subtype of the destination must remain a subtype of the destination for the cast to continue succeeding. Previously, Unsubtyping analyzed these cast relationships for all types as a single unit of work whenever it reached a fixed point from examining other sources of subtyping constraints. This led to duplicated work because the subtype, cast source, and cast destination triples analyzed once would be analyzed again the next time casts were considered. Avoid this duplicated cast analysis by incrementally analyzing casts whenever a new subtyping is discovered. Maintain the invariant that each new subtyping either joins a subtyping tree rooted at the discovered subtype into the discovered supertype's tree, or reparents the subtype below some (possibly indirect) subtype of its old parent. In the former case, the subtype and all of its descendents are evaluated against all casts originating from all their new supertypes in the tree they have joined. In the latter case, they must already have been evaluated against all casts originating at the old supertype and its ancestors, so they need only to be evaluated against their new supertypes up to the old supertype. Once a particular type is evaluated against casts originating from a particular supertype, that type will never be evaluated against those casts again. This algorithmic improvement accounts for most of the speedup. The rest of the speedup is from doing less work while collecting the initial subtyping constraints in a parallel function analysis. The old implementation used an instance of Unsubtyping to collect constraints in each function, which would end up doing some analysis to find additional constraints. The new implementation does not do any analysis of transitively required constraints during the initial parallel function analysis.
2ff8dd1
to
038bb62
Compare
Nice speedup! What is the absolute time this pass takes? I'm curious if it remains one of our slower passes or not. |
src/passes/Unsubtyping.cpp
Outdated
Subtypes subtypes(HeapType type) { return {this, getNode(type)}; } | ||
|
||
private: | ||
Index getNode(HeapType type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps getIndex
?
src/passes/Unsubtyping.cpp
Outdated
auto subIndex = getNode(sub); | ||
auto superIndex = getNode(super); | ||
auto& childNode = nodes[subIndex]; | ||
auto& parentNode = nodes[superIndex]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps subNode
, superNode
?
(or use parent/child before instead of sub/super)
src/passes/Unsubtyping.cpp
Outdated
// Swap the indices in the parent's child vector. | ||
std::swap(children[childNode.indexInParent], children.back()); | ||
// Swap the indices in the children. | ||
std::swap(childNode.indexInParent, swappedNode.indexInParent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::swap(childNode.indexInParent, swappedNode.indexInParent); | |
swappedNode.indexInParent = childNode.indexInParent; |
This is overwritten below anyhow
src/passes/Unsubtyping.cpp
Outdated
auto& [index, childIndex] = stack.back(); | ||
if (childIndex == parent->nodes[index].children.size()) { | ||
stack.pop_back(); | ||
continue; | ||
} | ||
break; | ||
} | ||
auto& [index, childIndex] = stack.back(); | ||
auto child = parent->nodes[index].children[childIndex]; | ||
++childIndex; | ||
stack.push_back({child, 0u}); | ||
return *this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto& [index, childIndex] = stack.back(); | |
if (childIndex == parent->nodes[index].children.size()) { | |
stack.pop_back(); | |
continue; | |
} | |
break; | |
} | |
auto& [index, childIndex] = stack.back(); | |
auto child = parent->nodes[index].children[childIndex]; | |
++childIndex; | |
stack.push_back({child, 0u}); | |
return *this; | |
auto& [index, childIndex] = stack.back(); | |
auto& children = parent->nodes[index].children; | |
if (childIndex == children.size()) { | |
stack.pop_back(); | |
} else { | |
auto child = children[childIndex]; | |
++childIndex; | |
stack.push_back({child, 0u}); | |
return *this; | |
} | |
} |
Doing it all in the loop is shorter since it can reuse some things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
src/passes/Unsubtyping.cpp
Outdated
void noteSubtype(HeapType sub, HeapType super) { | ||
if (sub == super || sub.isBottom() || super.isBottom()) { | ||
size_t noteCount = 0; | ||
void note(HeapType sub, HeapType super) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous name noteSubtype
helped remember the order of the parameters? (parallel to isSubtype)
: ControlFlowWalker<Collector, SubtypingDiscoverer<Collector>> { | ||
Info& info; | ||
Collector(Info& info) : info(info) {} | ||
void noteSubtype(Type sub, Type super) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is generic - makes me wonder if maybe we should put it in the shared header as the default? I guess considering that could be separate from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think there's room to simplify this, especially since Unsubtyping is back to being the only user of SubtypingDiscoverer
. (StringLowering previously used it, but no longer needs to now that string is a subtype of extern.)
src/passes/Unsubtyping.cpp
Outdated
for (auto type : ModuleUtils::getPublicHeapTypes(wasm)) { | ||
if (auto super = type.getDeclaredSuperType()) { | ||
noteSubtype(type, *super); | ||
size_t processCount = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t processCount = 0; | |
size_t processCount = 0; | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is actually left over from performance debugging.
src/passes/Unsubtyping.cpp
Outdated
continue; | ||
void | ||
processCasts(HeapType sub, HeapType super, std::optional<HeapType> oldSuper) { | ||
// We are either attaching the one tree rooted at `type` under a new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type
is not one of the parameters - does it refer to sub
or super
perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yes. I will update this to sub
.
On a calcworker binary that has been through a single So it's still on the slower side, but it's no longer one of the few slowest passes. |
auto childIndex = getIndex(sub); | ||
auto parentIndex = getIndex(super); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto childIndex = getIndex(sub); | |
auto parentIndex = getIndex(super); | |
auto subIndex = getIndex(sub); | |
auto superIndex = getIndex(super); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, I think it would be good to consistently use sub/super or parent/child, but not mix them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I've standardized on parent/child
when naming indices and nodes in the tree. The parameters are still sub
and super
because they are types rather than tree elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, sgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm % comment
Speed up Unsubtyping by over 2x via algorithmic and other improvements.
The most expensive part of the Unsubtyping analysis is the handling of
casts. For each cast source and destination pair, each type that remains
a subtype of the source and was originally a subtype of the destination
must remain a subtype of the destination for the cast to continue
succeeding.
Previously, Unsubtyping analyzed these cast relationships for all types
as a single unit of work whenever it reached a fixed point from
examining other sources of subtyping constraints. This led to duplicated
work because the subtype, cast source, and cast destination triples
analyzed once would be analyzed again the next time casts were
considered.
Avoid this duplicated cast analysis by incrementally analyzing casts
whenever a new subtyping is discovered. Maintain the invariant that each
new subtyping either joins a subtyping tree rooted at the discovered
subtype into the discovered supertype's tree, or reparents the subtype
below some (possibly indirect) subtype of its old parent. In the former
case, the subtype and all of its descendents are evaluated against all
casts originating from all their new supertypes in the tree they have
joined. In the latter case, they must already have been evaluated
against all casts originating at the old supertype and its ancestors, so
they need only to be evaluated against their new supertypes up to the
old supertype. Once a particular type is evaluated against casts
originating from a particular supertype, that type will never be
evaluated against those casts again.
This algorithmic improvement accounts for most of the speedup. The rest
of the speedup is from doing less work while collecting the initial
subtyping constraints in a parallel function analysis. The old
implementation used an instance of Unsubtyping to collect constraints in
each function, which would end up doing some analysis to find additional
constraints. The new implementation does not do any analysis of
transitively required constraints during the initial parallel function
analysis.