[NFC] Speed up Unsubtyping #7734

Merged: tlively merged 5 commits into main from unsubtyping-rewrite on Jul 21, 2025

tlively (Member) commented Jul 18, 2025

Speed up Unsubtyping by over 2x via algorithmic and other improvements.

The most expensive part of the Unsubtyping analysis is the handling of
casts. For each cast source and destination pair, each type that remains
a subtype of the source and was originally a subtype of the destination
must remain a subtype of the destination for the cast to continue
succeeding.
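
To make the cast rule above concrete, here is a minimal sketch. It is an illustration only, not the pass's code: types are modeled as strings rather than Binaryen's `HeapType`, and `Hierarchy`, `isSubtype`, and `requiredByCast` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a type hierarchy: each type maps to its direct
// supertype, if any. This is not Binaryen's HeapType API.
using Hierarchy = std::map<std::string, std::optional<std::string>>;

// Returns true if `sub` is `super` or reaches it by following supertype links.
bool isSubtype(const Hierarchy& h, std::string sub, const std::string& super) {
  while (true) {
    if (sub == super) {
      return true;
    }
    auto it = h.find(sub);
    if (it == h.end() || !it->second) {
      return false;
    }
    sub = *it->second;
  }
}

// For one cast from `src` to `dst`, list the subtypings `T <: dst` that must
// be kept: T still reaches `src` in the `kept` hierarchy, and T reached `dst`
// in the `original` hierarchy.
std::vector<std::pair<std::string, std::string>>
requiredByCast(const Hierarchy& original,
               const Hierarchy& kept,
               const std::string& src,
               const std::string& dst) {
  std::vector<std::pair<std::string, std::string>> required;
  for (const auto& entry : original) {
    const std::string& type = entry.first;
    if (type != dst && isSubtype(kept, type, src) &&
        isSubtype(original, type, dst)) {
      required.push_back({type, dst});
    }
  }
  return required;
}
```

For example, with an original chain `C <: B <: A`, a candidate hierarchy that keeps only `C <: A`, and a cast from `A` to `B`, the sketch reports that `C <: B` must be kept, because a `C` value can still flow into the cast and used to pass it.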

Previously, Unsubtyping analyzed these cast relationships for all types
as a single unit of work whenever it reached a fixed point from
examining other sources of subtyping constraints. This led to duplicated
work because the subtype, cast source, and cast destination triples
analyzed once would be analyzed again the next time casts were
considered.

Avoid this duplicated cast analysis by incrementally analyzing casts
whenever a new subtyping is discovered. Maintain the invariant that each
new subtyping either joins a subtyping tree rooted at the discovered
subtype into the discovered supertype's tree, or reparents the subtype
below some (possibly indirect) subtype of its old parent. In the former
case, the subtype and all of its descendants are evaluated against all
casts originating from all their new supertypes in the tree they have
joined. In the latter case, they must already have been evaluated
against all casts originating at the old supertype and its ancestors, so
they need only be evaluated against their new supertypes up to the
old supertype. Once a particular type is evaluated against casts
originating from a particular supertype, that type will never be
evaluated against those casts again.
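
The join and reparent cases can be sketched in miniature as follows. This is a simplified model under stated assumptions, not the implementation in this PR: types are small integers, `Forest`, `evaluate`, and `setParent` are hypothetical names, descendants are found by a linear scan for brevity, and "evaluating" a type against a cast source just records the pair so that repeated work can be counted.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical miniature of the incremental scheme described above.
struct Forest {
  static constexpr int None = -1;
  std::vector<int> parent;                 // parent[t], or None for a root
  std::set<std::pair<int, int>> evaluated; // (type, cast source) pairs done
  std::size_t repeats = 0;                 // pairs visited more than once

  explicit Forest(std::size_t n) : parent(n, None) {}

  // Record that `type` was checked against casts originating at `source`.
  void evaluate(int type, int source) {
    if (!evaluated.insert({type, source}).second) {
      ++repeats;
    }
  }

  bool isDescendantOrSelf(int type, int root) const {
    for (int t = type; t != None; t = parent[t]) {
      if (t == root) {
        return true;
      }
    }
    return false;
  }

  // Note `sub <: super`. If `sub` was a root, its whole tree joins `super`'s
  // tree and is evaluated against every new supertype. Otherwise `super` must
  // be a (possibly indirect) subtype of the old parent, and only the new
  // supertypes strictly below the old parent are evaluated.
  void setParent(int sub, int super) {
    int oldSuper = parent[sub];
    assert(oldSuper == None || isDescendantOrSelf(super, oldSuper));
    parent[sub] = super;
    for (int t = 0; t < static_cast<int>(parent.size()); ++t) {
      if (!isDescendantOrSelf(t, sub)) {
        continue;
      }
      // Walk the new supertypes, stopping at the old one (or past the root).
      for (int s = super; s != oldSuper; s = parent[s]) {
        evaluate(t, s);
      }
    }
  }
};
```

In this model, calling `setParent` for a sequence of joins followed by a reparenting never records the same (type, cast source) pair twice, which is the property the description relies on.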

This algorithmic improvement accounts for most of the speedup. The rest
of the speedup is from doing less work while collecting the initial
subtyping constraints in a parallel function analysis. The old
implementation used an instance of Unsubtyping to collect constraints in
each function, which would end up doing some analysis to find additional
constraints. The new implementation does not do any analysis of
transitively required constraints during the initial parallel function
analysis.

tlively requested a review from kripken on July 18, 2025 00:51
tlively force-pushed the unsubtyping-rewrite branch from 2ff8dd1 to 038bb62 on July 18, 2025 01:18

kripken (Member) commented Jul 21, 2025

Nice speedup!

What is the absolute time this pass takes? I'm curious if it remains one of our slower passes or not.

    Subtypes subtypes(HeapType type) { return {this, getNode(type)}; }

    private:
    Index getNode(HeapType type) {

Member

Perhaps getIndex?

    auto subIndex = getNode(sub);
    auto superIndex = getNode(super);
    auto& childNode = nodes[subIndex];
    auto& parentNode = nodes[superIndex];

Member

Perhaps subNode, superNode?

(or use parent/child before instead of sub/super)

    // Swap the indices in the parent's child vector.
    std::swap(children[childNode.indexInParent], children.back());
    // Swap the indices in the children.
    std::swap(childNode.indexInParent, swappedNode.indexInParent);

Member

Suggested change
    - std::swap(childNode.indexInParent, swappedNode.indexInParent);
    + swappedNode.indexInParent = childNode.indexInParent;

This is overwritten below anyhow
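
For context, the swap-and-pop removal being discussed can be sketched as below. The `Node` layout and the `reparent` helper are assumptions modeled on the snippet under review, not the PR's actual definitions, and the sketch assumes the child already has a parent.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Index = std::size_t;

// Hypothetical node layout, modeled on the snippet under review.
struct Node {
  Index parent = 0;
  Index indexInParent = 0; // position in the parent's `children` vector
  std::vector<Index> children;
};

// Detach `child` from its current parent in O(1) by swapping it with the last
// child and popping, then attach it under `newParent`.
void reparent(std::vector<Node>& nodes, Index child, Index newParent) {
  auto& childNode = nodes[child];
  auto& children = nodes[childNode.parent].children;
  assert(children[childNode.indexInParent] == child);
  auto& swappedNode = nodes[children.back()];
  // Move the last sibling into the child's slot.
  std::swap(children[childNode.indexInParent], children.back());
  // Only the moved sibling needs its index fixed here; the child's own index
  // is overwritten just below, so a full swap of the indices is unnecessary.
  swappedNode.indexInParent = childNode.indexInParent;
  children.pop_back();

  childNode.parent = newParent;
  childNode.indexInParent = nodes[newParent].children.size();
  nodes[newParent].children.push_back(child);
}
```

The plain assignment is enough because the detached child receives a fresh `indexInParent` as soon as it is appended to its new parent's vector.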

Comment on lines 247 to 258

Member

Suggested change
    -       auto& [index, childIndex] = stack.back();
    -       if (childIndex == parent->nodes[index].children.size()) {
    -         stack.pop_back();
    -         continue;
    -       }
    -       break;
    -     }
    -     auto& [index, childIndex] = stack.back();
    -     auto child = parent->nodes[index].children[childIndex];
    -     ++childIndex;
    -     stack.push_back({child, 0u});
    -     return *this;
    +       auto& [index, childIndex] = stack.back();
    +       auto& children = parent->nodes[index].children;
    +       if (childIndex == children.size()) {
    +         stack.pop_back();
    +       } else {
    +         auto child = children[childIndex];
    +         ++childIndex;
    +         stack.push_back({child, 0u});
    +         return *this;
    +       }
    +     }

Doing it all in the loop is shorter since it can reuse some things

Member Author

Nice!
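
A self-contained sketch of the stack-based pre-order traversal that this suggestion simplifies is shown below. The `PreOrder` class and the `preOrder` helper are hypothetical; only the loop body in `advance` follows the suggested code, and the surrounding iterator in the PR may differ.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Index = std::size_t;

struct Node {
  std::vector<Index> children;
};

// Hypothetical cursor over the subtree rooted at `root`, in pre-order.
class PreOrder {
  const std::vector<Node>& nodes;
  // Each entry is (node, index of its next unvisited child).
  std::vector<std::pair<Index, std::size_t>> stack;

public:
  PreOrder(const std::vector<Node>& nodes, Index root) : nodes(nodes) {
    stack.push_back({root, 0});
  }
  bool done() const { return stack.empty(); }
  Index current() const { return stack.back().first; }

  // Step to the next node: descend into the next unvisited child of the
  // nearest ancestor that still has one, popping exhausted nodes on the way.
  void advance() {
    while (!stack.empty()) {
      auto& [index, childIndex] = stack.back();
      auto& children = nodes[index].children;
      if (childIndex == children.size()) {
        stack.pop_back();
      } else {
        auto child = children[childIndex];
        ++childIndex; // done before push_back, which may invalidate the refs
        stack.push_back({child, 0});
        return;
      }
    }
  }
};

// Collect the pre-order sequence of the subtree rooted at `root`.
std::vector<Index> preOrder(const std::vector<Node>& nodes, Index root) {
  std::vector<Index> order;
  for (PreOrder it(nodes, root); !it.done(); it.advance()) {
    order.push_back(it.current());
  }
  return order;
}
```

Handling both branches inside the loop lets the `children` reference be computed once per iteration, which is the reuse the review comment points out.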

    void noteSubtype(HeapType sub, HeapType super) {
    if (sub == super || sub.isBottom() || super.isBottom()) {
    size_t noteCount = 0;
    void note(HeapType sub, HeapType super) {

Member

Previous name noteSubtype helped remember the order of the parameters? (parallel to isSubtype)

    : ControlFlowWalker<Collector, SubtypingDiscoverer<Collector>> {
    Info& info;
    Collector(Info& info) : info(info) {}
    void noteSubtype(Type sub, Type super) {

Member

This one is generic - makes me wonder if maybe we should put it in the shared header as the default? I guess considering that could be separate from this PR.

Member Author

Yeah, I think there's room to simplify this, especially since Unsubtyping is back to being the only user of SubtypingDiscoverer. (StringLowering previously used it, but no longer needs to now that string is a subtype of extern.)

    for (auto type : ModuleUtils::getPublicHeapTypes(wasm)) {
    if (auto super = type.getDeclaredSuperType()) {
    noteSubtype(type, *super);
    size_t processCount = 0;

Member

Suggested change
    - size_t processCount = 0;
    + size_t processCount = 0;

Member Author

Ah, this is actually left over from performance debugging.

    continue;
    void
    processCasts(HeapType sub, HeapType super, std::optional<HeapType> oldSuper) {
    // We are either attaching the one tree rooted at `type` under a new

Member

type is not one of the parameters - does it refer to sub or super perhaps?

Member Author

Oops, yes. I will update this to sub.

tlively (Member, Author) commented Jul 21, 2025

> Nice speedup!
>
> What is the absolute time this pass takes? I'm curious if it remains one of our slower passes or not.

On a calcworker binary that has been through a single -O3, unsubtyping takes 3.87 seconds before this change and 1.16 seconds after this change. On the unoptimized binary, we go from 18.2 seconds to 6.9 seconds.

So it's still on the slower side, but it's no longer one of the few slowest passes.
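
For reference, dividing the timings quoted above gives the following speedups, both consistent with the "over 2x" figure in the description:

```latex
\[
\frac{3.87\,\mathrm{s}}{1.16\,\mathrm{s}} \approx 3.3\times
\qquad
\frac{18.2\,\mathrm{s}}{6.9\,\mathrm{s}} \approx 2.6\times
\]
```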

Comment on lines +147 to +148

Member

Suggested change
    - auto childIndex = getIndex(sub);
    - auto parentIndex = getIndex(super);
    + auto subIndex = getIndex(sub);
    + auto superIndex = getIndex(super);

Member

That is, I think it would be good to consistently use sub/super or parent/child, but not mix them?

Member Author

Right, I've standardized on parent/child when naming indices and nodes in the tree. The parameters are still sub and super because they are types rather than tree elements.

Member

I see, sgtm.

kripken (Member) left a comment

lgtm % comment

tlively merged commit b440e68 into main on Jul 21, 2025
16 checks passed
tlively deleted the unsubtyping-rewrite branch on July 21, 2025 23:46