[NFC] Speed up Unsubtyping #7734

tlively · 2025-07-18T00:51:49Z

Speed up Unsubtyping by over 2x via algorithmic and other improvements.

The most expensive part of the Unsubtyping analysis is the handling of
casts. For each cast source and destination pair, each type that remains
a subtype of the source and was originally a subtype of the destination
must remain a subtype of the destination for the cast to continue
succeeding.
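
As a rough illustration of the cast rule above (not the pass's actual code), the sketch below lists the subtypings a single cast forces. `TypeId`, `Supertypes`, `isSubType`, and `requiredByCast` are hypothetical stand-ins invented for this example; the real pass works on Binaryen's `HeapType`.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a heap type; not Binaryen's HeapType.
using TypeId = int;

// Maps each type to its supertype. Types without a supertype are absent.
using Supertypes = std::map<TypeId, TypeId>;

// Returns true if `sub` equals `super` or reaches it by following supertypes.
bool isSubType(const Supertypes& supers, TypeId sub, TypeId super) {
  TypeId curr = sub;
  while (true) {
    if (curr == super) {
      return true;
    }
    auto it = supers.find(curr);
    if (it == supers.end()) {
      return false;
    }
    curr = it->second;
  }
}

// For a cast from `src` to `dest`, collect every (type, dest) subtyping that
// must be kept: `type` is still a subtype of `src` in the current (optimized)
// hierarchy and was a subtype of `dest` in the original hierarchy.
std::vector<std::pair<TypeId, TypeId>>
requiredByCast(const Supertypes& original,
               const Supertypes& current,
               const std::vector<TypeId>& types,
               TypeId src,
               TypeId dest) {
  std::vector<std::pair<TypeId, TypeId>> required;
  for (TypeId type : types) {
    if (type != dest && isSubType(current, type, src) &&
        isSubType(original, type, dest)) {
      required.push_back({type, dest});
    }
  }
  return required;
}
```

This brute-force form scans every type for every cast, which is the kind of repeated work the incremental approach described next is meant to avoid.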

Previously, Unsubtyping analyzed these cast relationships for all types
as a single unit of work whenever it reached a fixed point from
examining other sources of subtyping constraints. This led to duplicated
work because the subtype, cast source, and cast destination triples
analyzed once would be analyzed again the next time casts were
considered.

Avoid this duplicated cast analysis by incrementally analyzing casts
whenever a new subtyping is discovered. Maintain the invariant that each
new subtyping either joins a subtyping tree rooted at the discovered
subtype into the discovered supertype's tree, or reparents the subtype
below some (possibly indirect) subtype of its old parent. In the former
case, the subtype and all of its descendants are evaluated against all
casts originating from all their new supertypes in the tree they have
joined. In the latter case, they must already have been evaluated
against all casts originating at the old supertype and its ancestors, so
they need only be evaluated against their new supertypes up to the
old supertype. Once a particular type is evaluated against casts
originating from a particular supertype, that type will never be
evaluated against those casts again.
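
The two cases can be sketched as one upward walk that stops early when there is an old supertype. This is a simplified illustration under assumed names (`TypeId`, `Parents`, `supertypesToCheck` are hypothetical), not the code in this PR; it only computes which supertypes' casts still need to be checked for the moved subtree.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for a heap type; not Binaryen's HeapType.
using TypeId = int;

// Child -> parent edges of the current subtyping forest. Roots are absent.
using Parents = std::map<TypeId, TypeId>;

// Returns the supertypes whose outgoing casts must still be evaluated for a
// subtree that was just attached below `newSuper`.
//  - Joining a tree (`oldSuper` is empty): every ancestor up to the root.
//  - Reparenting (`oldSuper` is set): only the ancestors strictly below
//    `oldSuper`, since the old supertype and its ancestors were handled before.
std::vector<TypeId> supertypesToCheck(const Parents& parents,
                                      TypeId newSuper,
                                      std::optional<TypeId> oldSuper) {
  std::vector<TypeId> result;
  std::optional<TypeId> curr = newSuper;
  while (curr && curr != oldSuper) {
    result.push_back(*curr);
    auto it = parents.find(*curr);
    if (it == parents.end()) {
      curr = std::nullopt;
    } else {
      curr = it->second;
    }
  }
  return result;
}
```

For a chain `3 <: 2 <: 1 <: 0`, attaching a fresh tree below `3` yields all four supertypes, while moving a former child of `1` below `3` yields only `3` and `2`.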

This algorithmic improvement accounts for most of the speedup. The rest
of the speedup is from doing less work while collecting the initial
subtyping constraints in a parallel function analysis. The old
implementation used an instance of Unsubtyping to collect constraints in
each function, which would end up doing some analysis to find additional
constraints. The new implementation does not do any analysis of
transitively required constraints during the initial parallel function
analysis.

kripken · 2025-07-21T15:53:43Z

Nice speedup!

What is the absolute time this pass takes? I'm curious if it remains one of our slower passes or not.

kripken · 2025-07-21T16:42:09Z

src/passes/Unsubtyping.cpp

+  Subtypes subtypes(HeapType type) { return {this, getNode(type)}; }
+
+private:
+  Index getNode(HeapType type) {


Perhaps getIndex?

kripken · 2025-07-21T16:42:55Z

src/passes/Unsubtyping.cpp

+    auto subIndex = getNode(sub);
+    auto superIndex = getNode(super);
+    auto& childNode = nodes[subIndex];
+    auto& parentNode = nodes[superIndex];


Perhaps subNode, superNode?

(or use parent/child before instead of sub/super)

kripken · 2025-07-21T16:45:08Z

src/passes/Unsubtyping.cpp

+      // Swap the indices in the parent's child vector.
+      std::swap(children[childNode.indexInParent], children.back());
+      // Swap the indices in the children.
+      std::swap(childNode.indexInParent, swappedNode.indexInParent);


Suggested change

-      std::swap(childNode.indexInParent, swappedNode.indexInParent);
+      swappedNode.indexInParent = childNode.indexInParent;

This is overwritten below anyhow
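
For context, a minimal sketch of the swap-and-pop removal this hunk appears to implement, written with the suggested single assignment. The `Node` layout and `removeChild` helper are assumptions made for illustration; only the field name `indexInParent` comes from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical node layout; only `indexInParent` is taken from the diff.
struct Node {
  std::size_t parent = 0;
  std::size_t indexInParent = 0;
  std::vector<std::size_t> children;
};

// Detach `child` from its parent's child vector in O(1) by swapping it with
// the last child and popping. The detached node's own `indexInParent` is left
// stale here on the assumption that the caller overwrites it when the node is
// attached to its new parent.
void removeChild(std::vector<Node>& nodes, std::size_t child) {
  Node& childNode = nodes[child];
  auto& children = nodes[childNode.parent].children;
  assert(children[childNode.indexInParent] == child);
  Node& swappedNode = nodes[children.back()];
  // Swap the indices in the parent's child vector.
  std::swap(children[childNode.indexInParent], children.back());
  // Only the node that moved into the vacated slot needs its index fixed.
  swappedNode.indexInParent = childNode.indexInParent;
  children.pop_back();
}
```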

kripken · 2025-07-21T16:51:22Z

src/passes/Unsubtyping.cpp

+        auto& [index, childIndex] = stack.back();
+        if (childIndex == parent->nodes[index].children.size()) {
+          stack.pop_back();
+          continue;
+        }
+        break;
+      }
+      auto& [index, childIndex] = stack.back();
+      auto child = parent->nodes[index].children[childIndex];
+      ++childIndex;
+      stack.push_back({child, 0u});
+      return *this;


Suggested change

-        auto& [index, childIndex] = stack.back();
-        if (childIndex == parent->nodes[index].children.size()) {
-          stack.pop_back();
-          continue;
-        }
-        break;
-      }
-      auto& [index, childIndex] = stack.back();
-      auto child = parent->nodes[index].children[childIndex];
-      ++childIndex;
-      stack.push_back({child, 0u});
-      return *this;
+        auto& [index, childIndex] = stack.back();
+        auto& children = parent->nodes[index].children;
+        if (childIndex == children.size()) {
+          stack.pop_back();
+        } else {
+          auto child = children[childIndex];
+          ++childIndex;
+          stack.push_back({child, 0u});
+          return *this;
+        }
+      }

Doing it all in the loop is shorter since it can reuse some things
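
A self-contained sketch of the single-loop form, assuming the iterator performs a preorder walk over an index-based tree. `Node` and `PreorderCursor` are hypothetical names for this example; the real iterator in the PR has a different interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tree node that stores the indices of its children.
struct Node {
  std::vector<std::size_t> children;
};

// Hypothetical preorder cursor. Each stack entry is (node index, index of the
// next child to visit); the node on top of the stack is the current element.
struct PreorderCursor {
  const std::vector<Node>* nodes;
  std::vector<std::pair<std::size_t, std::size_t>> stack;

  PreorderCursor(const std::vector<Node>& allNodes, std::size_t root)
    : nodes(&allNodes) {
    stack.push_back({root, 0u});
  }

  bool done() const { return stack.empty(); }
  std::size_t current() const { return stack.back().first; }

  void advance() {
    while (!stack.empty()) {
      auto& [index, childIndex] = stack.back();
      const auto& children = (*nodes)[index].children;
      if (childIndex == children.size()) {
        // All children visited: back up to the parent entry.
        stack.pop_back();
      } else {
        // Read and bump the child index before pushing, because push_back
        // may reallocate and invalidate the references bound above.
        auto child = children[childIndex];
        ++childIndex;
        stack.push_back({child, 0u});
        return;
      }
    }
  }
};
```

A loop such as `for (PreorderCursor it(nodes, root); !it.done(); it.advance())` then visits each node before its children.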

kripken · 2025-07-21T16:55:35Z

src/passes/Unsubtyping.cpp

-  void noteSubtype(HeapType sub, HeapType super) {
-    if (sub == super || sub.isBottom() || super.isBottom()) {
+  size_t noteCount = 0;
+  void note(HeapType sub, HeapType super) {


Previous name noteSubtype helped remember the order of the parameters? (parallel to isSubtype)

kripken · 2025-07-21T16:59:00Z

src/passes/Unsubtyping.cpp

+      : ControlFlowWalker<Collector, SubtypingDiscoverer<Collector>> {
+      Info& info;
+      Collector(Info& info) : info(info) {}
+      void noteSubtype(Type sub, Type super) {


This one is generic - makes me wonder if maybe we should put it in the shared header as the default? I guess considering that could be separate from this PR.

Yeah, I think there's room to simplify this, especially since Unsubtyping is back to being the only user of SubtypingDiscoverer. (StringLowering previously used it, but no longer needs to now that string is a subtype of extern.)

kripken · 2025-07-21T17:00:21Z

src/passes/Unsubtyping.cpp

-    for (auto type : ModuleUtils::getPublicHeapTypes(wasm)) {
-      if (auto super = type.getDeclaredSuperType()) {
-        noteSubtype(type, *super);
+  size_t processCount = 0;


Suggested change

size_t processCount = 0;

size_t processCount = 0;

Ah, this is actually left over from performance debugging.

kripken · 2025-07-21T19:54:23Z

src/passes/Unsubtyping.cpp

-          continue;
+  void
+  processCasts(HeapType sub, HeapType super, std::optional<HeapType> oldSuper) {
+    // We are either attaching the one tree rooted at `type` under a new


type is not one of the parameters - does it refer to sub or super perhaps?

Oops, yes. I will update this to sub.

tlively · 2025-07-21T21:50:33Z

> Nice speedup!
>
> What is the absolute time this pass takes? I'm curious if it remains one of our slower passes or not.

On a calcworker binary that has been through a single -O3, unsubtyping takes 3.87 seconds before this change and 1.16 seconds after this change. On the unoptimized binary, we go from 18.2 seconds to 6.9 seconds.

So it's still on the slower side, but it's no longer one of the few slowest passes.

kripken · 2025-07-21T23:14:49Z

src/passes/Unsubtyping.cpp

+    auto childIndex = getIndex(sub);
+    auto parentIndex = getIndex(super);


Suggested change

-    auto childIndex = getIndex(sub);
-    auto parentIndex = getIndex(super);
+    auto subIndex = getIndex(sub);
+    auto superIndex = getIndex(super);

That is, I think it would be good to consistently use sub/super or parent/child, but not mix them?

Right, I've standardized on parent/child when naming indices and nodes in the tree. The parameters are still sub and super because they are types rather than tree elements.

I see, sgtm.

kripken

lgtm % comment

tlively requested a review from kripken July 18, 2025 00:51

tlively force-pushed the unsubtyping-rewrite branch from 2ff8dd1 to 038bb62 Compare July 18, 2025 01:18

tlively added 2 commits July 17, 2025 22:09

fix setSupertype

ba3869a

fix indexInParent management

4e68417

kripken reviewed Jul 21, 2025

tlively added 2 commits July 21, 2025 14:25

address comments

89d72a5

Merge branch 'main' into unsubtyping-rewrite

3a9874b

kripken reviewed Jul 21, 2025

kripken approved these changes Jul 21, 2025

tlively merged commit b440e68 into main Jul 21, 2025
16 checks passed

tlively deleted the unsubtyping-rewrite branch July 21, 2025 23:46
