Conversation

@markerikson (Collaborator) commented Oct 15, 2025:

Stacked on top of #1164 for assorted smaller perf tweaks, followed by #1184 for array method overrides:

This PR ports Mutative's "finalization callback" approach as a more targeted, more performant finalization implementation than the existing recursive tree traversal approach (rough sketch after the list):

  • Added cleanup callbacks for each draft that's created
  • Added callbacks to handle root drafts, assigned values, and recursing
    inside of plain values
  • Updated state creation to return [draft, state] to avoid a lookup
  • Rewrote patch generation system to work with callbacks instead of
    during tree traversal
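
A minimal sketch of that callback flow, under assumed names (Scope, DraftState, createDraftState, and finalizeScope are all hypothetical, not Immer's actual internals):

```ts
// Hypothetical sketch of finalization-by-callback. Instead of recursively
// walking the finished tree to find drafts, each draft registers its own
// finalization work when it is created, and produce() drains the list once.
type FinalizationCallback = () => void

interface Scope {
  callbacks: FinalizationCallback[]
}

interface DraftState<T extends object> {
  base: T
  copy?: T
  finalized: boolean
}

function createDraftState<T extends object>(
  scope: Scope,
  base: T
): [draft: T, state: DraftState<T>] {
  const state: DraftState<T> = {base, finalized: false}
  const draft = new Proxy(base, {
    // ...traps that create state.copy on first write...
  }) as T

  // Cleanup callback registered for each draft that's created.
  scope.callbacks.push(() => {
    state.finalized = true
    // ...swap the draft for state.copy ?? state.base in its parent...
  })

  // Returning the pair avoids a separate draft -> state lookup at call sites.
  return [draft, state]
}

function finalizeScope(scope: Scope): void {
  // Drained LIFO via pop(), so child drafts finalize before their parents.
  while (scope.callbacks.length) scope.callbacks.pop()!()
}
```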

It also makes some additional perf and internal-logic tweaks (two of these are sketched after the list):

  • Sets strictIteration: false (technically a breaking change) to get the benefits of faster loose iteration
  • Updates has / get / set to allow passing type? = getArchetype(value) as an optional last argument. We already know the archetype at most call sites, so we can pass it in instead of calling getArchetype() repeatedly
  • Converts assigned_ from a Record to a Map to enable size lookups
  • Tweaks freeze() to remove an isDraftable check that wasn't necessary (and slightly expensive), and to use each(val, cb, false) for consistency with loose iteration
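
A hedged sketch of the archetype argument and the assigned_ Map (ArchType exists in Immer's source; the function shapes here are illustrative guesses):

```ts
// The archetype enum exists in the real source (see ArchType.Set later in
// this thread); the function shapes below are guesses for illustration.
enum ArchType {
  Object,
  Array,
  Map,
  Set
}

declare function getArchetype(value: any): ArchType

// `type` defaults to getArchetype(value), so the lookup only happens when
// the caller doesn't already know the archetype.
function get(
  value: any,
  key: PropertyKey,
  type: ArchType = getArchetype(value)
): any {
  return type === ArchType.Map ? value.get(key) : value[key]
}

// A loop that already knows it's iterating a Map can pass the archetype in:
//   get(draft, key, ArchType.Map) // no per-iteration getArchetype() call

// assigned_ as a Map makes "was anything assigned?" an O(1) .size check,
// rather than an Object.keys(record).length scan.
interface DraftStateSketch {
  assigned_: Map<PropertyKey, boolean>
}
const anyAssigned = (state: DraftStateSketch) => state.assigned_.size > 0
```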

Performance

Stacked on top of the misc perf improvements from #1164, and with strictIteration: false, this branch shows:

┌─────────────────────┬──────────────┬──────────────┬─────────────┐
│ Scenario            │ immer10      │ immer10Perf  │ Improvement │
├─────────────────────┼──────────────┼──────────────┼─────────────┤
│ remove-high-reuse   │        9.3ms │        2.8ms │      +70.0% │
│ mixed-sequence      │        3.3ms │        1.2ms │      +63.5% │
│ concat              │      132.4µs │       50.5µs │      +61.8% │
│ update-reuse        │        3.5ms │        1.7ms │      +52.0% │
│ update-high         │      105.5µs │       53.8µs │      +49.0% │
│ rtkq-sequence       │       14.8ms │        7.6ms │      +48.3% │
│ update-high-reuse   │        3.6ms │        2.0ms │      +44.6% │
│ updateLargeObject-r │       10.4ms │        6.2ms │      +40.2% │
│ update-multiple     │       49.2µs │       29.6µs │      +39.9% │
│ remove-reuse        │        4.6ms │        3.2ms │      +31.0% │
│ updateLargeObject   │      219.7µs │      158.3µs │      +28.0% │
│ update              │       24.8µs │       18.6µs │      +24.9% │
│ add                 │       26.2µs │       22.2µs │      +15.0% │
│ remove-high         │      118.3µs │      107.0µs │       +9.6% │
│ sortById-reverse    │      149.1µs │      144.0µs │       +3.4% │
│ remove              │      123.4µs │      144.3µs │      -16.9% │
│ reverse-array       │      110.6µs │      140.5µs │      -27.0% │
│ filter              │       50.1µs │       65.5µs │      -30.7% │
│ mapNested           │      129.6µs │      184.9µs │      -42.7% │
└─────────────────────┴──────────────┴──────────────┴─────────────┘

✓ immer10Perf shows an average 24.4% performance improvement over immer10

Since #1164 was ~20% faster than v10, this is another ~4.5% faster.

I'll note that's with some of the array scenarios actually showing regressions, so the other scenarios improve by more than that 4-5%. I don't have a full explanation for why those scenarios got a bit slower; some of it may be additional work when creating proxies, and I'd still like to dig into it further. However, the array scenarios are all in the microsecond range already, so it's not a huge difference. Additionally, the next architectural PR, which overrides the array methods, drastically improves all of those.

Bundle Size

Eyeballing bundle sizes, this PR increases the immer.production.js minified bundle size by about 1700 bytes, from ~12K to ~14K. If I build a Vite app and measure with Sonda, the actual size used in a built app appears to have grown as follows:

  • v10.0.3 (which is out of date, but what my branches were based on): 7.52K
  • The original set of perf tweaks: 8.11K (which may include some of the other recent PRs that were merged - not sure my changes would have added that much)
  • This PR with callbacks: 9.38K

I'm always very sensitive to increasing bundle size, so there's a very real question of whether adding a couple K here is worth it for a net 5-7% perf increase, especially when the larger benefits seem to be from the smaller tweaks and the array method overrides.

@markerikson force-pushed the feature/finalization-callbacks branch from 422ba2b to 53e71db on October 15, 2025 01:42
@coveralls commented Oct 15, 2025:

Pull Request Test Coverage Report for Build 18857343219

Details

  • 434 of 448 (96.88%) changed or added relevant lines in 10 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+2.6%) to 44.999%

Changes Missing Coverage:
  • src/core/finalize.ts: 146 of 148 changed/added lines covered (98.65%)
  • src/plugins/patches.ts: 85 of 97 changed/added lines covered (87.63%)

Files with Coverage Reduction:
  • src/core/finalize.ts: 2 new missed lines (96.75%)

Totals (change from base Build 18857121209: +2.6%):
  • Covered Lines: 1468
  • Relevant Lines: 3973

💛 - Coveralls

@mweststrate (Collaborator) left a comment:

This is absolutely marvelous work! Thank you so much! I left a bunch of questions and potential further optimization hotspots, but all are pretty small I think.


- it("cannot return an object that references itself", () => {
+ // This actually seems to pass now!
+ it.skip("cannot return an object that references itself", () => {
@mweststrate (Collaborator):

Is this skip intended? If so, let's leave a comment why :)

@markerikson (Collaborator, Author):

The .skip was because this test no longer threw an error.

I did some double-checking and the behavior is now much more nuanced:

  • Just doing () => res.self doesn't error.
  • Self-references can show up in the final output...
  • ...but as soon as you modify a value that is circular, that reference gets copied, and the copy is no longer itself circular (i.e., its circular field now points back to the original reference, which is circular).

I rewrote the test to reflect that behavior (rough illustration below).
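
A rough illustration of the first two bullets (the real rewritten test is in the PR diff; this fixture is a guess at its shape):

```ts
import {produce} from "immer"

// A value that references itself.
const res: any = {}
res.self = res

// Previously this recipe threw; per the first bullet above, it now passes,
// and the self-reference can survive into the final output.
const next = produce({}, () => res.self)

// Per the third bullet: if a circular value is modified inside produce, the
// modified copy is no longer itself circular - its circular field points
// back at the original (still-circular) reference instead.
```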

(state.assigned_ && state.assigned_.size > 0))

if (shouldFinalize) {
if (patches && inversePatches) {
@mweststrate (Collaborator):

NIT: I think we actually only need to check one; the other is just there for the typechecker's happiness but isn't actually needed at runtime, so inversePatches! below might do the trick.

@markerikson (Collaborator, Author):

I rewrote the patch handling to pass around the scope instead, so that we only destructure {patches, inversePatches} once, inside of the patch logic (sketched below).
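
Roughly the shape of that change, as an illustrative sketch (Patch matches Immer's public patch shape; ImmerScopeSketch and generateReplacePatch are hypothetical):

```ts
// Illustrative shape only: pass the whole scope down and destructure the
// patch arrays exactly once, inside the patch-generation logic itself.
interface Patch {
  op: "add" | "remove" | "replace"
  path: (string | number)[]
  value?: unknown
}

interface ImmerScopeSketch {
  patches?: Patch[]
  inversePatches?: Patch[]
}

function generateReplacePatch(
  scope: ImmerScopeSketch,
  path: (string | number)[],
  value: unknown,
  oldValue: unknown
): void {
  const {patches, inversePatches} = scope
  // Checking one array is enough at runtime; the `!` exists only to keep
  // the typechecker happy, as noted in the review comment above.
  if (!patches) return
  patches.push({op: "replace", path, value})
  inversePatches!.push({op: "replace", path, value: oldValue})
}
```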

// For sets we clone before iterating, otherwise we can get in endless loop due to modifying during iteration, see #628
// To preserve insertion order in all cases we then clear the set
if (target.type_ === ArchType.Set && target.copy_) {
const copy = new Set(target.copy_)
@mweststrate (Collaborator):

NIT: might be cheaper to do const copy = target.copy_; target.copy_ = new Set(); copy.forEach(...)?

@markerikson (Collaborator, Author):

I don't think there's a meaningful difference here (both variants are sketched below for reference). However, I did move this into the patches plugin, so that we save the byte size until we actually expect to see Sets.
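
For reference, both variants as self-contained sketches (SetState and the function names are illustrative, not the actual fixSetContents implementation):

```ts
// Both variants, as self-contained sketches (SetState is illustrative).
interface SetState {
  copy_: Set<unknown>
}

// Variant from the diff: snapshot before iterating (avoids the
// modify-during-iteration loop from #628), clear, then re-add in order.
function fixSetContentsClone(
  target: SetState,
  finalizeValue: (v: unknown) => unknown
): void {
  const snapshot = new Set(target.copy_)
  target.copy_.clear()
  snapshot.forEach(v => target.copy_.add(finalizeValue(v)))
}

// Reviewer's suggested variant: reuse the old Set as the snapshot and swap
// in a fresh one, skipping the Set copy constructor.
function fixSetContentsSwap(
  target: SetState,
  finalizeValue: (v: unknown) => unknown
): void {
  const snapshot = target.copy_
  target.copy_ = new Set()
  snapshot.forEach(v => target.copy_.add(finalizeValue(v)))
}
```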

@mweststrate (Collaborator):

I propose this as a major version. Although all checks pass and things look correct to me, any fallout from subtly changed semantics is easier to handle if it is a new major, and I really like the new iteration default as well :)

@markerikson (Collaborator, Author):

Awesome! And yeah, I agree - sometimes changes this big are worth releasing as a new major anyway, even if they're internal-only and still compatible :)

@markerikson (Collaborator, Author):

Also: I'm still seeing huge increases in perf from changing the baseline shallowCopy behavior from {...base} to Object.keys(base).forEach() in real-world examples. Oddly, those gains weren't as visible in the benchmarks; understanding why is still on my list to investigate (both strategies are sketched below).
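
The two strategies side by side, as a simplified sketch that ignores the prototype and property-descriptor handling the real shallowCopy does:

```ts
// Spread-based copy: concise; copies enumerable own string and symbol keys.
const shallowCopySpread = <T extends object>(base: T): T => ({...base})

// Keys-based copy: an explicit loop over string keys only (unlike spread,
// this skips symbol keys). In some real-world shapes this appears to be
// dramatically faster, even though the benchmarks don't show the same gap.
function shallowCopyKeys<T extends object>(base: T): T {
  const copy: any = {}
  Object.keys(base).forEach(key => {
    copy[key] = (base as any)[key]
  })
  return copy as T
}
```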

@markerikson force-pushed the feature/finalization-callbacks branch from 53e71db to cfae1ac on October 27, 2025 22:09
@markerikson changed the base branch from feature/optimize-immer-perf to main on October 27, 2025 22:09
@markerikson (Collaborator, Author):

Update, 2025-10-27

I rebased the PR on top of the latest main as of #1187.

I've gone through and applied most of the PR feedback from @mweststrate. I didn't include pre-sizing arrays; that broke the callback loop logic with both for loops and callbacks.pop().

Further Changes

I did a significant amount of byte-shaving work (a few of these patterns are sketched after the list):

  • I changed all of the patch-handling logic to pass the entire ImmerScope down, instead of patches and inversePatches. This means we only have to destructure those once, inside of the actual patch generation functions.
  • ImmerScope now pre-caches the patchPlugin and mapSetPlugin references on scope creation. This saves us the plugin lookups everywhere except createProxy(), which relies on calling getPlugin("MapSet") inline in order to throw an error if it sees a Map or Set without the plugin being enabled.
  • Inlined the finalizeAssigned method that was added earlier in the PR, as it was only used in one place.
  • Moved fixPotentialSetContents into the MapSet plugin and renamed it to fixSetContents, so its bytes are only included when that plugin is used.
  • Extracted utils like isObjectish, isFunction, and isArray and replaced all of the inline typeof x === "object" checks, saving the sizes of typeof and the constants.
  • Added a shorthand const O = Object in common.ts, saving a few bytes off of every Object.* usage.
  • Created constants for several commonly used field names ("constructor", "prototype", "value", "writable", etc.), as well as the "MapSet" and "Patches" plugin names, and used those wherever possible.
  • Converted a bunch of simple function x(arg) { return y } functions into arrow functions with implicit returns (let x = (arg) => y).
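
A few of those patterns, sketched (the names match the list above; exact definitions are guesses):

```ts
// Shorthand alias: `Object` itself can't be mangled by a minifier, but a
// local `O` can, so every O.keys / O.freeze usage saves bytes.
const O = Object

// Repeated string literals collapse into one mangled identifier each.
const CONSTRUCTOR = "constructor"
const PROTOTYPE = "prototype"
const MAP_SET_PLUGIN = "MapSet"
const PATCHES_PLUGIN = "Patches"

// Tiny guards replace the inline `typeof x === "object"` checks.
const isObjectish = (x: any): x is object =>
  x !== null && typeof x === "object"
const isFunction = (x: any): x is Function => typeof x === "function"
const isArray = Array.isArray

// `function getProto(x) { return Object.getPrototypeOf(x) }` becomes an
// arrow function with an implicit return:
const getProto = (x: any) => O.getPrototypeOf(x)
```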

I set up a Vite + React example app and did builds with the existing v10 version of Immer, this branch before I did any byte-shaving, and this branch with all the byte-shaving:

  • v10.1.3 used bundle size: 7.52K
  • v10.2.0 (perf tweaks) used bundle size: 8.09K
  • v10Perf + callbacks, before shaving: 9.36K
  • v10Perf + callbacks, after shaving: 8.77K

v10.2.0 presumably got bigger from the addition of useSetStrictIteration and the cache checks.

So, net result, this PR is ~600 bytes bigger than v10.2.0 after adding all the rewritten finalization logic and subtracting all the byte shaving.

Performance

After #1164, we were already +20% faster than v10.1.3 once you turn on loose iteration (per the latest numbers in #1187):

✓ immer10Perf shows an average 20.7% performance improvement over immer10

┌─────────────────────┬──────────────┬──────────────┬─────────────┐
│ Scenario            │ immer10      │ immer10Perf  │ Improvement │
├─────────────────────┼──────────────┼──────────────┼─────────────┤
│ update-reuse        │        9.3ms │        4.4ms │      +52.4% │
│ mixed-sequence      │        8.7ms │        4.7ms │      +46.7% │
│ update-high-reuse   │       11.3ms │        6.1ms │      +45.8% │
│ remove-reuse        │       12.2ms │        6.8ms │      +44.4% │
│ remove-high-reuse   │        8.6ms │        5.3ms │      +38.6% │
│ update-largeObject1 │       15.9ms │       10.6ms │      +33.3% │
│ rtkq-sequence       │       31.9ms │       22.4ms │      +29.9% │
│ concat              │      157.9µs │      115.8µs │      +26.6% │
│ update-largeObject2 │       30.1ms │       22.1ms │      +26.4% │
│ add                 │       83.2µs │       65.5µs │      +21.4% │
│ update-largeObject1 │      572.2µs │      462.1µs │      +19.2% │
│ update-largeObject2 │        2.0ms │        1.7ms │      +16.5% │
│ filter              │      107.5µs │       90.3µs │      +16.0% │
│ update              │       70.5µs │       59.3µs │      +15.8% │
│ update-multiple     │       85.0µs │       72.0µs │      +15.3% │
│ update-high         │      131.9µs │      131.6µs │       +0.3% │
│ sortById-reverse    │      188.6µs │      189.0µs │       -0.2% │
│ mapNested           │      171.6µs │      173.9µs │       -1.4% │
│ reverse-array       │      170.1µs │      176.0µs │       -3.5% │
│ remove              │      173.0µs │      179.1µs │       -3.5% │
│ remove-high         │      148.0µs │      154.9µs │       -4.6% │
└─────────────────────┴──────────────┴──────────────┴─────────────┘

This PR makes that loose iteration the default.

From there, the performance in this PR initially looks like a no-op vs those numbers, still around +20% vs v10.1.3:

✓ immer10Perf shows an average 20.9% performance improvement over immer10

┌─────────────────────┬──────────────┬──────────────┬─────────────┐
│ Scenario            │ immer10      │ immer10Perf  │ Improvement │
├─────────────────────┼──────────────┼──────────────┼─────────────┤
│ update-reuse        │        8.6ms │        3.5ms │      +59.3% │
│ mixed-sequence      │        8.1ms │        3.3ms │      +59.0% │
│ update-high-reuse   │        8.6ms │        3.9ms │      +54.9% │
│ remove-high-reuse   │        8.9ms │        4.4ms │      +50.3% │
│ remove-reuse        │       10.5ms │        6.2ms │      +41.0% │
│ update-largeObject1 │       14.3ms │        8.7ms │      +39.6% │
│ concat              │      164.4µs │      105.1µs │      +36.1% │
│ update-high         │      135.5µs │       89.2µs │      +34.2% │
│ rtkq-sequence       │       28.2ms │       20.1ms │      +28.7% │
│ update-largeObject2 │       27.9ms │       20.1ms │      +27.8% │
│ update-multiple     │       90.4µs │       67.7µs │      +25.0% │
│ update-largeObject1 │      592.5µs │      490.5µs │      +17.2% │
│ remove-high         │      154.4µs │      131.1µs │      +15.1% │
│ update-largeObject2 │        2.0ms │        1.7ms │      +14.3% │
│ update              │       70.6µs │       65.4µs │       +7.5% │
│ filter              │      110.5µs │      103.5µs │       +6.3% │
│ add                 │       67.9µs │       67.8µs │       +0.1% │
│ mapNested           │      175.2µs │      190.8µs │       -8.9% │
│ sortById-reverse    │      163.1µs │      178.7µs │       -9.5% │
│ reverse-array       │      165.0µs │      213.1µs │      -29.2% │
│ remove              │      182.1µs │      237.2µs │      -30.3% │
└─────────────────────┴──────────────┴──────────────┴─────────────┘

But it's worth noting that this is with some of the array scenarios apparently getting somewhat slower. (I don't know why those are slower yet, other than more logic running in general.) I think it's reasonable to assume that reverse() isn't going to be called often, and that for usages like RTK, most updates will be made against previously-frozen state.

If I remove a couple of the array scenarios, the results look like:

✓ immer10Perf shows an average 26.5% performance improvement over immer10

┌─────────────────────┬──────────────┬──────────────┬─────────────┐
│ Scenario            │ immer10      │ immer10Perf  │ Improvement │
├─────────────────────┼──────────────┼──────────────┼─────────────┤
│ update-reuse        │        8.1ms │        3.4ms │      +58.2% │
│ update-high-reuse   │        8.7ms │        3.8ms │      +56.3% │
│ mixed-sequence      │        8.2ms │        3.6ms │      +55.3% │
│ remove-high-reuse   │        8.6ms │        4.3ms │      +49.8% │
│ update-largeObject1 │       14.2ms │        8.5ms │      +40.0% │
│ remove-reuse        │        9.0ms │        5.6ms │      +38.1% │
│ concat              │      168.0µs │      111.2µs │      +33.8% │
│ update-largeObject1 │      279.8µs │      192.7µs │      +31.1% │
│ update-largeObject2 │       33.8ms │       23.8ms │      +29.7% │
│ rtkq-sequence       │       27.5ms │       20.7ms │      +24.9% │
│ update-high         │      139.9µs │      105.7µs │      +24.4% │
│ update              │       75.8µs │       64.1µs │      +15.5% │
│ add                 │       76.8µs │       69.4µs │       +9.7% │
│ update-largeObject2 │        2.1ms │        1.9ms │       +8.7% │
│ update-multiple     │       95.2µs │       87.3µs │       +8.4% │
│ remove-high         │      151.5µs │      144.2µs │       +4.8% │
│ filter              │       99.6µs │       99.5µs │       +0.0% │
│ remove              │      186.8µs │      209.5µs │      -12.2% │
└─────────────────────┴──────────────┴──────────────┴─────────────┘

I realize that's sort of cherry-picking results ("it's faster if we ignore the parts where it got slower!"), but it's also hopefully more reflective of real-world usage.

On top of that, this doesn't include any of the array overrides from #1184, which make all of the array ops drastically faster.

Finally, the changes in this PR also significantly cut down on the amount of time spent recursing through object trees. There's still freeze(deep) and handleValue(), but based on the benchmarks and perf profiling those don't come into play nearly as much as finalize/finalizeProperties() did.

Given all that, I do think this PR is a net improvement and worth merging.

@mweststrate (Collaborator):

Looks great and I'm happy to merge this! The following would work, correct?

  • Land this as Immer 10.0.0
  • Land the array ops as 10.1.0 (no breaking changes, but significant improvement)
  • Land the strict mode draft as 10.1.1 (no breaking changes? haven't studied it yet)

@markerikson (Collaborator, Author):

You lost me on those numbers, since we're on 10.2.0 right now :)

I did turn on loose iteration in this PR, which would be the main "breaking" change. In theory the rest of these changes are all internal, but there's enough churn that my own gut decision would be to ship this as 11.0 on principle. The array ops in #1184 ought to be shippable in a 10.x minor since they're opt-in.

I think I see one or two stray comments in this PR I want to clean up - I can get to that tonight.

@markerikson (Collaborator, Author):

One note about this PR's behavior: per the changed tests, it does seem to alter the behavior around circular references so that Immer can handle them without throwing, which is an improvement.

@mweststrate merged commit d6c1202 into main on Nov 23, 2025 (1 check passed).
@github-actions (Contributor):

🎉 This PR is included in version 11.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@mweststrate (Collaborator):

11.x, sorry :) Merged as a breaking change; the release should go out soon!

> it does seem to alter the behavior around circular references so that Immer can handle them without throwing, which is an improvement

Yeah, this is something I don't want to advertise: fundamentally, the mental model breaks when there are circular refs between nodes, since the data structure is no longer a tree. It is unclear what it would mean, for example, to mutate the "same" object twice through two different paths - or, even with only one modification, what it would mean for patches, or for pointer/structural equality downstream, etc. So I'd argue this remains "unsupported", with "undefined behavior" that might change arbitrarily between releases.
