@moberegger moberegger commented Jun 16, 2025

Similar to #14, but for the set! DSL. Many of the changes are similar:

  • Saves a call to one?. This shows up as a hotspot for us, and it isn't actually needed. The original intent was to check whether a single extra argument was passed to set!, in order to decide whether a partial should be rendered. It is faster to check directly whether the first argument is a Hash than to call one? first, because one? is an O(n) operation.
  • Saves the memory allocation caused by an internal call to Jbuilder::set! via super, which splats *args into a newly allocated Array each time. The PR introduces _set for internal use, which avoids this allocation. This is similar to what was done for extract! in #7 (Optimize internal extract! calls to save on memory allocation).
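
A minimal sketch of both points, using hypothetical stand-in classes (`Base` and `Template`) rather than Jbuilder's real ones. It assumes the partial check only needs to know whether the first extra argument is an options Hash, and that the internal helper accepts the already-collected `args` Array as a plain positional argument so nothing is splatted again.

```ruby
# Hypothetical stand-ins for illustration; not Jbuilder's actual classes.
class Base
  attr_reader :attributes

  def initialize
    @attributes = {}
  end

  # Public DSL entry point: the *args splat collects into an Array per call.
  def set!(key, value = nil, *args)
    _set(key, value, args)
  end

  private

  # Internal entry point: takes the existing Array, so there is no new splat.
  def _set(key, value, args)
    @attributes[key.to_s] = args.empty? ? value : [value, *args]
  end
end

class Template < Base
  def set!(name, object = nil, *args)
    # Test the first argument's class directly instead of calling
    # args.one? before the Hash check.
    if ::Hash === args.first
      @attributes[name.to_s] = { partial: args.first[:partial], object: object }
    else
      # Call the internal helper rather than `super`, which per the
      # description above would splat *args into another Array.
      _set(name, object, args)
    end
  end
end

template = Template.new
template.set!(:foo, :bar)
template.set!(:post, "a post", partial: "post", as: :post)
```

In this sketch, `template.attributes["foo"]` holds the plain value, while the call with `partial:` options takes the Hash branch without ever calling `one?`.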

One difference from #14 worth noting is how blocks are handled. Calls to ::Kernel.block_given? do show up as a hotspot in some of our profiles, and in #14 they were optimized away with a simpler if block style check. That made sense there because the methods were already receiving a &block parameter. While iteratively benchmarking this branch, however, I learned that simply having a &block parameter incurs overhead, regardless of whether or not a block is provided:

  • If no block is provided, simply having &block in the method signature introduces extra latency, even though block evaluates to nil. I'm unsure of the reason, but I presume Ruby has to do some extra work to determine what the value of &block should be.
  • If a block is provided, &block also results in extra memory being allocated when the block is converted to a Proc. While the number of allocations stays the same, the Proc increases memsize.

Ultimately it was cheaper to keep &block out of the method signatures and determine behaviour via ::Kernel.block_given?. I can revisit the work done in #14 and do the same.
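
For reference, the two styles being compared look roughly like this. The method names are hypothetical and the bodies are placeholders. The sketch only shows that both forms behave identically from the caller's side; it does not try to reproduce the latency or memsize differences described above.

```ruby
# Hypothetical helpers illustrating the two styles; not Jbuilder code.

# Style used in #14: capture the block as a parameter and test it as a value.
def with_block_param(value, &block)
  if block
    block.call(value)
  else
    value
  end
end

# Style used in this PR: no &block in the signature at all.
def with_block_given(value)
  if ::Kernel.block_given?
    yield value
  else
    value
  end
end

param_with_block    = with_block_param(2) { |v| v * 10 }
param_without_block = with_block_param(2)
given_with_block    = with_block_given(2) { |v| v * 10 }
given_without_block = with_block_given(2)
```

Because the results are the same either way, the choice between the two comes down to the measured cost of declaring `&block`.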

I will annotate some other points of interest below.

Benchmarks for the affected DSLs are below. We see an improvement in all of them!!! (I will commit these with the PR soon).


json.set! :foo, :bar
ruby 3.4.4 (2025-05-14 revision a38531fd3f) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
              before   442.412k i/100ms
               after   715.531k i/100ms
Calculating -------------------------------------
              before      5.169M (± 2.3%) i/s  (193.48 ns/i) -     26.102M in   5.053092s
               after      8.678M (± 1.9%) i/s  (115.23 ns/i) -     43.647M in   5.031269s

Comparison:
               after:  8678428.8 i/s
              before:  5168538.2 i/s - 1.68x  slower
Calculating -------------------------------------
              before    80.000  memsize (     0.000  retained)
                         2.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)
               after    40.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
               after:         40 allocated
              before:         80 allocated - 2.00x more

# Where object = { bar: 123 }
json.set! :foo do
  json.extract! object, :bar
end
ruby 3.4.4 (2025-05-14 revision a38531fd3f) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
              before   150.176k i/100ms
               after   179.919k i/100ms
Calculating -------------------------------------
              before      1.553M (± 2.5%) i/s  (643.85 ns/i) -      7.809M in   5.031098s
               after      1.904M (± 1.5%) i/s  (525.19 ns/i) -      9.536M in   5.009223s

Comparison:
               after:  1904088.4 i/s
              before:  1553160.1 i/s - 1.23x  slower
Calculating -------------------------------------
              before   280.000  memsize (   160.000  retained)
                         4.000  objects (     1.000  retained)
                         0.000  strings (     0.000  retained)
               after   240.000  memsize (   160.000  retained)
                         3.000  objects (     1.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
               after:        240 allocated
              before:        280 allocated - 1.17x more

# Where array = [1, 2, 3]
json.set! :foo, array do |item|
  json.set! :bar, item
end
ruby 3.4.4 (2025-05-14 revision a38531fd3f) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
              before    76.501k i/100ms
               after   104.594k i/100ms
Calculating -------------------------------------
              before    823.493k (± 2.2%) i/s    (1.21 μs/i) -      4.131M in   5.019101s
               after      1.085M (± 4.0%) i/s  (921.32 ns/i) -      5.439M in   5.020763s

Comparison:
               after:  1085397.8 i/s
              before:   823492.8 i/s - 1.32x  slower
Calculating -------------------------------------
              before   920.000  memsize (   520.000  retained)
                        14.000  objects (     4.000  retained)
                         0.000  strings (     0.000  retained)
               after   760.000  memsize (   520.000  retained)
                        10.000  objects (     4.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
               after:        760 allocated
              before:        920 allocated - 1.21x more

json.set! :post, post, partial: "post", as: :post
ruby 3.4.4 (2025-05-14 revision a38531fd3f) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
              before   111.485k i/100ms
               after   125.542k i/100ms
Calculating -------------------------------------
              before      1.138M (± 6.6%) i/s  (879.02 ns/i) -      5.686M in   5.027195s
               after      1.354M (± 1.2%) i/s  (738.83 ns/i) -      6.779M in   5.009416s

Comparison:
               after:  1353500.4 i/s
              before:  1137626.0 i/s - 1.19x  slower
Calculating -------------------------------------
              before   837.923k memsize (    40.633k retained)
                         3.495k objects (   412.000  retained)
                        50.000  strings (    50.000  retained)
               after   837.923k memsize (    40.633k retained)
                         3.495k objects (   412.000  retained)
                        50.000  strings (    50.000  retained)

Comparison:
              before:     837923 allocated
               after:     837923 allocated - same

@moberegger moberegger force-pushed the moberegger/affinity/optimize-set-dsl branch from ec328ef to 5f5c7cb Compare June 16, 2025 20:33
@moberegger moberegger force-pushed the moberegger/affinity/optimize-set-dsl branch from b6f2b3e to e2a3866 Compare June 17, 2025 01:34
Comment on lines +244 to +252
if _blank?(value)
  # json.comments { ... }
  # { "comments": ... }
  _merge_block(key) { yield self }
else
  # json.comments @post.comments { |comment| ... }
  # { "comments": [ { ... }, { ... } ] }
  _scope { _array(value) { |element| yield element } }
end

I swapped the branches of the if condition here. It was originally

      if !_blank?(value)
        # json.comments @post.comments { |comment| ... }
        # { "comments": [ { ... }, { ... } ] }
        _scope { _array(value) { |element| yield element } }
      else
        # json.comments { ... }
        # { "comments": ... }
        _merge_block(key) { yield self }
      end

but it is slightly faster to not negate the original check.

Comment on lines -242 to +245
- locals = ::Hash[options[:as], object]
  _scope do
-   options[:locals] = locals
+   options[:locals] = { options[:as] => object }

This was a slightly faster way to assign the hash.
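
As a quick check that the two forms are interchangeable, the snippet below builds the same single-entry Hash both ways. The `options` and `object` values are placeholders chosen for illustration. The speed difference presumably comes from the literal avoiding a method call on `::Hash`, though that is an assumption not measured here.

```ruby
# Placeholder inputs; in the PR these come from the partial rendering options.
options = { partial: "post", as: :post }
object  = "a post"

# Old form: class-level constructor taking alternating keys and values.
via_brackets = ::Hash[options[:as], object]

# New form: a Hash literal with a computed key.
via_literal = { options[:as] => object }
```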

Comment on lines +137 to 141
+ elsif ::Kernel.block_given?
+   _set(name, object, args) { |x| yield x }
  else
-   super
+   _set(name, object, args)
  end

This looks strange, but is the only way I found to avoid having a &block parameter.

An alternative was

    elsif ::Kernel.block_given?
      super
    else
      _set(name, object, args)
    end

but this would incur an extra memory allocation for *args via the super call whenever a block was provided to set!.
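
The forwarding pattern can be sketched in isolation as follows. `Builder` is a hypothetical stand-in whose method names mirror the diff, and the body of `_set` is invented purely so the example has something to return. The point is only that a literal block which re-yields lets the caller's block reach the internal helper without a `&block` parameter and without `super`.

```ruby
# Hypothetical stand-in; names mirror the diff but this is not Jbuilder code.
class Builder
  def set!(name, object = nil, *args)
    if ::Kernel.block_given?
      # Re-yield through a literal block instead of capturing &block.
      _set(name, object, args) { |x| yield x }
    else
      _set(name, object, args)
    end
  end

  private

  # Invented body for illustration; args is unused in this sketch.
  def _set(name, object, args)
    result =
      if ::Kernel.block_given?
        Array(object).map { |item| yield item }
      else
        object
      end
    { name.to_s => result }
  end
end

builder = Builder.new
plain   = builder.set!(:foo, :bar)
mapped  = builder.set!(:foo, [1, 2, 3]) { |item| item * 2 }
```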

@moberegger moberegger marked this pull request as ready for review June 17, 2025 15:49
@mscrivo mscrivo left a comment

🚀

@moberegger moberegger merged commit 8b047bd into main Jun 17, 2025
30 checks passed
This was referenced Jun 18, 2025