Merge master, inline retval #655

chriselrod · 2023-07-06T04:42:20Z

After adding the @inlines:

julia> @benchmark ForEach(expm!, $B, $As)()
BenchmarkTools.Trial: 8692 samples with 1 evaluation.
 Range (min … max):   88.607 μs …   1.523 ms  ┊ GC (min … max):  0.00% … 87.83%
 Time  (median):      95.456 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   110.611 μs ± 120.857 μs  ┊ GC (mean ± σ):  11.04% ±  9.27%

  █      ▁                                                      ▁
  █▇▄▁▄▄▅█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▆ █
  88.6 μs       Histogram: log(frequency) by time       1.12 ms <

 Memory estimate: 545.41 KiB, allocs estimate: 119.

Prior to that, performance was around 215 microseconds.

This reverts commit 43ef860, reversing changes made to 01a056d.

chriselrod · 2023-07-30T22:14:06Z

@KristofferC ?

chriselrod · 2023-08-14T17:42:22Z

Running some internal proprietary benchmark, the latest ForwardDiff release gives me

 18.568555 seconds (24.68 M allocations: 1.288 GiB, 2.00% gc time, 2.42% compilation time)
  7.463995 seconds (24.35 M allocations: 1.268 GiB, 2.79% gc time, 0.14% compilation time)
 18.064215 seconds (24.40 M allocations: 1.270 GiB, 1.72% gc time)
  7.450395 seconds (24.35 M allocations: 1.268 GiB, 2.67% gc time)

versus this PR

 13.992152 seconds (24.72 M allocations: 1.295 GiB, 2.51% gc time, 3.18% compilation time)
  6.132790 seconds (24.45 M allocations: 1.277 GiB, 4.11% gc time, 0.18% compilation time)
 13.443605 seconds (24.44 M allocations: 1.277 GiB, 2.11% gc time)
  6.045382 seconds (24.45 M allocations: 1.277 GiB, 3.19% gc time)

The difference is partially because my computer has AVX512.

If I start Julia with -C'native,-prefer-256-bit' to allow the autovectorizer to use 512-bit vectors, the latest ForwardDiff release gives me

 15.560895 seconds (24.68 M allocations: 1.288 GiB, 2.31% gc time, 2.70% compilation time)
  6.524974 seconds (24.35 M allocations: 1.268 GiB, 3.69% gc time, 0.17% compilation time)
 15.263915 seconds (24.40 M allocations: 1.270 GiB, 2.00% gc time)
  6.641746 seconds (24.35 M allocations: 1.268 GiB, 3.04% gc time)

while this PR is unchanged (as it isn't relying on the autovectorizer).
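One way to check whether the autovectorizer is actually emitting 512-bit vectors under a given -C setting is to inspect the optimized LLVM IR of a simple loop. The sketch below is only an illustration: `axpy!` is a hypothetical kernel, not code from ForwardDiff or from the benchmark above, and which widths show up depends on the CPU and the flags Julia was started with.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical kernel (not from ForwardDiff): a plain loop the autovectorizer can handle.
function axpy!(y::Vector{Float64}, a::Float64, x::Vector{Float64})
    @inbounds @simd for i in eachindex(y, x)
        y[i] += a * x[i]
    end
    return y
end

# Capture the optimized LLVM IR as a string.
ir = sprint() do io
    code_llvm(io, axpy!, (Vector{Float64}, Float64, Vector{Float64}); debuginfo=:none)
end

# Report which Float64 vector widths appear: 8 lanes means 512-bit, 4 lanes means 256-bit.
for width in (2, 4, 8)
    println("<$width x double> present: ", occursin("<$width x double>", ir))
end
```

Running this once with the default target and once with -C'native,-prefer-256-bit' should show whether the flag changes the vector width on a given machine.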

Compile times appear unchanged.
This PR is an easy free performance win.

I am not sure whether retval not being inlined caused the problem we observed before the initial revert, but I would like to see this move forward.

Or I'm just going to give up and create my own ForwardDiff, since duplicated effort tends to be smaller than collaboration's cost.

KristofferC · 2023-08-14T17:47:27Z

I missed this. Wouldn't the "correct" way to do this be to rebase the original branch on master, force push that commit, and then add this on top? As it is right now, it is kind of unreviewable.

KristofferC · 2023-08-14T17:51:23Z

> Or I'm just going to give up and create my own ForwardDiff,

Hehe, yeah, replicating ForwardDiff is pretty fast. I tried something with Hessians using SIMD.jl in https://github.com/KristofferC/HyperHessians.jl with some decent perf results.

chriselrod · 2023-08-14T17:51:47Z

src/dual.jl

+@inline function dual_definition_retval(::Val{T}, val::Real, deriv::Real, partial::Partials) where {T}
+    return Dual{T}(val, deriv * partial)
+end
+@inline function dual_definition_retval(::Val{T}, val::Real, deriv1::Real, partial1::Partials, deriv2::Real, partial2::Partials) where {T}
+    return Dual{T}(val, _mul_partials(partial1, partial2, deriv1, deriv2))
+end
+@inline function dual_definition_retval(::Val{T}, val::Complex, deriv::Union{Real,Complex}, partial::Partials) where {T}
+    reval, imval = reim(val)
+    if deriv isa Real
+        p = deriv * partial
+        return Complex(Dual{T}(reval, p), Dual{T}(imval, zero(p)))
+    else
+        rederiv, imderiv = reim(deriv)
+        return Complex(Dual{T}(reval, rederiv * partial), Dual{T}(imval, imderiv * partial))
+    end
+end
+@inline function dual_definition_retval(::Val{T}, val::Complex, deriv1::Union{Real,Complex}, partial1::Partials, deriv2::Union{Real,Complex}, partial2::Partials) where {T}


@KristofferC adding these 4 @inlines to dual_definition_retval should be the only actual change.
I can't rebase your original PR. Mind doing so?
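For readers unfamiliar with what the unary method in the diff computes, here is a minimal, self-contained sketch of the same chain-rule step. `ToyPartials`, `ToyDual`, `toy_retval`, and `toy_sin` are hypothetical stand-ins invented for this illustration; they are not ForwardDiff's types or API, and the real `Partials` may be implemented quite differently.

```julia
# Toy stand-ins for ForwardDiff's Partials and Dual (hypothetical, illustration only).
struct ToyPartials{N}
    values::NTuple{N,Float64}
end

# Scale every partial by a scalar.
Base.:*(x::Real, p::ToyPartials) = ToyPartials(map(v -> x * v, p.values))

struct ToyDual{N}
    value::Float64
    partials::ToyPartials{N}
end

# Chain rule for a unary function: f(x + ε p) = f(x) + ε f'(x) p.
# Marked @inline so the small helper is folded into each caller.
@inline function toy_retval(val::Real, deriv::Real, partial::ToyPartials)
    return ToyDual(Float64(val), deriv * partial)
end

# A unary rule built on the helper, using d/dx sin(x) = cos(x).
toy_sin(d::ToyDual) = toy_retval(sin(d.value), cos(d.value), d.partials)

x = ToyDual(0.0, ToyPartials((1.0, 2.0)))
y = toy_sin(x)
println(y.value, " ", y.partials.values)
```

Because every primitive rule goes through a helper like this, whether it gets inlined can plausibly affect every differentiated call, which is consistent with the timings reported above.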

Perhaps it'd be better to rebase on the 0.10 branch than on master (although you could do master if you'd prefer having that be a breaking release for safety reasons).

You're then free to add these 4 @inlines directly, or to see whether this diff looks cleaner afterwards.

I updated the kc/simd branch, please check it out.

chriselrod · 2023-08-14T17:54:48Z

> using SIMD.jl in https://github.com/KristofferC/HyperHessians.jl with some decent perf results.

Yeah, it might also be worth looking at some of the existing alternative options before actually doing so, since some may be good starting points, especially as Pumas wants 2nd+ derivatives.

I think there was another package that was supposed to save flops by being sparser (i.e., not calculating the same partials multiple times).

The worry is always the potentially long tail of polish/corner cases.
There have been a lot of long discussions over NaN handling, for example, that I haven't paid much attention to and would probably need to make sure I get right/satisfactory.

chriselrod · 2023-08-14T19:10:41Z

Closing because this PR is no longer necessary (#570 incorporates the change).

KristofferC and others added 3 commits December 13, 2021 10:49

Revert "Merge pull request JuliaDiff#569 from JuliaDiff/kc/revert_simd"

fdc5a72

This reverts commit 43ef860, reversing changes made to 01a056d.

Update Project.toml

e64760e

Merge master, inline retval

23a5626

KristofferC force-pushed the kc/simd branch from e64760e to 39bbd95 Compare August 14, 2023 18:01

chriselrod closed this Aug 14, 2023

chriselrod deleted the ce/simd branch August 14, 2023 19:10
