-
Notifications
You must be signed in to change notification settings - Fork 148
Merge master, inline retval #655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Running some internal proprietary benchmark, the latest ForwardDiff release gives me 18.568555 seconds (24.68 M allocations: 1.288 GiB, 2.00% gc time, 2.42% compilation time)
7.463995 seconds (24.35 M allocations: 1.268 GiB, 2.79% gc time, 0.14% compilation time)
18.064215 seconds (24.40 M allocations: 1.270 GiB, 1.72% gc time)
7.450395 seconds (24.35 M allocations: 1.268 GiB, 2.67% gc time) versus this PR 13.992152 seconds (24.72 M allocations: 1.295 GiB, 2.51% gc time, 3.18% compilation time)
6.132790 seconds (24.45 M allocations: 1.277 GiB, 4.11% gc time, 0.18% compilation time)
13.443605 seconds (24.44 M allocations: 1.277 GiB, 2.11% gc time)
6.045382 seconds (24.45 M allocations: 1.277 GiB, 3.19% gc time) The difference is partially because my computer has AVX512. If I start Julia with 15.560895 seconds (24.68 M allocations: 1.288 GiB, 2.31% gc time, 2.70% compilation time)
6.524974 seconds (24.35 M allocations: 1.268 GiB, 3.69% gc time, 0.17% compilation time)
15.263915 seconds (24.40 M allocations: 1.270 GiB, 2.00% gc time)
6.641746 seconds (24.35 M allocations: 1.268 GiB, 3.04% gc time) while this PR is unchanged (as it isn't relying on the autovectorizer). Compile times appear unchanged. I am not sure if retval not being inlined caused the problem we observed before the initial reverting, but I would like to see this move forward. Or I'm just going to give up and create my own ForwardDiff, since duplicated effort tends to be smaller than collaboration's cost. |
I missed this. Wouldn't the "correct" way to do this be to rebase the original branch on master, force push that commit and then add this above. As it is right now, it is kind of unreviewable. |
Hehe, yeah, replicating ForwardDiff is pretty fast. I tried something with Hessians using SIMD.jl in https://github.com/KristofferC/HyperHessians.jl with some decent perf results. |
@inline function dual_definition_retval(::Val{T}, val::Real, deriv::Real, partial::Partials) where {T} | ||
return Dual{T}(val, deriv * partial) | ||
end | ||
@inline function dual_definition_retval(::Val{T}, val::Real, deriv1::Real, partial1::Partials, deriv2::Real, partial2::Partials) where {T} | ||
return Dual{T}(val, _mul_partials(partial1, partial2, deriv1, deriv2)) | ||
end | ||
@inline function dual_definition_retval(::Val{T}, val::Complex, deriv::Union{Real,Complex}, partial::Partials) where {T} | ||
reval, imval = reim(val) | ||
if deriv isa Real | ||
p = deriv * partial | ||
return Complex(Dual{T}(reval, p), Dual{T}(imval, zero(p))) | ||
else | ||
rederiv, imderiv = reim(deriv) | ||
return Complex(Dual{T}(reval, rederiv * partial), Dual{T}(imval, imderiv * partial)) | ||
end | ||
end | ||
@inline function dual_definition_retval(::Val{T}, val::Complex, deriv1::Union{Real,Complex}, partial1::Partials, deriv2::Union{Real,Complex}, partial2::Partials) where {T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KristofferC adding these 4 @inline
s to dual_definition_retval
should be the only actual change.
I can't rebase your original PR. Mind doing so?
Perhaps it'd be better to rebase on the 0.10 branch than on master (although you could do master if you'd prefer having that be a breaking release for safety reasons).
You're then free to add these 4 @inline
s directly, or seeing if this diff looks cleaner afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the kc/simd
branch, please check it out.
Yeah, it might also be worth look at some of the existing alternative options before actually doing so, since some may be good starting points, especially as Pumas wants 2nd+ derivatives. I think there was another package that was supposed to safe flops by being sparser (i.e., not calculating the same partials multiple times). The worry is always the potentially long tail of polish/corner cases. |
Closing because this PR is no longer necessary (#570 incorporates the change). |
Adding the
@inline
sPrior to that, performance was around 215 microseconds.