explicitly SIMD muladd with duals #560
Might be worth having Partials of a SIMDType wrap NTuple{N,Core.VecElement}s to begin with, to avoid all the array <-> vector conversions?
This would change the underlying memory layout:
julia> for i ∈ 1:12; @show i, sizeof(NTuple{i,Core.VecElement{Float64}}); end
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (1, 8)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (2, 16)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (3, 32)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (4, 32)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (5, 64)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (6, 64)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (7, 64)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (8, 64)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (9, 128)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (10, 128)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (11, 128)
(i, sizeof(NTuple{i, Core.VecElement{Float64}})) = (12, 128)
vs the array case, but this may also lead to more efficient codegen.
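To make the suggestion concrete, here is a minimal sketch of a partials container that stores Core.VecElement values directly. VecPartials, pack and unpack are hypothetical names invented for illustration; they are not ForwardDiff's actual types or API.

```julia
# Hypothetical wrapper (NOT ForwardDiff's Partials): stores the partials as a
# homogeneous tuple of VecElements, which Julia can lower to an LLVM vector.
struct VecPartials{N,T}
    values::NTuple{N,Core.VecElement{T}}
end

# Convert a plain tuple into the VecElement-backed representation.
pack(xs::NTuple{N,T}) where {N,T} = VecPartials{N,T}(map(Core.VecElement, xs))

# Convert back to a plain tuple by unwrapping each VecElement.
unpack(p::VecPartials) = map(v -> v.value, p.values)

p = pack((1.0, 2.0, 3.0))
@show unpack(p)
# Compare the footprint of the plain tuple with the VecElement-backed one.
@show sizeof(NTuple{3,Float64})
@show sizeof(NTuple{3,Core.VecElement{Float64}})
```

With such a layout the conversion would happen once at construction instead of around every arithmetic operation. The cost is the padding visible in the listing above, where for example three Float64 elements occupy 32 bytes.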
I think it's also worth rethinking/experimenting with different default chunk sizes at this point.
Additionally, I'd consider some
Overall, I'm a big fan of these changes (thanks @YingboMa).
It's worth trying. My guess is that there is a lot of code that has dug into the internals of ForwardDiff and assumes it will be a number though.
Hopefully most of that code is
You might need a N=0 specialization.
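As a rough illustration of what such a specialization could look like, the sketch below defines an element-wise muladd over plain tuples with a separate method for the empty tuple. tuple_muladd is a hypothetical helper, not a function from the PR, and the PR's actual code path may differ.

```julia
# Hypothetical helper: element-wise muladd over three tuples of equal length.
function tuple_muladd(a::NTuple{N,T}, b::NTuple{N,T}, c::NTuple{N,T}) where {N,T}
    return ntuple(i -> muladd(a[i], b[i], c[i]), Val(N))
end

# N = 0 specialization: with empty tuples there is no element type to infer
# and no vector to build, so return the empty tuple directly.
tuple_muladd(::Tuple{}, ::Tuple{}, ::Tuple{}) = ()

@show tuple_muladd((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))
@show tuple_muladd((), (), ())
```

The more specific Tuple{} method is picked by dispatch, so the generic method never sees the empty case.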
Not sure how often this is encountered in the wild.
PR (2 vectorized 1 scalar madds):
Master: (6 scalar fmadd).
I didn't do the case where one argument is a scalar, because it seems the SLP vectorizer already handles that.
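For readers unfamiliar with the operation, here is a minimal sketch of muladd on dual numbers, written with plain tuples rather than explicit SIMD. MiniDual and dual_muladd are hypothetical stand-ins, not ForwardDiff's Dual or the PR's implementation. The structure is one scalar muladd for the value plus two element-wise muladds over the partials, which is one way to read the "2 vectorized 1 scalar" count above.

```julia
# Hypothetical minimal dual number: a value plus N partial derivatives.
struct MiniDual{N,T}
    value::T
    partials::NTuple{N,T}
end

# muladd(x, y, z) = x*y + z on duals:
#   value    = x.value * y.value + z.value
#   partials = x.value * y.partials + (y.value * x.partials + z.partials)
function dual_muladd(x::MiniDual{N,T}, y::MiniDual{N,T}, z::MiniDual{N,T}) where {N,T}
    v = muladd(x.value, y.value, z.value)                       # scalar muladd
    inner = ntuple(i -> muladd(y.value, x.partials[i], z.partials[i]), Val(N))
    p = ntuple(i -> muladd(x.value, y.partials[i], inner[i]), Val(N))
    return MiniDual{N,T}(v, p)
end

x = MiniDual(2.0, (1.0, 0.0))   # seeded in the first direction
y = MiniDual(3.0, (0.0, 1.0))   # seeded in the second direction
z = MiniDual(1.0, (0.0, 0.0))   # constant
r = dual_muladd(x, y, z)
@show r.value r.partials
```

Each of the two ntuple passes is a candidate for a single vector instruction when the partials are stored as a SIMD vector.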