Description
Is your feature request related to a problem? Please describe.
To get kernel performance matching clang, we have had to add fast-math flags such as contract (which clang and nvcc apply by default). Currently, we do this with an ugly hack; see for example:
Lines 21 to 57 in bb37b50
    # HACK: module-local versions of core arithmetic; needed to get FMA
    for (jlf, f) in zip((:+, :*, :-), (:add, :mul, :sub))
        for (T, llvmT) in ((:Float32, "float"), (:Float64, "double"))
            ir = """
                %x = f$f contract nsz $llvmT %0, %1
                ret $llvmT %x
            """
            @eval begin
                # the @pure is necessary so that we can constant propagate.
                @inline Base.@pure function $jlf(a::$T, b::$T)
                    Base.llvmcall($ir, $T, Tuple{$T, $T}, a, b)
                end
            end
        end
        @eval function $jlf(args...)
            Base.$jlf(args...)
        end
    end

    let (jlf, f) = (:div_arcp, :div)
        for (T, llvmT) in ((:Float32, "float"), (:Float64, "double"))
            ir = """
                %x = f$f fast $llvmT %0, %1
                ret $llvmT %x
            """
            @eval begin
                # the @pure is necessary so that we can constant propagate.
                @inline Base.@pure function $jlf(a::$T, b::$T)
                    Base.llvmcall($ir, $T, Tuple{$T, $T}, a, b)
                end
            end
        end
        @eval function $jlf(args...)
            Base.$jlf(args...)
        end
    end
    rcp(x) = div_arcp(one(x), x) # still leads to rcp.rn which is also a function call
Describe the solution you'd like
I would like a macro like @fastmath that gives fine-grained control over the fast-math flags.
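To make the request concrete, here is a rough sketch of what such a macro could look like. The names `@fmflags`, `flagged_op`, `rewrite`, and `LLVM_OPS` are made up for illustration and are not an existing API. The sketch reuses the same `Base.llvmcall` trick as the hack above, but takes the flags as a macro argument. It only rewrites binary calls to `+`, `-`, `*`, and `/`, so an n-ary call such as `a + b + c` is left untouched, and it assumes the macro is used in the module that defines `flagged_op`.

```julia
# Hypothetical sketch: none of these names exist in Base or KernelAbstractions.

# Map Julia operators to the LLVM floating-point instructions they lower to.
const LLVM_OPS = Dict(:+ => "fadd", :- => "fsub", :* => "fmul", :/ => "fdiv")

# Fallback for non-float or mixed-type arguments: call the regular Base operator.
flagged_op(::Val{op}, ::Val, a, b) where {op} = getfield(Base, op)(a, b)

# For Float32/Float64 pairs, emit one LLVM instruction carrying the requested flags.
@generated function flagged_op(::Val{op}, ::Val{flags}, a::T, b::T) where {op, flags, T<:Union{Float32, Float64}}
    llvmT = T === Float32 ? "float" : "double"
    ir = """
        %x = $(LLVM_OPS[op]) $(join(flags, " ")) $llvmT %0, %1
        ret $llvmT %x
        """
    return :(Base.llvmcall($ir, $T, Tuple{$T, $T}, a, b))
end

# Recursively replace binary arithmetic calls with calls to `flagged_op`.
rewrite(ex, flagex) = ex  # literals, symbols, line numbers: leave as-is
function rewrite(ex::Expr, flagex)
    args = map(a -> rewrite(a, flagex), ex.args)
    if ex.head === :call && length(args) == 3 && haskey(LLVM_OPS, args[1])
        op = QuoteNode(args[1])
        return :(flagged_op(Val($op), Val($flagex), $(args[2]), $(args[3])))
    end
    return Expr(ex.head, args...)
end

# Usage: @fmflags (contract, nsz) expr    or    @fmflags contract expr
macro fmflags(flags, ex)
    syms = flags isa Symbol ? (flags,) : Tuple(flags.args)
    flagex = Expr(:tuple, map(QuoteNode, syms)...)  # e.g. (:contract, :nsz)
    return esc(rewrite(ex, flagex))
end

# Example: allow the multiply and add to be contracted into an FMA.
muladd_contract(a, b, c) = @fmflags (contract, nsz) a * b + c
```

Passing the flags through `Val` keeps them in the type domain, so each flag combination compiles to its own specialized method. Whether this approach works inside GPU kernels is untested here.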
Describe alternatives you've considered
KernelAbstractions used to do this with Cassette.jl (https://github.com/JuliaLabs/Cassette.jl). Other people use macros, although that opens up fewer optimization opportunities and is thus not desired. I don't know whether CassetteOverlay.jl (https://github.com/JuliaDebug/CassetteOverlay.jl) can be used with kernels, but it might be a possible way to implement this.
It would be nice if this functionality eventually got added to Base Julia.