We do not yet have a benchmark framework for comparing operations in Numba kernels against raw CUDA C++. One reason we have held off is that we expect a performance gap until LTO support lands. However, that should not block us from building the framework.
There are two aspects of performance that we want the benchmarks to capture:
- Infrastructure to benchmark the performance gap between releases. This shows the performance gain we get from optimizing Numbast, Numba, and CUDA over time.
- Infrastructure to benchmark the gap between native CUDA C++ and Numba kernels. This measures the overhead of the additional wrappers we built with Numbast (extra shims, IRs).
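As a starting point for discussion, the two measurements above could be sketched as a small timing harness. This is only an illustration under stated assumptions, not a proposed design: the helper names `median_runtime`, `overhead_ratio`, and `regressions` are hypothetical, and the timed callables are placeholders for a Numba kernel launch and its native CUDA C++ counterpart. Because kernel launches are asynchronous, a real callable would need to synchronize the device before returning so the timer covers the whole kernel.

```python
import statistics
import time
from typing import Callable, Dict


def median_runtime(fn: Callable[[], object], warmup: int = 3, repeats: int = 20) -> float:
    """Return the median wall-clock time of fn() in seconds (hypothetical helper)."""
    # Warm-up runs absorb one-time costs such as JIT compilation.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # assumption: fn blocks until the work it launched has finished
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def overhead_ratio(candidate_s: float, reference_s: float) -> float:
    """How many times slower the candidate is than the reference (1.0 is parity)."""
    return candidate_s / reference_s


def regressions(
    current: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.05,
) -> Dict[str, float]:
    """Return benchmarks whose runtime grew by more than `tolerance`
    (as a fraction) relative to the baseline release."""
    slower = {}
    for name, base_s in baseline.items():
        if name in current:
            change = (current[name] - base_s) / base_s
            if change > tolerance:
                slower[name] = change
    return slower
```

With this shape, the second bullet maps to `overhead_ratio(numba_time, native_time)` per operation, and the first bullet maps to calling `regressions` on the timings stored for two releases. The median is used here only because it is less sensitive to outlier runs than the mean; other statistics may suit the project better.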