Closed
Issue description
I built a simple example with a function that adds two integers, and was surprised that calling it is roughly 2 times slower than calling its pure Python counterpart. Profiling with valgrind showed that the dispatcher adds considerable overhead. A significant portion of it comes from the new/delete calls made by the std::vector members on every call, but the dispatch logic itself is also complex and not cheap. I replaced std::vector with boost::container::small_vector in function_call, as shown below. That helped, but the call is still not on par with the pure Python implementation:
struct function_call {
    ...
    /// Arguments passed to the function:
    boost::container::small_vector<handle, 2> args;
    /// The `convert` value the arguments should be loaded with
    boost::container::small_vector<bool, 2> args_convert;
Any ideas on how this can be improved? Cython also generates code that is faster than pybind11 and on par with pure Python.
Reproducible example code
#include <pybind11/pybind11.h>
namespace py = pybind11;
__attribute__((noinline)) int simple(int a, int b) { return a + b; }
PYBIND11_MODULE(example_plugin, m) {
    m.doc() = "pybind11 example plugin"; // optional module docstring
    m.def("simple", &simple);
}
[ 50%] Building CXX object CMakeFiles/example_plugin.dir/main.cpp.o
/usr/bin/g++ -Dexample_plugin_EXPORTS -I/home/taras/example_plugin/pybind11/include -I/usr/include/python3.7m -O2 -g -DNDEBUG -fPIC -fvisibility=hidden -std=c++17 -flto -fno-fat-lto-objects -o CMakeFiles/example_plugin.dir/main.cpp.o -c /home/taras/example_plugin/main.cpp
[100%] Linking CXX shared module example_plugin.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/example_plugin.dir/link.txt --verbose=1
/usr/bin/g++ -fPIC -O2 -g -DNDEBUG -shared -o example_plugin.cpython-37m-x86_64-linux-gnu.so CMakeFiles/example_plugin.dir/main.cpp.o -flto
# using std::vector
from example_plugin import simple as simple_cpp
%timeit simple_cpp(42, 94)
496 ns ± 20.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# using boost::container::small_vector
from example_plugin import simple as simple_cpp
%timeit simple_cpp(42, 94)
382 ns ± 15.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
def simple_pure_python(a, b):
    return a + b
%timeit simple_pure_python(42, 94)
260 ns ± 11.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
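For anyone who wants to reproduce these numbers outside IPython, here is a minimal sketch using the standard-library timeit module. It assumes the example_plugin module built above is importable, and falls back to timing only the pure Python function if it is not. The helper name per_call_ns and the loop counts are hypothetical choices for illustration, not what produced the figures above, so absolute numbers may differ from the %timeit output.

```python
import statistics
import timeit


def simple_pure_python(a, b):
    return a + b


def per_call_ns(func, number=200_000, repeat=7):
    """Return (mean, std dev) of the per-call time in nanoseconds.

    The statement is timed directly (no lambda wrapper), so the only
    call being measured is func(42, 94) itself.
    """
    timer = timeit.Timer("f(42, 94)", globals={"f": func})
    totals = timer.repeat(repeat=repeat, number=number)
    runs = [total / number * 1e9 for total in totals]
    return statistics.mean(runs), statistics.pstdev(runs)


if __name__ == "__main__":
    candidates = {"pure python": simple_pure_python}
    try:
        # Assumption: the extension from this issue has been built
        # and is on sys.path.
        from example_plugin import simple as simple_cpp
        candidates["pybind11"] = simple_cpp
    except ImportError:
        pass

    for name, func in candidates.items():
        mean, dev = per_call_ns(func)
        print(f"{name}: {mean:.0f} ns ± {dev:.1f} ns per call")
```

Running both variants in the same process keeps the interpreter build and machine state identical, so the ratio between the two lines is the number worth comparing.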