Calling a C++ function is roughly 2 times slower than calling a native Python function #2005

Closed
@tarasko

Issue description

I built a simple example with a function that adds two integers, and was surprised that calling it is roughly 2 times slower than calling its pure Python counterpart. Profiling with valgrind showed that the dispatcher incurs quite a lot of additional cost. A significant portion of it comes from the new/delete calls made by the std::vector members on every call, but the dispatch logic itself is also complex and not cheap. I tried replacing std::vector with boost::container::small_vector; it helped, but the result is still not on par with the pure Python implementation:

struct function_call {
...
    /// Arguments passed to the function:
    boost::container::small_vector<handle, 2> args;

    /// The `convert` value the arguments should be loaded with
    boost::container::small_vector<bool, 2> args_convert;

Any ideas how this can be improved? Cython also generates code that is faster than pybind11 and is on par with pure Python.

Reproducible example code

#include <pybind11/pybind11.h>

namespace py = pybind11;

__attribute__((noinline)) int simple(int a, int b) { return a + b; }

PYBIND11_MODULE(example_plugin, m) {
    m.doc() = "pybind11 example plugin"; // optional module docstring

    m.def("simple", &simple);
}

[ 50%] Building CXX object CMakeFiles/example_plugin.dir/main.cpp.o
/usr/bin/g++  -Dexample_plugin_EXPORTS -I/home/taras/example_plugin/pybind11/include -I/usr/include/python3.7m  -O2 -g -DNDEBUG -fPIC -fvisibility=hidden   -std=c++17 -flto -fno-fat-lto-objects -o CMakeFiles/example_plugin.dir/main.cpp.o -c /home/taras/example_plugin/main.cpp
[100%] Linking CXX shared module example_plugin.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/example_plugin.dir/link.txt --verbose=1
/usr/bin/g++ -fPIC -O2 -g -DNDEBUG  -shared  -o example_plugin.cpython-37m-x86_64-linux-gnu.so CMakeFiles/example_plugin.dir/main.cpp.o -flto 

# using std::vector
from example_plugin import simple as simple_cpp
%timeit simple_cpp(42, 94)
496 ns ± 20.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

# using boost::container::small_vector
from example_plugin import simple as simple_cpp
%timeit simple_cpp(42, 94)
382 ns ± 15.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

# pure Python baseline
def simple_pure_python(a, b):
    return a + b

%timeit simple_pure_python(42, 94)
260 ns ± 11.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
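
For anyone who wants to repeat the measurement outside IPython, here is a minimal sketch using the standard `timeit` module. The helper names `ns_per_call` and `slowdown` are made up for this illustration, and `operator.add` is only a stand-in C-implemented callable for machines where `example_plugin` is not built; it does not go through the pybind11 dispatcher. Note that the helper reports the best run rather than the mean that `%timeit` prints, so absolute numbers will differ somewhat. Plugging in the figures reported above gives about 1.9x for the std::vector build and about 1.5x for the small_vector build relative to pure Python.

```python
import operator
import timeit


def simple_pure_python(a, b):
    return a + b


def ns_per_call(func, a, b, number=200_000, repeat=5):
    """Best observed time per call of func(a, b), in nanoseconds.

    Hypothetical helper for this sketch. It takes the minimum over
    `repeat` runs, whereas %timeit reports mean and std. dev.
    """
    timer = timeit.Timer("f(a, b)", globals={"f": func, "a": a, "b": b})
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


def slowdown(candidate_ns, baseline_ns):
    """How many times slower the candidate is than the baseline."""
    return candidate_ns / baseline_ns


if __name__ == "__main__":
    candidates = {
        "pure python": simple_pure_python,
        # Stand-in C callable; NOT dispatched through pybind11.
        "operator.add (stand-in)": operator.add,
    }
    try:
        # Only available if the module from this issue has been built.
        from example_plugin import simple as simple_cpp
        candidates["pybind11 simple"] = simple_cpp
    except ImportError:
        pass

    baseline = ns_per_call(simple_pure_python, 42, 94)
    for name, func in candidates.items():
        ns = ns_per_call(func, 42, 94)
        print(f"{name}: {ns:.0f} ns per call, {slowdown(ns, baseline):.2f}x baseline")

    # Ratios computed from the timings reported in this issue.
    print(f"std::vector build:  {slowdown(496, 260):.2f}x pure Python")
    print(f"small_vector build: {slowdown(382, 260):.2f}x pure Python")
```

Passing the callable and its arguments through `globals` keeps a wrapper lambda out of the timed statement, so the extra name lookups are the same for every candidate.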
