[Experiment] Switching from pybind11 to nanobind for function call overhead improvements #3
base: main
Conversation
@TkTech Did you see the instructions I included here? https://github.com/wjakob/nanobind/blob/master/src/nb_combined.cpp. This should allow you to compile with essentially any other kind of build system, though some work will be needed to replicate all the bells and whistles of what nanobind's CMake tooling provides out of the box. Out of curiosity, what's the relative speedup over the previous pybind11-based version?
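For reference, a hedged sketch of what the `nb_combined.cpp` route could look like with plain setuptools instead of CMake. The paths (`extern/nanobind`, `src/can_ada.cpp`) and the module name are assumptions, not the project's actual layout:

```python
# Hypothetical sketch: compiling nanobind's amalgamated nb_combined.cpp
# directly into the extension with plain setuptools, avoiding the CMake
# build dependency. Paths and names here are assumptions.
import os
from setuptools import Extension

NANOBIND_DIR = os.environ.get("NANOBIND_DIR", "extern/nanobind")

ext = Extension(
    "can_ada",
    sources=[
        "src/can_ada.cpp",  # the binding code (assumed filename)
        os.path.join(NANOBIND_DIR, "src", "nb_combined.cpp"),  # amalgamated nanobind runtime
    ],
    include_dirs=[
        os.path.join(NANOBIND_DIR, "include"),
        os.path.join(NANOBIND_DIR, "ext", "robin_map", "include"),  # nanobind's bundled hash map
    ],
    extra_compile_args=["-std=c++17", "-O2"],  # nanobind requires C++17
)

# This would then be passed to setup(ext_modules=[ext]) in setup.py.
```

Note that the size optimizations and platform-specific link flags that nanobind's CMake tooling applies would need to be replicated by hand.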
@wjakob That's fantastic; I'll give it a full read this weekend and try it out. The relative speedup is 30-33%.
Amazing.
Any update on this? I'd like to see a switch to nanobind. I was going to implement a Cython version, but if there is a nanobind version there is no need, since nanobind is pretty much as fast as Cython. Let me know if I can help!
I think the latest Cython, with its recent improvements, should give just as much of a speedup as nanobind, and it wouldn't require CMake as a hard dependency. I'm not 100% sure, because we would need to test this in practice. I would love to see updates on this as well. :D
Hm, I built the package locally with nanobind based on this PR, plus the missing changes pulled from latest main, and ran the benchmarks. This is what I got.

This benchmark was run with Python 3.12.6 on an Ubuntu 24.04.2 aarch64 machine:

============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.12.6, pytest-8.4.0, pluggy-1.6.0
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/eon/GitHub/can_ada
configfile: pyproject.toml
plugins: benchmark-5.1.0
collected 16 items
tests/test_benchmark.py .... [ 25%]
tests/test_idna.py .. [ 37%]
tests/test_misc.py . [ 43%]
tests/test_parsing.py ... [ 62%]
tests/test_search.py ...... [100%]
------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_can_ada_parse 74.2095 (1.0) 77.3226 (1.0) 76.2115 (1.0) 0.9218 (1.0) 76.4414 (1.0) 0.5105 (1.0) 4;2 13.1214 (1.0) 13 1
test_ada_python_parse 244.2262 (3.29) 252.6127 (3.27) 247.8758 (3.25) 3.4034 (3.69) 246.5964 (3.23) 5.2646 (10.31) 2;0 4.0343 (0.31) 5 1
test_yarl_parse 392.9242 (5.29) 404.8161 (5.24) 398.3661 (5.23) 4.5408 (4.93) 399.1666 (5.22) 6.1517 (12.05) 2;0 2.5103 (0.19) 5 1
test_urllib_parse 518.3281 (6.98) 526.5334 (6.81) 524.5661 (6.88) 3.5079 (3.81) 525.9881 (6.88) 2.6860 (5.26) 1;1 1.9063 (0.15) 5 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
============================================================================================== 16 passed in 10.43s =============================================================================================

This benchmark was run with Python 3.13.3 on an Arch Linux x64 machine:

============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.13.3, pytest-8.4.0, pluggy-1.6.0
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /media/Data/GitHub/can_ada
configfile: pyproject.toml
plugins: benchmark-5.1.0
collected 16 items
tests/test_benchmark.py .... [ 25%]
tests/test_idna.py .. [ 37%]
tests/test_misc.py . [ 43%]
tests/test_parsing.py ... [ 62%]
tests/test_search.py ...... [100%]
------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_can_ada_parse 44.0162 (1.0) 49.4297 (1.0) 44.9846 (1.0) 1.4984 (1.11) 44.3686 (1.0) 0.8665 (1.0) 3;3 22.2298 (1.0) 20 1
test_ada_python_parse 139.0202 (3.16) 146.0516 (2.95) 141.0748 (3.14) 2.8255 (2.09) 139.7587 (3.15) 3.8139 (4.40) 2;0 7.0884 (0.32) 7 1
test_yarl_parse 267.3834 (6.07) 273.6444 (5.54) 269.1717 (5.98) 2.6015 (1.93) 268.1252 (6.04) 2.8348 (3.27) 1;0 3.7151 (0.17) 5 1
test_urllib_parse 307.3849 (6.98) 310.5946 (6.28) 309.1844 (6.87) 1.3497 (1.0) 308.8693 (6.96) 2.1906 (2.53) 2;0 3.2343 (0.15) 5 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
============================================================================================== 16 passed in 7.42s ===============================================================================================

though I had to make some changes in
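The numbers above come from pytest-benchmark. As a rough illustration of the same kind of per-call measurement using only the standard library (with `urllib.parse` as the stand-in, since `can_ada` may not be installed), a sketch might look like:

```python
# Rough stdlib-only sketch of a parse benchmark, illustrating what the
# test_*_parse benchmarks above measure. urllib.parse stands in for
# can_ada here; swapping in can_ada.parse is left as an assumption.
import timeit
import urllib.parse

URLS = ["https://example.com/path?q=1#frag"] * 1000

def parse_all(urls):
    # Parse every URL once per benchmark round.
    return [urllib.parse.urlparse(u) for u in urls]

repeats = 10
elapsed = timeit.timeit(lambda: parse_all(URLS), number=repeats)
per_call_us = elapsed / (repeats * len(URLS)) * 1e6
print(f"urllib.parse.urlparse: {per_call_us:.2f} us/call")
```

Unlike pytest-benchmark, this does no warmup or outlier handling, so treat the numbers as ballpark only.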
Switching from pybind11 to nanobind offers some performance improvements with minimal code changes. Our new benchmarks are:
I'm routinely seeing 6-7x better performance over urllib, and significantly improved performance when actually using the results (i.e. accessing `result.pathname`) due to lowered attribute access overhead.

However, this introduces CMake as a build-time dependency and reduces the available targets (CPython 3.8+, PyPy > 3.8). I have not yet found a way to eliminate CMake as a dependency. I don't really mind if we only target newer versions of Python.
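The attribute access overhead can be made concrete with a small timing sketch. This uses urllib's `ParseResult` as a stand-in; treating can_ada's URL object the same way is an assumption:

```python
# Sketch of per-attribute-access cost, using urllib's ParseResult as a
# stand-in for can_ada's URL object (an assumption). For pybind11 and
# nanobind objects, the binding layer adds to each read's cost, which is
# where nanobind's lower overhead shows up.
import timeit
import urllib.parse

result = urllib.parse.urlparse("https://example.com/some/path?q=1")

# Repeated attribute reads pay the descriptor-lookup cost every time.
n = 200_000
per_access = timeit.timeit(lambda: result.path, number=n) / n

# Hoisting the attribute into a local variable pays that cost once.
path = result.path
per_local = timeit.timeit(lambda: path, number=n) / n

print(f"attribute read: {per_access * 1e9:.1f} ns, local read: {per_local * 1e9:.1f} ns")
```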
@lemire @wjakob