bpo-47189: What's New in 3.11: Faster CPython #32235

Merged
merged 20 commits into from
Apr 6, 2022
67 changes: 34 additions & 33 deletions Doc/whatsnew/3.11.rst
@@ -62,8 +62,8 @@ Summary -- Release highlights
.. This section singles out the most important changes in Python 3.11.
   Brevity is key.

- Python 3.11 is 10-60% faster than Python 3.10. On average, we measured a 1.22x
  speedup on the standard benchmark suite. See `Faster CPython`_ for details.
- Python 3.11 is up to 10-60% faster than Python 3.10. On average, we measured a
  1.22x speedup on the standard benchmark suite. See `Faster CPython`_ for details.

.. PEP-sized items next.

@@ -495,7 +495,7 @@ CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/idea
than CPython 3.10 when measured with the
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
could be 10-60% faster.
could be up to 10-60%.

This project focuses on two major areas in Python: faster startup and faster
runtime. Other optimizations not under this project are listed in `Optimizations`_.
@@ -524,8 +524,7 @@ by the interpreter. This reduces the steps in module execution process to this:
Statically allocated code object -> Evaluate

Interpreter startup is now 10-15% faster in Python 3.11. This has a big
impact for short-running programs using Python such as ``python -m venv ...``,
or ``python -m pip ...``.
impact for short-running programs using Python.

(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
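A rough way to observe interpreter startup cost on a given build is to time an empty program in a subprocess. This is only a sketch: the helper name ``time_startup`` is hypothetical, and absolute numbers depend heavily on the machine, the build options, and the site configuration.

```python
import subprocess
import sys
import time


def time_startup(runs=5):
    """Return the best wall-clock time (seconds) to start and exit the interpreter."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" runs an empty program, so almost all the time is startup and shutdown.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"best startup time: {time_startup() * 1000:.1f} ms")
```

Running the same script under 3.10 and 3.11 gives a like-for-like comparison, although process-spawn overhead is included in both measurements.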

@@ -540,12 +539,12 @@ holds execution information. The following are new frame optimizations:

- Streamlined the frame creation process.
- Avoided memory allocation by generously re-using frame space on the C stack.
- Streamlined the internal frame object to contain only essential information.
- Streamlined the internal frame struct to contain only essential information.
  Frames previously held extra debugging and memory management information.

Old-style frames are now created only when required by debuggers. For most
user code, no frames are created at all. As a result, nearly all Python
function calls have sped up significantly. We measured a 3-7% speedup
Old-style frame objects are now created only when required by debuggers. For
most user code, no frame objects are created at all. As a result, nearly all
Python function calls have sped up significantly. We measured a 3-7% speedup
in pyperformance.

(Contributed by Mark Shannon in :issue:`44590`.)
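Full frame objects are still available when code asks for them explicitly. The sketch below uses ``sys._getframe()``, a CPython implementation detail, to request one. The function name ``current_frame_name`` is hypothetical and only illustrates that introspection keeps working.

```python
import sys
import types


def current_frame_name():
    """Return the name of the running function by inspecting its frame object."""
    # Asking for the frame is the kind of request that makes the
    # interpreter provide a full frame object for this call.
    frame = sys._getframe()
    assert isinstance(frame, types.FrameType)
    return frame.f_code.co_name


if __name__ == "__main__":
    print(current_frame_name())
```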
@@ -554,19 +553,19 @@ in pyperformance.

Inlined Python function calls
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
During a Python function call, Python will call a C function to interpret that
function's code, similar to how a user would call ``eval`` to run arbitrary
code.
During a Python function call, Python will call an evaluating C function to
interpret that function's code. This effectively limits pure Python recursion to
what's safe for the C stack.

In 3.11, when Python detects code calling another Python function,
In 3.11, when Python detects Python code calling another Python function,
it sets up a new frame, and "jumps" to the new code inside the new frame. This
avoids calling the C interpreting function altogether.

Python function calls now consume almost no C stack space. This speeds up Python
to Python function calls. In simple recursive functions like fibonacci or
Most Python function calls now consume no C stack space. This speeds up
most such calls. In simple recursive functions like fibonacci or
factorial, a 1.7x speedup was observed. This also means recursive functions
can recurse significantly deeper, assuming the recursion limit and memory limit
is not exceeded. We measured a 1-3% improvement in pyperformance.
can recurse significantly deeper (if the user increases the recursion limit).
We measured a 1-3% improvement in pyperformance.

(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
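The point about deeper recursion can be tried with a small sketch. It assumes a plain pure-Python recursive function; the names ``depth`` and ``recurse_with_limit`` are hypothetical. The recursion limit is raised only for the duration of the call and then restored.

```python
import sys


def depth(n):
    """Recurse n levels deep using only Python-to-Python calls."""
    return 0 if n == 0 else 1 + depth(n - 1)


def recurse_with_limit(n, limit):
    """Call depth(n) with the recursion limit temporarily set to `limit`."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return depth(n)
    finally:
        # Always restore the previous limit, even if RecursionError is raised.
        sys.setrecursionlimit(old_limit)


if __name__ == "__main__":
    print(recurse_with_limit(3000, 10_000))
```

How far the limit can safely be raised still depends on available memory and on any C-level calls that appear in the recursion.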

@@ -586,7 +585,7 @@ Python caches the results of expensive operations directly in the bytecode.
The specializer will also combine certain common instruction pairs into one
superinstruction. This reduces the overhead during execution.

This extra information requires more memory. Python will only specialize
Python will only specialize
when it sees code that is "hot" (executed multiple times). This prevents Python
from wasting time for run-once code. Python can also de-specialize when code is
too dynamic or when the use changes. Specialization is attempted periodically,
@@ -603,38 +602,40 @@ See :pep:`659` for more information.)
| Operation     | Form               | Specialization                                        | Operation speedup | Contributor(s)    |
|               |                    |                                                       | (up to)           |                   |
+===============+====================+=======================================================+===================+===================+
| Binary        | ``o+o; o*o; o-o;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
| Binary        | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
| operations    |                    | such as ``int``, ``float``, and ``str`` take custom   |                   | Dong-hee Na,      |
|               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
|               |                    |                                                       |                   | Dennis Sweeney    |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Subscript     | ``o[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
|               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
|               |                    | data structures. Subscripting custom ``__getitem__``  |                   |                   |
|               |                    | data structures.                                      |                   |                   |
|               |                    |                                                       |                   |                   |
|               |                    | Subscripting custom ``__getitem__``                   |                   |                   |
|               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Store         | ``o[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
| subscript     |                    |                                                       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Calls         | ``f(arg)``         | Calls to common builtin (C) functions such as ``len`` | 20%               | Mark Shannon,     |
|               |                    | and ``isinstance`` directly call their underlying C   |                   | Ken Jin           |
|               | ``T(arg)``         | version. This avoids going through the internal       | 170%              |                   |
|               | ``C(arg)``         | version. This avoids going through the internal       | 170%              |                   |
|               |                    | calling convention.                                   |                   |                   |
|               |                    |                                                       |                   |                   |
|               |                    | Calls to certain Python functions are inlined similar |                   |                   |
|               |                    | to :ref:`inline-calls`.                               |                   |                   |
|               |                    |                                                       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``print; len``     | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
| global        |                    | is cached. Loading globals and builtins require       |                   |                   |
| Load          | ``print``          | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
| global        | ``len``            | is cached. Loading globals and builtins require       |                   |                   |
| variable      |                    | zero namespace lookups.                               |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | - [2]_            | Mark Shannon      |
| attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
|               |                    | In most cases, attribute loading will require zero    |                   |                   |
|               |                    | namespace lookups.                                    |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20% [3]_       | Ken Jin,          |
| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20%            | Ken Jin,          |
| methods for   |                    | loading now has no namespace lookups -- even for      |                   | Mark Shannon      |
| call          |                    | classes with long inheritance chains.                 |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
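The specializations listed in the table can be inspected from Python itself. The following is a minimal sketch, assuming Python 3.11 or later for the ``adaptive=True`` argument of ``dis.Bytecode``; on older versions it falls back to the plain bytecode listing. The names ``add_all`` and ``opnames_after_warmup`` are hypothetical, and which specialized instruction names appear (if any) depends on the interpreter version.

```python
import dis
import sys


def add_all(values):
    """A small hot loop built around a binary add."""
    total = 0
    for v in values:
        total = total + v
    return total


def opnames_after_warmup(func, *args, warmup=1000):
    """Run func repeatedly so it becomes 'hot', then list its instruction names."""
    for _ in range(warmup):
        func(*args)
    if sys.version_info >= (3, 11):
        # adaptive=True asks dis to show specialized instructions, if present.
        instructions = dis.Bytecode(func, adaptive=True)
    else:
        instructions = dis.Bytecode(func)
    return [ins.opname for ins in instructions]


if __name__ == "__main__":
    for name in opnames_after_warmup(add_all, list(range(10))):
        print(name)
```

Comparing the output with and without the warm-up calls shows which generic instructions the interpreter chose to replace for this particular workload.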
@@ -652,10 +653,6 @@ See :pep:`659` for more information.)
3.11 specializes for more forms. Furthermore, all attribute loads should
be sped up by :issue:`45947`.

.. [3] Classes with longer inheritance chains will see greater speedups.
This optimization effectively makes method lookup constant time
regardless of inheritance.


Misc
----
@@ -664,6 +661,9 @@ Misc
  namespace dictionaries now also share keys more freely.
  (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)

* A more concise representation of exceptions in the interpreter reduced the
  time required for catching an exception by about 10%.
  (Contributed by Irit Katriel in :issue:`45711`.)
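The cost of raising and catching an exception can be observed with ``timeit``. This is a rough sketch with hypothetical helper names (``raise_and_catch``, ``time_catch``); it does not reproduce the measurement behind the figure above, and results vary by machine and build.

```python
import timeit


def raise_and_catch():
    """Raise an exception and catch it immediately."""
    try:
        raise ValueError("boom")
    except ValueError:
        pass


def time_catch(number=100_000):
    """Return the total time (seconds) for `number` raise-and-catch round trips."""
    return timeit.timeit(raise_and_catch, number=number)


if __name__ == "__main__":
    print(f"{time_catch():.3f} s for 100,000 raise/catch round trips")
```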

FAQ
---
@@ -677,7 +677,7 @@ FAQ
|
| Q: Will CPython 3.11 use more memory?
|
| A: Maybe not. We don't expect memory use to exceed 20% more versus 3.10.
| A: Maybe not. We don't expect memory use to exceed 20% more than 3.10.
  This is offset by memory optimizations for frame objects and object
  dictionaries as mentioned above.
|
@@ -703,8 +703,9 @@ About
-----

Faster CPython explores optimizations for :term:`CPython`. The main team is
funded by Microsoft to work on this full-time. The team also collaborates
extensively with volunteer contributors in the community.
funded by Microsoft to work on this full-time. Pablo Galindo Salgado is also
funded by Bloomberg LP to work on the project part-time. Finally, many
contributors are volunteers from the community.


CPython bytecode changes