Commit df3ed54

[3.11] gh-95913: Edit Faster CPython section in 3.11 WhatsNew (GH-98429) (GH-102490)
gh-95913: Edit Faster CPython section in 3.11 WhatsNew (GH-98429) (cherry picked from commit 80b19a3) Co-authored-by: C.A.M. Gerlach <[email protected]>
1 parent b6fd4e6 commit df3ed54

File tree

1 file changed

+109
-77
lines changed

Doc/whatsnew/3.11.rst

Original file line number | Diff line number | Diff line change
@@ -1319,14 +1319,17 @@ This section covers specific optimizations independent of the
13191319
Faster CPython
13201320
==============
13211321

1322-
CPython 3.11 is on average `25% faster <https://github.com/faster-cpython/ideas#published-results>`_
1323-
than CPython 3.10 when measured with the
1322+
CPython 3.11 is an average of
1323+
`25% faster <https://github.com/faster-cpython/ideas#published-results>`_
1324+
than CPython 3.10 as measured with the
13241325
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
1325-
and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
1326-
could be up to 10-60% faster.
1326+
when compiled with GCC on Ubuntu Linux.
1327+
Depending on your workload, the overall speedup could be 10-60%.
13271328

1328-
This project focuses on two major areas in Python: faster startup and faster
1329-
runtime. Other optimizations not under this project are listed in `Optimizations`_.
1329+
This project focuses on two major areas in Python:
1330+
:ref:`whatsnew311-faster-startup` and :ref:`whatsnew311-faster-runtime`.
1331+
Optimizations not covered by this project are listed separately under
1332+
:ref:`whatsnew311-optimizations`.
13301333

13311334

13321335
.. _whatsnew311-faster-startup:
@@ -1339,8 +1342,8 @@ Faster Startup
13391342
Frozen imports / Static code objects
13401343
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13411344

1342-
Python caches bytecode in the :ref:`__pycache__<tut-pycache>` directory to
1343-
speed up module loading.
1345+
Python caches :term:`bytecode` in the :ref:`__pycache__ <tut-pycache>`
1346+
directory to speed up module loading.
13441347

13451348
Previously in 3.10, Python module execution looked like this:
13461349

@@ -1349,8 +1352,9 @@ Previously in 3.10, Python module execution looked like this:
13491352
Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate
13501353
13511354
In Python 3.11, the core modules essential for Python startup are "frozen".
1352-
This means that their code objects (and bytecode) are statically allocated
1353-
by the interpreter. This reduces the steps in module execution process to this:
1355+
This means that their :ref:`codeobjects` (and bytecode)
1356+
are statically allocated by the interpreter.
1357+
This reduces the steps in module execution process to:
13541358

13551359
.. code-block:: text
13561360
@@ -1359,7 +1363,7 @@ by the interpreter. This reduces the steps in module execution process to this:
13591363
Interpreter startup is now 10-15% faster in Python 3.11. This has a big
13601364
impact for short-running programs using Python.
13611365

1362-
(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
1366+
(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in many issues.)
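One way to see which modules the interpreter can supply from its frozen set is to ask the frozen importer directly. The following is a minimal sketch, not part of the commit; the helper name ``is_frozen`` is hypothetical, and which standard-library modules are frozen depends on the Python version and on how the interpreter was built.

```python
import importlib.machinery


def is_frozen(module_name: str) -> bool:
    """Return True if the frozen importer can supply this module.

    Hypothetical helper for illustration: FrozenImporter.find_spec()
    returns a module spec for frozen modules and None otherwise.
    """
    spec = importlib.machinery.FrozenImporter.find_spec(module_name)
    return spec is not None


# "_frozen_importlib" is the frozen import bootstrap module.
# Whether "os" is frozen depends on the version and build,
# so no expected output is shown here.
for name in ("_frozen_importlib", "os", "json"):
    print(f"{name}: frozen={is_frozen(name)}")
```

On a release build of 3.11, modules needed at startup would be expected to report as frozen, while most other modules are still loaded through ``__pycache__``.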
13631367

13641368

13651369
.. _whatsnew311-faster-runtime:
@@ -1372,17 +1376,19 @@ Faster Runtime
13721376
Cheaper, lazy Python frames
13731377
^^^^^^^^^^^^^^^^^^^^^^^^^^^
13741378

1375-
Python frames are created whenever Python calls a Python function. This frame
1376-
holds execution information. The following are new frame optimizations:
1379+
Python frames, holding execution information,
1380+
are created whenever Python calls a Python function.
1381+
The following are new frame optimizations:
13771382

13781383
- Streamlined the frame creation process.
13791384
- Avoided memory allocation by generously re-using frame space on the C stack.
13801385
- Streamlined the internal frame struct to contain only essential information.
13811386
Frames previously held extra debugging and memory management information.
13821387

1383-
Old-style frame objects are now created only when requested by debuggers or
1384-
by Python introspection functions such as ``sys._getframe`` or
1385-
``inspect.currentframe``. For most user code, no frame objects are
1388+
Old-style :ref:`frame objects <frame-objects>`
1389+
are now created only when requested by debuggers
1390+
or by Python introspection functions such as :func:`sys._getframe` and
1391+
:func:`inspect.currentframe`. For most user code, no frame objects are
13861392
created at all. As a result, nearly all Python function calls have sped
13871393
up significantly. We measured a 3-7% speedup in pyperformance.
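The introspection functions named above are the kind of code that asks for a full frame object. A minimal sketch follows, not part of the commit; the function names are hypothetical, and ``inspect.currentframe`` may return ``None`` on implementations without Python stack frame support.

```python
import inspect
import sys


def who_am_i() -> str:
    """Return this function's name via inspect.currentframe()."""
    frame = inspect.currentframe()  # requests a frame object
    return frame.f_code.co_name


def who_called_me() -> str:
    """Return the caller's name via sys._getframe(1)."""
    return sys._getframe(1).f_code.co_name


def outer() -> str:
    # The frame one level up from who_called_me() is outer().
    return who_called_me()


print(who_am_i())
print(outer())
```

Code that never makes such requests, which is most user code, does not trigger creation of these objects in 3.11.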
13881394

@@ -1403,10 +1409,11 @@ In 3.11, when CPython detects Python code calling another Python function,
14031409
it sets up a new frame, and "jumps" to the new code inside the new frame. This
14041410
avoids calling the C interpreting function altogether.
14051411

1406-
Most Python function calls now consume no C stack space. This speeds up
1407-
most of such calls. In simple recursive functions like fibonacci or
1408-
factorial, a 1.7x speedup was observed. This also means recursive functions
1409-
can recurse significantly deeper (if the user increases the recursion limit).
1412+
Most Python function calls now consume no C stack space, speeding them up.
1413+
In simple recursive functions like fibonacci or
1414+
factorial, we observed a 1.7x speedup. This also means recursive functions
1415+
can recurse significantly deeper
1416+
(if the user increases the recursion limit with :func:`sys.setrecursionlimit`).
14101417
We measured a 1-3% improvement in pyperformance.
14111418

14121419
(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
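The remark about recursing deeper can be tried with a small sketch like the one below. It is illustrative only and not part of the commit: the helper names and the chosen depth and limit are assumptions, and very large limits can still exhaust memory or, on older versions, the C stack.

```python
import sys


def count_down(n: int) -> int:
    """Return n by recursing n levels deep in pure Python."""
    return 0 if n == 0 else 1 + count_down(n - 1)


def call_with_limit(func, arg, limit: int):
    """Call func(arg) with the recursion limit temporarily raised."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        return func(arg)
    finally:
        # Restore the previous limit once the deep call has unwound.
        sys.setrecursionlimit(old_limit)


# 2000 levels exceeds the usual default limit of 1000,
# so the limit is raised for the duration of the call.
result = call_with_limit(count_down, 2000, 5000)
print(result)
```

Restoring the old limit in ``finally`` keeps the change local to the one call that needs it.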
@@ -1417,7 +1424,7 @@ We measured a 1-3% improvement in pyperformance.
14171424
PEP 659: Specializing Adaptive Interpreter
14181425
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14191426

1420-
:pep:`659` is one of the key parts of the faster CPython project. The general
1427+
:pep:`659` is one of the key parts of the Faster CPython project. The general
14211428
idea is that while Python is a dynamic language, most code has regions where
14221429
objects and types rarely change. This concept is known as *type stability*.
14231430

@@ -1426,17 +1433,18 @@ in the executing code. Python will then replace the current operation with a
14261433
more specialized one. This specialized operation uses fast paths available only
14271434
to those use cases/types, which generally outperform their generic
14281435
counterparts. This also brings in another concept called *inline caching*, where
1429-
Python caches the results of expensive operations directly in the bytecode.
1436+
Python caches the results of expensive operations directly in the
1437+
:term:`bytecode`.
14301438

14311439
The specializer will also combine certain common instruction pairs into one
1432-
superinstruction. This reduces the overhead during execution.
1440+
superinstruction, reducing the overhead during execution.
14331441

14341442
Python will only specialize
14351443
when it sees code that is "hot" (executed multiple times). This prevents Python
1436-
from wasting time for run-once code. Python can also de-specialize when code is
1444+
from wasting time on run-once code. Python can also de-specialize when code is
14371445
too dynamic or when the use changes. Specialization is attempted periodically,
1438-
and specialization attempts are not too expensive. This allows specialization
1439-
to adapt to new circumstances.
1446+
and specialization attempts are not too expensive,
1447+
allowing specialization to adapt to new circumstances.
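To observe specialization on a "hot" function, the ``dis`` module in 3.11 and later accepts an ``adaptive`` flag. The sketch below is not part of the commit; the names ``add_ints`` and ``opnames`` are hypothetical, the warm-up count is an arbitrary assumption, and the exact specialized instruction names differ between CPython versions, so none are shown.

```python
import dis
import sys


def add_ints(a, b):
    return a + b


def opnames(func, adaptive: bool):
    """List instruction names, using adaptive=True where supported."""
    if sys.version_info >= (3, 11):
        instructions = dis.get_instructions(func, adaptive=adaptive)
    else:
        # Older versions have no adaptive interpreter to inspect.
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


# Run the function repeatedly with stable types so it counts as "hot".
for _ in range(1000):
    add_ints(1, 2)

generic = opnames(add_ints, adaptive=False)
specialized = opnames(add_ints, adaptive=True)
print("generic:    ", generic)
print("specialized:", specialized)
```

If the two lists differ, the differing entries are the instructions the interpreter replaced with specialized forms for the observed types.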
14401448

14411449
(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
14421450
See :pep:`659` for more information. Implementation by Mark Shannon and Brandt
@@ -1449,32 +1457,32 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
14491457
| Operation | Form | Specialization | Operation speedup | Contributor(s) |
14501458
| | | | (up to) | |
14511459
+===============+====================+=======================================================+===================+===================+
1452-
| Binary | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, |
1453-
| operations | | such as ``int``, ``float``, and ``str`` take custom | | Dong-hee Na, |
1454-
| | | fast paths for their underlying types. | | Brandt Bucher, |
1460+
| Binary | ``x + x`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, |
1461+
| operations | | such as :class:`int`, :class:`float` and :class:`str` | | Dong-hee Na, |
1462+
| | ``x - x`` | take custom fast paths for their underlying types. | | Brandt Bucher, |
14551463
| | | | | Dennis Sweeney |
1464+
| | ``x * x`` | | | |
14561465
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1457-
| Subscript | ``a[i]`` | Subscripting container types such as ``list``, | 10-25% | Irit Katriel, |
1458-
| | | ``tuple`` and ``dict`` directly index the underlying | | Mark Shannon |
1459-
| | | data structures. | | |
1466+
| Subscript | ``a[i]`` | Subscripting container types such as :class:`list`, | 10-25% | Irit Katriel, |
1467+
| | | :class:`tuple` and :class:`dict` directly index | | Mark Shannon |
1468+
| | | the underlying data structures. | | |
14601469
| | | | | |
1461-
| | | Subscripting custom ``__getitem__`` | | |
1470+
| | | Subscripting custom :meth:`~object.__getitem__` | | |
14621471
| | | is also inlined similar to :ref:`inline-calls`. | | |
14631472
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
14641473
| Store | ``a[i] = z`` | Similar to subscripting specialization above. | 10-25% | Dennis Sweeney |
14651474
| subscript | | | | |
14661475
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
14671476
| Calls | ``f(arg)`` | Calls to common builtin (C) functions and types such | 20% | Mark Shannon, |
1468-
| | ``C(arg)`` | as ``len`` and ``str`` directly call their underlying | | Ken Jin |
1469-
| | | C version. This avoids going through the internal | | |
1470-
| | | calling convention. | | |
1471-
| | | | | |
1477+
| | | as :func:`len` and :class:`str` directly call their | | Ken Jin |
1478+
| | ``C(arg)`` | underlying C version. This avoids going through the | | |
1479+
| | | internal calling convention. | | |
14721480
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1473-
| Load | ``print`` | The object's index in the globals/builtins namespace | [1]_ | Mark Shannon |
1474-
| global | ``len`` | is cached. Loading globals and builtins require | | |
1475-
| variable | | zero namespace lookups. | | |
1481+
| Load | ``print`` | The object's index in the globals/builtins namespace | [#load-global]_ | Mark Shannon |
1482+
| global | | is cached. Loading globals and builtins require | | |
1483+
| variable | ``len`` | zero namespace lookups. | | |
14761484
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1477-
| Load | ``o.attr`` | Similar to loading global variables. The attribute's | [2]_ | Mark Shannon |
1485+
| Load | ``o.attr`` | Similar to loading global variables. The attribute's | [#load-attr]_ | Mark Shannon |
14781486
| attribute | | index inside the class/object's namespace is cached. | | |
14791487
| | | In most cases, attribute loading will require zero | | |
14801488
| | | namespace lookups. | | |
@@ -1486,14 +1494,15 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
14861494
| Store | ``o.attr = z`` | Similar to load attribute optimization. | 2% | Mark Shannon |
14871495
| attribute | | | in pyperformance | |
14881496
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1489-
| Unpack | ``*seq`` | Specialized for common containers such as ``list`` | 8% | Brandt Bucher |
1490-
| Sequence | | and ``tuple``. Avoids internal calling convention. | | |
1497+
| Unpack | ``*seq`` | Specialized for common containers such as | 8% | Brandt Bucher |
1498+
| Sequence | | :class:`list` and :class:`tuple`. | | |
1499+
| | | Avoids internal calling convention. | | |
14911500
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
14921501

1493-
.. [1] A similar optimization already existed since Python 3.8. 3.11
1494-
specializes for more forms and reduces some overhead.
1502+
.. [#load-global] A similar optimization already existed since Python 3.8.
1503+
3.11 specializes for more forms and reduces some overhead.
14951504
1496-
.. [2] A similar optimization already existed since Python 3.10.
1505+
.. [#load-attr] A similar optimization already existed since Python 3.10.
14971506
3.11 specializes for more forms. Furthermore, all attribute loads should
14981507
be sped up by :issue:`45947`.
14991508
@@ -1503,49 +1512,72 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
15031512
Misc
15041513
----
15051514

1506-
* Objects now require less memory due to lazily created object namespaces. Their
1507-
namespace dictionaries now also share keys more freely.
1515+
* Objects now require less memory due to lazily created object namespaces.
1516+
Their namespace dictionaries now also share keys more freely.
15081517
(Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)
15091518

1519+
* "Zero-cost" exceptions are implemented, eliminating the cost
1520+
of :keyword:`try` statements when no exception is raised.
1521+
(Contributed by Mark Shannon in :issue:`40222`.)
1522+
15101523
* A more concise representation of exceptions in the interpreter reduced the
15111524
time required for catching an exception by about 10%.
15121525
(Contributed by Irit Katriel in :issue:`45711`.)
15131526

1527+
* :mod:`re`'s regular expression matching engine has been partially refactored,
1528+
and now uses computed gotos (or "threaded code") on supported platforms. As a
1529+
result, Python 3.11 executes the `pyperformance regular expression benchmarks
1530+
<https://pyperformance.readthedocs.io/benchmarks.html#regex-dna>`_ up to 10%
1531+
faster than Python 3.10.
1532+
(Contributed by Brandt Bucher in :gh:`91404`.)
1533+
15141534

15151535
.. _whatsnew311-faster-cpython-faq:
15161536

15171537
FAQ
15181538
---
15191539

1520-
| Q: How should I write my code to utilize these speedups?
1521-
|
1522-
| A: You don't have to change your code. Write Pythonic code that follows common
1523-
best practices. The Faster CPython project optimizes for common code
1524-
patterns we observe.
1525-
|
1526-
|
1527-
| Q: Will CPython 3.11 use more memory?
1528-
|
1529-
| A: Maybe not. We don't expect memory use to exceed 20% more than 3.10.
1530-
This is offset by memory optimizations for frame objects and object
1531-
dictionaries as mentioned above.
1532-
|
1533-
|
1534-
| Q: I don't see any speedups in my workload. Why?
1535-
|
1536-
| A: Certain code won't have noticeable benefits. If your code spends most of
1537-
its time on I/O operations, or already does most of its
1538-
computation in a C extension library like numpy, there won't be significant
1539-
speedup. This project currently benefits pure-Python workloads the most.
1540-
|
1541-
| Furthermore, the pyperformance figures are a geometric mean. Even within the
1542-
pyperformance benchmarks, certain benchmarks have slowed down slightly, while
1543-
others have sped up by nearly 2x!
1544-
|
1545-
|
1546-
| Q: Is there a JIT compiler?
1547-
|
1548-
| A: No. We're still exploring other optimizations.
1540+
.. _faster-cpython-faq-my-code:
1541+
1542+
How should I write my code to utilize these speedups?
1543+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1544+
1545+
Write Pythonic code that follows common best practices;
1546+
you don't have to change your code.
1547+
The Faster CPython project optimizes for common code patterns we observe.
1548+
1549+
1550+
.. _faster-cpython-faq-memory:
1551+
1552+
Will CPython 3.11 use more memory?
1553+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1554+
1555+
Maybe not; we don't expect memory use to exceed 20% higher than 3.10.
1556+
This is offset by memory optimizations for frame objects and object
1557+
dictionaries as mentioned above.
1558+
1559+
1560+
.. _faster-cpython-ymmv:
1561+
1562+
I don't see any speedups in my workload. Why?
1563+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1564+
1565+
Certain code won't have noticeable benefits. If your code spends most of
1566+
its time on I/O operations, or already does most of its
1567+
computation in a C extension library like NumPy, there won't be significant
1568+
speedups. This project currently benefits pure-Python workloads the most.
1569+
1570+
Furthermore, the pyperformance figures are a geometric mean. Even within the
1571+
pyperformance benchmarks, certain benchmarks have slowed down slightly, while
1572+
others have sped up by nearly 2x!
1573+
1574+
1575+
.. _faster-cpython-jit:
1576+
1577+
Is there a JIT compiler?
1578+
^^^^^^^^^^^^^^^^^^^^^^^^
1579+
1580+
No. We're still exploring other optimizations.
15491581

15501582

15511583
.. _whatsnew311-faster-cpython-about:
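Since the FAQ notes that speedups vary by workload, claims such as the cost of :keyword:`try` statements when no exception is raised are best checked on your own interpreter. The following is a rough sketch using ``timeit``, not part of the commit; the function names and iteration counts are assumptions, and the timings depend on the machine and Python version.

```python
import timeit


def plain(values):
    """Sum values with no exception handling."""
    total = 0
    for v in values:
        total += v
    return total


def guarded(values):
    """Sum values inside a try block that never raises here."""
    total = 0
    for v in values:
        try:
            total += v
        except TypeError:
            pass
    return total


data = list(range(1000))

# Take the best of several repeats to reduce noise from other processes.
t_plain = min(timeit.repeat(lambda: plain(data), number=200, repeat=3))
t_guarded = min(timeit.repeat(lambda: guarded(data), number=200, repeat=3))
print(f"plain:   {t_plain:.4f}s")
print(f"guarded: {t_guarded:.4f}s")
```

Running the same script under 3.10 and 3.11 gives a direct comparison for this one pattern; for broader measurements the pyperformance suite mentioned in the text is the better tool.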
