
Add return const instruction #101632


Closed
penguin-wwy opened this issue Feb 7, 2023 · 2 comments
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)


@penguin-wwy
Contributor

penguin-wwy commented Feb 7, 2023

From the pystats report (pystats-2023-02-05-python-5a2b984.md), I found that the pair LOAD_CONST + RETURN_VALUE occurs very frequently, because a function that reaches its end without a return statement implicitly returns None.
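The implicit `return None` can be observed with the standard `dis` module. This is only an illustrative sketch: `implicit_none` and `describe_final_return` are hypothetical helpers, and the exact opcodes depend on the CPython version. Before 3.12 the tail is `LOAD_CONST None` followed by `RETURN_VALUE`, while builds that include RETURN_CONST encode it as a single instruction.

```python
import dis

def implicit_none():
    x = 1  # no return statement: the compiler appends an implicit "return None"

def describe_final_return(func):
    """Report how the last return of func is encoded and which constant it returns."""
    instrs = list(dis.get_instructions(func))
    last = instrs[-1]
    if last.opname == "RETURN_CONST":
        # Fused form, on CPython versions that have RETURN_CONST.
        return "RETURN_CONST", last.argval
    prev = instrs[-2]
    if last.opname == "RETURN_VALUE" and prev.opname == "LOAD_CONST":
        # Unfused pair: push the constant, then return the top of the stack.
        return "LOAD_CONST + RETURN_VALUE", prev.argval
    return "other", None

print(describe_final_return(implicit_none))
```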

Successors for LOAD_CONST:

| Successors | Count | Percentage |
|---|---:|---:|
| RETURN_VALUE | 969,173,651 | 21.8% |
| BINARY_OP_ADD_INT | 418,647,997 | 9.4% |
| LOAD_CONST | 403,185,774 | 9.1% |
| COMPARE_AND_BRANCH_INT | 314,633,792 | 7.1% |
| STORE_FAST | 295,563,626 | 6.6% |

And predecessors for RETURN_VALUE:

| Predecessors | Count | Percentage |
|---|---:|---:|
| LOAD_CONST | 969,173,651 | 29.9% |
| LOAD_FAST | 505,933,343 | 15.6% |
| RETURN_VALUE | 382,698,373 | 11.8% |
| BUILD_TUPLE | 328,532,240 | 10.1% |
| COMPARE_OP | 107,210,803 | 3.3% |

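This means that adding a RETURN_CONST instruction, which fuses the pair into one, would remove about 30% of executed RETURN_VALUE instructions and about 20% of executed LOAD_CONST instructions (29.9% and 21.8% in the tables above).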
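To make the idea concrete, here is a toy stack machine with a peephole pass that rewrites `LOAD_CONST c; RETURN_VALUE` into `RETURN_CONST c`. It is a hypothetical model, not CPython's compiler or eval loop: the opcode names only mirror CPython's, the toy has no jumps, and a real pass would also have to respect jump targets and basic-block boundaries. It counts dispatched instructions to show where the saving comes from.

```python
# Hypothetical toy bytecode: (opname, arg) tuples. The names mirror CPython's
# opcodes for readability only; this is not CPython's compiler or eval loop.
Instr = tuple[str, object]

def fuse_return_const(code: list[Instr]) -> list[Instr]:
    """Peephole pass: rewrite LOAD_CONST c; RETURN_VALUE into RETURN_CONST c."""
    out: list[Instr] = []
    i = 0
    while i < len(code):
        op, arg = code[i]
        if op == "LOAD_CONST" and i + 1 < len(code) and code[i + 1][0] == "RETURN_VALUE":
            out.append(("RETURN_CONST", arg))
            i += 2  # consume both instructions of the pair
        else:
            out.append((op, arg))
            i += 1
    return out

def run(code: list[Instr]) -> tuple[object, int]:
    """Execute toy bytecode; return (result, number of instructions dispatched)."""
    stack: list[object] = []
    dispatched = 0
    pc = 0
    while True:
        op, arg = code[pc]
        pc += 1
        dispatched += 1
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "RETURN_VALUE":
            return stack.pop(), dispatched
        elif op == "RETURN_CONST":
            # Fused form: return the constant without pushing it first.
            return arg, dispatched
        else:
            raise ValueError(f"unknown opcode {op!r}")

plain = [("LOAD_CONST", 10000), ("RETURN_VALUE", None)]
fused = fuse_return_const(plain)
print(fused, run(plain), run(fused))
```

The fused sequence returns the same value while dispatching one instruction instead of two; a `RETURN_VALUE` that follows anything other than `LOAD_CONST` is left alone.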

```
./bin/python3 -m pyperf timeit -w 3 --compare-to ../python-3.12/bin/python3 -s "
def test():
    return 10000
" "test()"

/python-3.12/bin/python3: ..................... 27.0 ns +- 0.3 ns
/cpython/bin/python3: ..................... 25.0 ns +- 0.5 ns
Mean +- std dev: [/python-3.12/bin/python3] 27.0 ns +- 0.3 ns -> [/cpython/bin/python3] 25.0 ns +- 0.5 ns: 1.08x faster
```

```
./bin/python3 -m pyperf timeit -w 3 --compare-to ../python-3.12/bin/python3 -s "
def test():
    return None
" "test()"

/python-3.12/bin/python3: ..................... 27.2 ns +- 1.3 ns
/cpython/bin/python3: ..................... 25.1 ns +- 0.6 ns
Mean +- std dev: [/python-3.12/bin/python3] 27.2 ns +- 1.3 ns -> [/cpython/bin/python3] 25.1 ns +- 0.6 ns: 1.08x faster
```

The microbenchmarks show a real improvement of about 1.08x (roughly 8% less time per call). Since the timing also includes function-call overhead, I think the gain on the return itself is closer to 10%. That is not a large improvement, but it should be an optimization without adverse effects.
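The ratios above can be reproduced from the reported means. As a sketch, assuming pyperf's "Nx faster" is the baseline mean divided by the new mean (which matches every row in this thread), the two runs work out like this:

```python
def pyperf_speedup(baseline_mean, changed_mean):
    """Ratio reported as 'Nx faster': baseline time divided by changed time."""
    return baseline_mean / changed_mean

def time_saved(baseline_mean, changed_mean):
    """Fraction of the baseline time per call that was removed."""
    return 1 - changed_mean / baseline_mean

# Means in nanoseconds, copied from the two pyperf timeit runs above.
RUNS = [("return 10000", 27.0, 25.0), ("return None", 27.2, 25.1)]

for label, base, new in RUNS:
    print(f"{label}: {pyperf_speedup(base, new):.2f}x faster, "
          f"{time_saved(base, new):.1%} less time per call")
```

Both runs come out at 1.08x, i.e. about 7.4% and 7.7% less time per call.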

Linked PRs

@penguin-wwy
Contributor Author

Execution counts for all instructions in the main branch

| Name | Count | Self | Cumulative | Miss ratio |
|---|---:|---:|---:|---:|
| LOAD_CONST | 4,447,532,233 | 4.7% | 24.4% | |
| RETURN_VALUE | 3,241,585,933 | 3.4% | 43.2% | |

And execution counts for all instructions in my branch

| Name | Count | Self | Cumulative | Miss ratio |
|---|---:|---:|---:|---:|
| LOAD_CONST | 3,477,437,602 | 3.7% | 31.5% | |
| RETURN_VALUE | 2,268,077,678 | 2.4% | 47.3% | |
| RETURN_CONST | 967,592,289 | 1.0% | 69.3% | |
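A quick way to check the earlier prediction against these counts: the sketch below (with hypothetical names `MAIN`, `BRANCH`, `PAIR_COUNT` and `compare`) copies the numbers from the tables in this thread and compares the predicted share of removed instructions with the observed drop between the two branches.

```python
# Execution counts copied from the two pystats tables above.
MAIN = {"LOAD_CONST": 4_447_532_233, "RETURN_VALUE": 3_241_585_933}
BRANCH = {"LOAD_CONST": 3_477_437_602, "RETURN_VALUE": 2_268_077_678,
          "RETURN_CONST": 967_592_289}

# Count of the LOAD_CONST -> RETURN_VALUE pair from the successor/predecessor tables.
PAIR_COUNT = 969_173_651

def compare(name):
    """Return (predicted fraction removed, observed count removed, observed fraction)."""
    removed = MAIN[name] - BRANCH[name]
    return PAIR_COUNT / MAIN[name], removed, removed / MAIN[name]

for name in ("LOAD_CONST", "RETURN_VALUE"):
    predicted, removed, observed = compare(name)
    print(f"{name}: predicted -{predicted:.1%}, observed -{removed:,} ({observed:.1%})")
print(f"RETURN_CONST executed: {BRANCH['RETURN_CONST']:,}")
```

The observed drops (about 21.8% of LOAD_CONST and 30.0% of RETURN_VALUE) match the prediction and are close to the 967,592,289 RETURN_CONST executions in the branch.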

@penguin-wwy
Copy link
Contributor Author

penguin-wwy commented Feb 7, 2023

I ran pyperformance on my server, comparing against commit d3e2dd6:

```
python3 -m pyperf compare_to --table --min-speed 5 /python-3.12.0/results.json results.json

+---------------------+---------------------------------------+-----------------------+
| Benchmark           | python-3.12.0/results.json            | results.json          |
+=====================+=======================================+=======================+
| coverage            | 378 ms                                | 360 ms: 1.05x faster  |
+---------------------+---------------------------------------+-----------------------+
| crypto_pyaes        | 87.2 ms                               | 82.5 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| logging_silent      | 108 ns                                | 94.3 ns: 1.15x faster |
+---------------------+---------------------------------------+-----------------------+
| pickle_pure_python  | 352 us                                | 330 us: 1.07x faster  |
+---------------------+---------------------------------------+-----------------------+
| regex_v8            | 26.8 ms                               | 23.8 ms: 1.12x faster |
+---------------------+---------------------------------------+-----------------------+
| richards            | 54.6 ms                               | 51.7 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| scimark_lu          | 125 ms                                | 133 ms: 1.07x slower  |
+---------------------+---------------------------------------+-----------------------+
| scimark_monte_carlo | 83.3 ms                               | 74.3 ms: 1.12x faster |
+---------------------+---------------------------------------+-----------------------+
| spectral_norm       | 118 ms                                | 112 ms: 1.06x faster  |
+---------------------+---------------------------------------+-----------------------+
| sqlglot_parse       | 1.71 ms                               | 1.61 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| sqlglot_transpile   | 2.00 ms                               | 1.89 ms: 1.06x faster |
+---------------------+---------------------------------------+-----------------------+
| unpickle_list       | 4.73 us                               | 5.06 us: 1.07x slower |
+---------------------+---------------------------------------+-----------------------+
| Geometric mean      | (ref)                                 | 1.01x faster          |
+---------------------+---------------------------------------+-----------------------+
```

Benchmark hidden because not significant (68): 2to3, async_generators, async_tree_none, async_tree_cpu_io_mixed, async_tree_io, async_tree_memoization, chameleon, chaos, bench_mp_pool, bench_thread_pool, coroutines, deepcopy, deepcopy_reduce, deepcopy_memo, deltablue, django_template, docutils, dulwich_log, fannkuch, float, generators, genshi_text, genshi_xml, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_simple, mako, mdp, meteor_contest, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pidigits, pprint_safe_repr, pprint_pformat, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, regex_dna, regex_effbot, scimark_fft, scimark_sor, scimark_sparse_mat_mult, sqlglot_optimize, sqlglot_normalize, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tornado_http, unpack_sequence, unpickle, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process

Although the overall gain is modest (1.01x geometric mean, with two benchmarks 1.07x slower), the side effects are minimal, so this should be a positive optimization.
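The 1.01x geometric mean is much lower than the visible rows suggest. The sketch below illustrates why, assuming pyperf takes the geometric mean over every compared benchmark, including the 68 hidden ones. Treating each hidden benchmark as exactly 1.00x is a hypothetical simplification, since their real ratios are not listed, and the visible means are rounded.

```python
import math

# (baseline, changed) means for the rows shown in the table above; units cancel per row.
SHOWN = {
    "coverage": (378, 360),
    "crypto_pyaes": (87.2, 82.5),
    "logging_silent": (108, 94.3),
    "pickle_pure_python": (352, 330),
    "regex_v8": (26.8, 23.8),
    "richards": (54.6, 51.7),
    "scimark_lu": (125, 133),
    "scimark_monte_carlo": (83.3, 74.3),
    "spectral_norm": (118, 112),
    "sqlglot_parse": (1.71, 1.61),
    "sqlglot_transpile": (2.00, 1.89),
    "unpickle_list": (4.73, 5.06),
}

def geometric_mean(values):
    """exp of the mean of the logs, i.e. the n-th root of the product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

speedups = [old / new for old, new in SHOWN.values()]
shown_only = geometric_mean(speedups)
# Hypothetical assumption: every one of the 68 hidden benchmarks is exactly 1.00x.
diluted = geometric_mean(speedups + [1.0] * 68)
print(f"shown rows only: {shown_only:.3f}x, with 68 hidden rows at 1.00x: {diluted:.3f}x")
```

The visible rows alone give about 1.05x, while padding with 68 neutral results lands near 1.01x, consistent with the reported figure.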
