
use -flto=thin for clang-cl on Windows #131035


Closed
chris-eibl opened this issue Mar 10, 2025 · 2 comments
Labels: build (The build process and cross-build), OS-windows, performance (Performance or resource usage), type-feature (A feature request or enhancement)

Comments

@chris-eibl
Member

chris-eibl commented Mar 10, 2025

This started off as a build-time analysis (#130090 (comment)), but since I now have the infrastructure in place, I tried -flto=thin, too:

  • builds faster: 520.6 vs. 651.2 seconds
  • is neutral on the pyperformance benchmarks
  • would bring us in sync with Linux, where CONFIGURE_CFLAGS_NODIST and CONFIGURE_LDFLAGS_NOLTO both use -flto=thin when I configure for clang in WSL Ubuntu-24.04. See also the discussion of why not to use full -flto in "Revert to default fullLTO on Clang" (#130048).
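
For readers who want to repeat the Linux check from the last bullet, here is a minimal sketch. It assumes a POSIX build of CPython, where `sysconfig.get_config_var` exposes the configure-time variables; on Windows builds these variables are normally absent and the call returns `None`. The helper name `lto_mode` is hypothetical and only inspects a flag string.

```python
import shlex
import sysconfig


def lto_mode(flags):
    """Classify a flag string as 'thin', 'full', or None (no LTO flag recognized).

    Hypothetical helper for illustration; if several -flto flags are present,
    the last one wins, which mirrors how the compiler treats repeated options.
    """
    mode = None
    for flag in shlex.split(flags or ""):
        if flag == "-flto=thin":
            mode = "thin"
        elif flag in ("-flto", "-flto=full"):
            mode = "full"
    return mode


# The two variables named in the issue; other builds may not define them.
for name in ("CONFIGURE_CFLAGS_NODIST", "CONFIGURE_LDFLAGS_NOLTO"):
    value = sysconfig.get_config_var(name)
    print(f"{name}: {lto_mode(value)!r} (raw value: {value!r})")
```

Run with the interpreter produced by the clang build; a result of `'thin'` for both variables would match what is described above.
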
| Benchmark | clang.pgo.20.1.0-rc2 | clang.pgo.thin.20.1.0-rc2 |
|---|---|---|
| Geometric mean | (ref) | 1.00x faster |
Detailed pybenchmark results

| Benchmark | clang.pgo.20.1.0-rc2 | clang.pgo.thin.20.1.0-rc2 |
|---|---|---|
| float | 95.0 ms | 89.7 ms: 1.06x faster |
| json_loads | 29.8 us | 28.6 us: 1.04x faster |
| mdp | 2.86 sec | 2.77 sec: 1.03x faster |
| html5lib | 68.3 ms | 66.2 ms: 1.03x faster |
| async_tree_none_tg | 330 ms | 320 ms: 1.03x faster |
| pyflate | 518 ms | 505 ms: 1.03x faster |
| sqlite_synth | 3.21 us | 3.13 us: 1.03x faster |
| pidigits | 228 ms | 223 ms: 1.02x faster |
| bench_mp_pool | 168 ms | 165 ms: 1.02x faster |
| async_tree_eager_io | 742 ms | 727 ms: 1.02x faster |
| generators | 34.5 ms | 33.8 ms: 1.02x faster |
| comprehensions | 18.3 us | 17.9 us: 1.02x faster |
| async_tree_cpu_io_mixed | 641 ms | 629 ms: 1.02x faster |
| scimark_sparse_mat_mult | 4.51 ms | 4.43 ms: 1.02x faster |
| async_tree_memoization | 425 ms | 417 ms: 1.02x faster |
| sympy_expand | 538 ms | 529 ms: 1.02x faster |
| unpack_sequence | 57.0 ns | 56.0 ns: 1.02x faster |
| regex_dna | 209 ms | 205 ms: 1.02x faster |
| async_generators | 465 ms | 458 ms: 1.02x faster |
| scimark_sor | 140 ms | 137 ms: 1.02x faster |
| sympy_str | 319 ms | 314 ms: 1.02x faster |
| async_tree_io_tg | 751 ms | 740 ms: 1.01x faster |
| regex_effbot | 3.14 ms | 3.10 ms: 1.01x faster |
| async_tree_eager_tg | 272 ms | 268 ms: 1.01x faster |
| pickle_dict | 27.3 us | 27.0 us: 1.01x faster |
| async_tree_eager_memoization_tg | 363 ms | 359 ms: 1.01x faster |
| sympy_integrate | 22.5 ms | 22.2 ms: 1.01x faster |
| sympy_sum | 181 ms | 179 ms: 1.01x faster |
| 2to3 | 390 ms | 386 ms: 1.01x faster |
| hexiom | 6.68 ms | 6.61 ms: 1.01x faster |
| docutils | 3.03 sec | 3.00 sec: 1.01x faster |
| sqlglot_normalize | 121 ms | 120 ms: 1.01x faster |
| async_tree_memoization_tg | 392 ms | 389 ms: 1.01x faster |
| async_tree_cpu_io_mixed_tg | 614 ms | 609 ms: 1.01x faster |
| tomli_loads | 2.20 sec | 2.18 sec: 1.01x faster |
| spectral_norm | 102 ms | 101 ms: 1.01x faster |
| python_startup_no_site | 34.4 ms | 34.2 ms: 1.01x faster |
| genshi_text | 24.6 ms | 24.5 ms: 1.01x faster |
| dulwich_log | 119 ms | 118 ms: 1.00x faster |
| go | 128 ms | 128 ms: 1.00x faster |
| deltablue | 3.62 ms | 3.63 ms: 1.00x slower |
| unpickle_pure_python | 247 us | 248 us: 1.00x slower |
| xml_etree_generate | 107 ms | 107 ms: 1.01x slower |
| django_template | 39.2 ms | 39.4 ms: 1.01x slower |
| coroutines | 24.8 ms | 25.0 ms: 1.01x slower |
| mako | 13.3 ms | 13.5 ms: 1.01x slower |
| unpickle | 15.9 us | 16.1 us: 1.01x slower |
| nbody | 119 ms | 121 ms: 1.01x slower |
| fannkuch | 465 ms | 472 ms: 1.01x slower |
| crypto_pyaes | 81.3 ms | 82.6 ms: 1.02x slower |
| json_dumps | 11.5 ms | 11.7 ms: 1.02x slower |
| deepcopy | 285 us | 291 us: 1.02x slower |
| pprint_safe_repr | 858 ms | 876 ms: 1.02x slower |
| xml_etree_iterparse | 136 ms | 139 ms: 1.02x slower |
| gc_traversal | 5.03 ms | 5.14 ms: 1.02x slower |
| meteor_contest | 115 ms | 117 ms: 1.02x slower |
| deepcopy_memo | 33.8 us | 34.7 us: 1.03x slower |
| richards_super | 51.1 ms | 52.6 ms: 1.03x slower |
| scimark_fft | 327 ms | 337 ms: 1.03x slower |
| richards | 44.9 ms | 46.3 ms: 1.03x slower |
| pickle_list | 4.83 us | 4.99 us: 1.03x slower |
| deepcopy_reduce | 2.93 us | 3.03 us: 1.03x slower |
| pprint_pformat | 1.74 sec | 1.80 sec: 1.03x slower |
| logging_simple | 10.9 us | 11.4 us: 1.05x slower |
| logging_format | 12.1 us | 12.6 us: 1.05x slower |
| xml_etree_parse | 197 ms | 208 ms: 1.05x slower |
| Geometric mean | (ref) | 1.00x faster |
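
To make the "neutral" verdict easier to sanity-check, the sketch below recomputes per-benchmark ratios and their geometric mean from a handful of the rows above, written in the plain `name ref_time new_time: verdict` form. It is only an approximation: it works from the rounded values printed here and from a small sample, so it is not expected to reproduce the tool's own figure exactly. The names `ROW`, `speedup` and `SAMPLE_ROWS` are hypothetical.

```python
import re
from statistics import geometric_mean

# One result row: name, reference time + unit, new time + unit, then the verdict.
ROW = re.compile(r"^(\S+) ([\d.]+) (\w+) ([\d.]+) (\w+): .*$")


def speedup(row):
    """Return (name, ref_time / new_time); a value above 1 means the thin build is faster."""
    match = ROW.match(row)
    if match is None:
        raise ValueError(f"unrecognized row: {row!r}")
    name, ref, ref_unit, new, new_unit = match.groups()
    if ref_unit != new_unit:
        raise ValueError(f"unit mismatch in row: {row!r}")
    return name, float(ref) / float(new)


# A small sample copied from the detailed results, not the full list.
SAMPLE_ROWS = [
    "float 95.0 ms 89.7 ms: 1.06x faster",
    "json_loads 29.8 us 28.6 us: 1.04x faster",
    "go 128 ms 128 ms: 1.00x faster",
    "richards 44.9 ms 46.3 ms: 1.03x slower",
    "xml_etree_parse 197 ms 208 ms: 1.05x slower",
]

ratios = dict(speedup(row) for row in SAMPLE_ROWS)
for name, ratio in ratios.items():
    print(f"{name:20} {ratio:.3f}")
print(f"geometric mean of sample: {geometric_mean(list(ratios.values())):.3f}")
```

Feeding in all rows instead of the sample is a matter of extending `SAMPLE_ROWS`; gains and losses of a few percent in both directions largely cancel out in the geometric mean.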

| Step (time in s) | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| pginstr | 297.2 | 219.3 |
| pgo | 70.0 | 69.0 |
| kill | 1.2 | 0.5 |
| pgupd | 282.8 | 231.7 |
| total time | 651.2 | 520.6 |
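
The build-time figures above can be cross-checked with a few lines of arithmetic. This is a small sketch using only the numbers from the table; the dictionary and function names are hypothetical, and the tolerance assumes each step was rounded to one decimal place.

```python
# Step times in seconds, copied from the table above.
full = {"pginstr": 297.2, "pgo": 70.0, "kill": 1.2, "pgupd": 282.8}
thin = {"pginstr": 219.3, "pgo": 69.0, "kill": 0.5, "pgupd": 231.7}
REPORTED = {"full": 651.2, "thin": 520.6}


def consistent(steps, reported_total, tolerance=0.25):
    """Check that the step times add up to the reported total, allowing for rounding."""
    return abs(sum(steps.values()) - reported_total) <= tolerance


saved = REPORTED["full"] - REPORTED["thin"]
print(f"full LTO steps add up: {consistent(full, REPORTED['full'])}")
print(f"thin LTO steps add up: {consistent(thin, REPORTED['thin'])}")
print(f"time saved: {saved:.1f} s ({saved / REPORTED['full']:.1%} of the full-LTO build)")
print(f"speedup: {REPORTED['full'] / REPORTED['thin']:.2f}x")
for step in full:
    print(f"  {step:8} {full[step] - thin[step]:+6.1f} s")
```

Nearly all of the saving comes from the two compile-heavy steps, pginstr and pgupd; the pgo and kill steps barely change.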
Details pginstrument

| Project | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| _freeze_module | 38.5 | 40.0 |
| python314 | 141.5 | 81.3 |
| pyexpat | 52.7 | 3.9 |
| _elementtree | 51.8 | 5.3 |
| sqlite3 | 46.0 | 42.4 |
| liblzma | 18.2 | 16.5 |
| _decimal | 12.4 | 7.7 |
| _testcapi | 8.3 | 7.1 |
| _bz2 | 7.0 | 4.9 |
| _ctypes | 6.9 | 7.5 |
| _testlimitedcapi | 4.9 | 4.3 |
| _wmi | 4.5 | 3.0 |
| _overlapped | 4.5 | 3.2 |
| _asyncio | 4.0 | 5.2 |
| _lzma | 3.8 | 1.8 |
| _ssl | 3.7 | 5.5 |
| _ctypes_test | 3.7 | 3.4 |
| _multiprocessing | 3.5 | 2.7 |
| _sqlite3 | 3.4 | 2.8 |
| venvwlauncher | 3.3 | 2.7 |
| _zoneinfo | 3.1 | 3.4 |
| unicodedata | 2.7 | 3.0 |
| pyshellext | 2.7 | 2.6 |
| pyw | 2.7 | 2.7 |
| py | 2.6 | 2.5 |
| _socket | 2.4 | 3.7 |
| _testinternalcapi | 2.4 | 2.2 |
| _tkinter | 2.2 | 4.1 |
| _testclinic | 2.0 | 1.9 |
| _hashlib | 1.8 | 3.1 |
| select | 1.8 | 2.2 |
| venvlauncher | 1.8 | 1.7 |
| winsound | 1.7 | 3.3 |
| _uuid | 1.6 | 3.2 |
| _queue | 1.6 | 2.3 |
| _testembed | 1.5 | 1.5 |
| _testbuffer | 1.4 | 1.3 |
| pythonw | 1.1 | 1.1 |
| _testconsole | 1.1 | 1.1 |
| _testmultiphase | 1.0 | 1.0 |
| _testsinglephase | 1.0 | 1.0 |
| python | 1.0 | 0.9 |
| _testclinic_limited | 0.9 | 0.9 |
| _testimportmultiple | 0.9 | 0.9 |
| python3 | 0.5 | 0.5 |
| total | 465.8 | 303.3 |
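
To see where the pginstrument saving comes from, the sketch below ranks the five slowest projects from the table above by how much time thin LTO saves. It uses only the listed numbers; the variable names are hypothetical.

```python
# (full LTO, thin LTO) times for the five slowest pginstrument projects above.
rows = {
    "python314": (141.5, 81.3),
    "pyexpat": (52.7, 3.9),
    "_elementtree": (51.8, 5.3),
    "sqlite3": (46.0, 42.4),
    "_freeze_module": (38.5, 40.0),
}
TOTAL_FULL, TOTAL_THIN = 465.8, 303.3  # "total" row of the same table

# Largest saving first; a negative value means the thin build was slower.
savings = sorted(
    ((full - thin, name) for name, (full, thin) in rows.items()),
    reverse=True,
)
total_saved = TOTAL_FULL - TOTAL_THIN

for saved, name in savings:
    print(f"{name:15} {saved:+6.1f}  {saved / total_saved:6.1%} of total saving")

top3 = sum(saved for saved, _ in savings[:3])
print(f"top three projects: {top3:.1f} of {total_saved:.1f} saved")
```

By this count python314, pyexpat and _elementtree together account for almost the entire difference, while most of the remaining projects move by a second or two in either direction.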

Details pgupdate

| Project | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| _freeze_module | 38.0 | 39.5 |
| python314 | 141.9 | 95.4 |
| sqlite3 | 44.4 | 42.9 |
| liblzma | 17.3 | 16.5 |
| _decimal | 11.2 | 8.7 |
| _testcapi | 8.6 | 7.3 |
| _ctypes | 8.0 | 7.2 |
| _bz2 | 7.8 | 5.5 |
| _ssl | 5.2 | 5.6 |
| _testlimitedcapi | 5.0 | 4.2 |
| pyexpat | 4.6 | 3.6 |
| _asyncio | 4.5 | 4.6 |
| _socket | 4.3 | 3.5 |
| _tkinter | 4.0 | 4.2 |
| _ctypes_test | 3.7 | 3.4 |
| _overlapped | 3.5 | 3.7 |
| _elementtree | 3.5 | 4.5 |
| _wmi | 3.5 | 3.1 |
| _zoneinfo | 3.2 | 3.2 |
| _lzma | 3.2 | 1.9 |
| unicodedata | 3.2 | 3.0 |
| _sqlite3 | 3.1 | 2.7 |
| _hashlib | 3.1 | 3.3 |
| venvwlauncher | 3.1 | 3.0 |
| _multiprocessing | 2.8 | 2.6 |
| pyshellext | 2.7 | 2.6 |
| pyw | 2.6 | 2.6 |
| _uuid | 2.6 | 2.8 |
| py | 2.6 | 2.7 |
| _testinternalcapi | 2.4 | 2.2 |
| _testclinic | 2.0 | 1.9 |
| _queue | 1.9 | 2.2 |
| winsound | 1.8 | 3.0 |
| venvlauncher | 1.7 | 1.5 |
| select | 1.6 | 2.0 |
| _testembed | 1.5 | 1.4 |
| _testbuffer | 1.4 | 1.3 |
| _testconsole | 1.1 | 1.0 |
| pythonw | 1.1 | 1.1 |
| _testmultiphase | 1.0 | 1.1 |
| _testsinglephase | 1.0 | 1.0 |
| python | 1.0 | 0.9 |
| _testclinic_limited | 0.9 | 0.9 |
| _testimportmultiple | 0.9 | 0.9 |
| python3 | 0.5 | 0.5 |
| total | 372.9 | 316.8 |

Linked PRs

@chris-eibl
Member Author

@zooba: yippie, that was a fast one 🚀

The PR is merged. Shall I close the issue, or is that a core-dev task? I see that I get the option to "close as completed".

@Fidget-Spinner
Member

You can close your own issues whenever you think it's done.

picnixz added the performance, OS-windows, build, and type-feature labels on Mar 12, 2025
seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025