-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
PGO build broken on Windows #111786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
That sounds like a MSVC bug.
Can you try to report the bug to MSVC support? |
Yeah, we'll need a link repro for this, but it should be easy enough to get them to diagnose it. Reporting through dev community is still the best option, even if you work there 😉 Hopefully there's a resolution that doesn't require a specific compiler version. Our infrastructure needs a revamp to handle that situation... |
My plan is to explore configuration/pragma options to see if anything can be worked around (and benchmark the effect of that so we know what the performance impact of not doing certain optimizations on certain functions is). Barring that we may need to revert or redesign #111428. Reporting the bug to MSVC also makes sense, and I plan to do that. But I don't think we can count on it to solve our issue -- there will always be a need to compile on a range of versions for a variety of reasons, including what @zooba already said. |
Reading the description of error C1001, it's clear this is an error in the optimizer. This lead me to a description of MSVC's optimization pragmas as a way to control the optimization of just this one very large function (rather than spending time on anything more global). The two main optimization flags are size ("s") and time ("t"). If you disable them both, you create a new problem, which is that the stack frames get larger, and it exceeds the C stack space during some unit tests ( It would seems that disabling "optimize for speed" would make things slower, however, running on our benchmarking machine, it is about 3% faster (by the HPT method, which is more stable between runs) or 6% faster by the geometric mean method. So it seems like disabling the "t" optimization for just I still plan to report this issue to MSVC. When there is a fix in MSVC, we can add an |
Large functions are often faster when optimised for size, as the instruction cache pressure tends to outweigh anything else. That said, I'm fairly sure that PGO ignores all of those options and uses its own data, and generally it improves upon the I've also found in the past that many of our optimiser crashes have been due to mishandled types between object file (e.g. a function argument is defined as |
How do you reconcile that with @mdboom's actual benchmark results? IIUC the benchmark suite is run with PGO. And I just tried |
The
I spent a little time looking at this -- I didn't see any function signatures change in #111428 that might explain this. I think the more likely culprit is that the already large |
I guess it doesn't ignore the settings? 🤷♂️ Perhaps only for pragmas, but since it doesn't fail during non-PGO builds, clearly PGO is still running in that section of code.
We don't, but we could set a preprocessor variable specific to PGO builds (maybe the compiler sets one anyway?). I'd be more inclined to see how it goes only setting the pragma for non-PGO builds though, and let the profiling decide how to handle the large function. At least theoretically, that ought to be the way it works best, but theoretically the compiler shouldn't crash, so we're well into "make it work with the reality we have right now" territory anyway. |
…r PGO (#111794) In PGO mode, this function caused a compiler error in MSVC. It turns out that optimizing for space only save the day, and is even faster. However, without PGO, this is neither necessary nor slower.
Judging by the timestamp, Mike's reopened this because I asked, so we can set up a better system for handling the workaround. Specifically, I want to have a build flag/property to disable all workarounds so that we can have a test run (probably privately) against the latest compilers to track when they get fixed. I'm going to be offline for a couple of weeks, but will get to implementing or reviewing it after that. |
Without this PGO workaround, 3.13's The error went away when I tried making tier1 and tier2 have their own |
Thanks, @neonene. I will take some of these ideas, measure them, and file a PR. |
I measured this a few different ways and compared
So it seems like the best thing for performance is to separate the opcode variables, leaving everything else the same. |
Suggested by @neonene: python#111786 (comment) This makes Windows about 3% faster on pyperformance benchmarks.
Suggested by @neonene: python#111786 (comment) This makes Windows about 3% faster on pyperformance benchmarks.
This makes Windows about 3% faster on pyperformance benchmarks.
This makes Windows about 3% faster on pyperformance benchmarks.
@mdboom Does your merged fix address this issue entirely? No point adding compatibility infrastructure if we've got a good workaround. |
It appears that Windows PGO builds in faster-cpython/benchmarking-public repo have slowed down with Tier 2 off, after generated cases were sorted (1619f43). If the switch-cases cannot be arranged for working around, a similar performance issue closed with low priority in MSVC's bug tracker would be worth checking, where they suggested The hidden option worked for me in MSVC v.1929 (VS2019) is fine with/without this option. |
…SVC for PGO (python#111794) In PGO mode, this function caused a compiler error in MSVC. It turns out that optimizing for space only save the day, and is even faster. However, without PGO, this is neither necessary nor slower.
…on#112289) This makes Windows about 3% faster on pyperformance benchmarks.
…SVC for PGO (python#111794) In PGO mode, this function caused a compiler error in MSVC. It turns out that optimizing for space only save the day, and is even faster. However, without PGO, this is neither necessary nor slower.
…on#112289) This makes Windows about 3% faster on pyperformance benchmarks.
Bug report
Bug description:
Building with --pgo on Windows seems to be broken on main (853b4b5).
My hunch, since this is failing in
ceval.c
, is that the recent merge of the Tier 1 and Tier 2 interpreters may have made the interpreter loop function too large for MSVC to handle.Build command:
Error:
Complete build log
We don't currently have a buildbot that builds with --pgo on Windows -- probably something to fix as well.
CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows
Linked issue
Microsoft Developer Community MSVC bug report
Linked PRs
The text was updated successfully, but these errors were encountered: