[do not merge] Evaluate the hot/cold splitting pass #21016
(cherry picked from commit a5e427732d08c35bc2a67d10f8d5140475a02e01)
apple/swift-llvm#127
@swift-ci Please clean smoke test OS X platform
apple/swift-llvm#127
There are some decent performance improvements in a few benchmarks (NopDeinit is 1.31x faster at -O) mixed with a few regressions (Walsh is 0.85x as fast). As mentioned in the PR description, using a modified linker which co-locates cold/outlined symbols should give a significant improvement here. Hot/cold splitting seems to increase code size, especially in integer-heavy benchmarks which (presumably) contain many outlinable traps. Tweaking the outlining code size threshold should improve the results. If we ever want this optimization in Swift, we might consider disabling it at -Osize.
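To make the "outlinable traps" point concrete, here is a hand-written C sketch of the shape hot/cold splitting aims for: the rarely taken overflow path is moved into a separate cold, non-inlined function so the hot loop stays compact. This is only an analogy, not output from the pass and not how Swift emits traps. The names `checked_sum`, `report_overflow` and `overflow_reports` are hypothetical, and the code assumes a GCC- or Clang-compatible compiler for `__builtin_add_overflow`, `__builtin_expect` and the `cold`/`noinline` attributes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter so the cold path has an observable effect. */
static int overflow_reports = 0;

/* Cold path: kept out of line so a linker could, in principle,
   place it away from hot code. A real trap would abort instead. */
__attribute__((cold, noinline))
void report_overflow(size_t index) {
    (void)index;
    overflow_reports++;
}

/* Hot path: sums the values and bails out on the first signed overflow. */
bool checked_sum(const int32_t *values, size_t count, int32_t *out) {
    int32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        /* __builtin_add_overflow returns true when the addition overflows. */
        if (__builtin_expect(__builtin_add_overflow(total, values[i], &total), 0)) {
            report_overflow(i);
            return false;
        }
    }
    *out = total;
    return true;
}
```

Each outlined region costs a call and some glue code, which is one plausible reason code size can grow when many small trap blocks are split out, and why the outlining threshold matters.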
Build comment file:
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.
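As a rough illustration of the reporting thresholds quoted above (8% for performance, 1% for code size), the following C sketch checks whether a change is large enough to appear in the tables. It assumes the difference is measured relative to the old value, which the benchmark report does not spell out here. The function name `exceeds_threshold` is hypothetical.

```c
#include <stdbool.h>

/* Returns true when |new_value - old_value| is more than
   `threshold` (e.g. 0.08 for 8%) of old_value.
   Assumption: the change is measured relative to the old value. */
bool exceeds_threshold(double old_value, double new_value, double threshold) {
    double diff = new_value - old_value;
    if (diff < 0) {
        diff = -diff;
    }
    return diff > threshold * old_value;
}
```

Under that assumption, a benchmark going from 100 to 110 time units would be listed as a performance change, while a binary growing from 1000 to 1005 bytes would not be listed as a code size change.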
apple/swift-llvm#127
^ I've kicked off another smoke-benchmark run with the outlining threshold bumped up.
Build comment file:
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
@vedantk This is awesome! The code size hit might be worth it even at -Osize if it gives a good resident set win when paired with a cooperative linker. Part of the point of -Osize is to reduce memory usage by reducing code size, after all, and this more directly addresses that issue. Maybe there's a better way we could emit overflow traps to make them more splitting-friendly too.
@jckarter thanks for taking a look! I haven't looked closely yet at how Swift emits overflow traps, so I'm not sure whether that would need to change. I should point out that there are two more issues with the experiment done in this PR: 1) the splitting pass is scheduled after inlining, and 2) it doesn't look like SimplifyCFG has a chance to run afterwards and clean up some of the mess CodeExtractor leaves behind. I think it'd be worth repeating the experiment with the pipeline fixed to get more realistic numbers.
Closing, as the sanity check I originally wanted is done. |
This PR is a sanity check for hot/cold splitting in the Swift compiler. It's not meant to be merged. The goal is to get a rough idea of the effectiveness of the pass by gathering some basic performance numbers.
Caveats: