[BOLT] --use-aggr-reg-reassign with BOLT on Clang/LLD leads to Segmentation Faults when using these binaries #123809
Comments
I could reproduce what looks to be the same issue with very modest flags on other packages, e.g. dbus-broker-git. It seems to get triggered by using
Trace from dbus-broker-git (PGO optimizing stage):
|
@llvm/issue-subscribers-backend-x86 Author: None (ms178)
**Description:**
I am encountering a segmentation fault while compiling the Environment:
Steps to Reproduce:
Expected Behavior: The Actual Behavior: The compilation crashes with a segmentation fault during the optimization of Stack trace:
|
Further investigation with an assertion build from the same LLVM revision suggests that the BOLT optimizations performed on that build might have caused or triggered the issue. With an assertion build from the same revision, but without the BOLT optimization stage that I usually perform on top, I cannot reproduce the reported issue. However, if I perform the BOLT optimization on the assertion build, I get issues, albeit with a different trace this time, which also reproduce at O1. With the Kernel:
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT: fixdep-d6ca3c.c.txt With dbus-broker-git:
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT: c-dvar-80b4b0.c.txt The BOLT patches I've used that are still undergoing review might be a factor here, too. I'll try again with a more conservative BOLT configuration. |
@aaupov This is the other issue I was diagnosing today. I am now confident that either BOLT or the extra BOLT patches under review that I am carrying around (https://github.com/ms178/archpkgbuilds/blob/main/toolchain-experimental/llvm-git/fixes.patch) have broken the Clang/LLD binaries, as the LTO+PGOed binaries of the same LLVM revision work fine so far and do not reproduce the issues reported here. Experimenting with different BOLT configuration options hasn't helped so far. Are you aware of changes in BOLT during the past two weeks that might have broken the BOLTed Clang/LLD binaries that I am seeing these stack traces with? For reference, this is the relevant part of my BOLT script that produced working binaries as of two weeks ago:
|
I would try to reduce BOLT aggressiveness first, e.g. leave only the following:
If you have interest in improving BOLT, it would be great if you could narrow down the issue to a specific BOLT pass and binary function, I can help with that. |
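Narrowing a miscompile down to a single binary function can be approached as a bisection over the list of functions BOLT is allowed to process. The sketch below is only an illustration of that idea, not BOLT's own tooling: `is_broken` is a hypothetical callback that would re-run BOLT restricted to the given functions and report whether the resulting binary still crashes, and the sketch assumes exactly one function triggers the failure.

```python
from typing import Callable, Sequence


def bisect_culprit(funcs: Sequence[str],
                   is_broken: Callable[[Sequence[str]], bool]) -> str:
    """Return the single function whose optimization breaks the binary.

    Assumptions of this sketch: funcs is non-empty, there is exactly one
    culprit, and is_broken(subset) is True iff the culprit is in subset.
    """
    candidates = list(funcs)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the failure.
        candidates = half if is_broken(half) else candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    # Stand-in for "run BOLT on these functions, then test the binary".
    names = [f"func_{i}" for i in range(16)]
    print(bisect_culprit(names, lambda subset: "func_11" in subset))
```

Each round halves the candidate list, so a binary with thousands of functions needs only a dozen or so BOLT runs, provided the single-culprit assumption holds.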
Reducing the BOLT options indeed seems to have worked to produce working binaries.
The stats also look very similar. Old Stats:
Stats with reduced config:
|
@aaupov I've now tested several variants, and I could successfully narrow the problem down to the I've also tested that this one produced working binaries:
While this one doesn't:
Getting rid of |
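The option-by-option narrowing described in this comment amounts to dropping one option at a time and re-testing. A minimal sketch of that procedure, assuming a hypothetical `works` callback that runs BOLT with the given options and then exercises the resulting binary; apart from `--use-aggr-reg-reassign`, the option names in the demo are placeholders, not real BOLT flags.

```python
from typing import Callable, List, Sequence


def find_offending_options(options: Sequence[str],
                           works: Callable[[List[str]], bool]) -> List[str]:
    """Return every option whose removal alone yields a working binary."""
    if works(list(options)):
        return []  # The full set already works; nothing to blame.
    offenders = []
    for i, opt in enumerate(options):
        # Rebuild the option list with only this one option left out.
        without = [o for j, o in enumerate(options) if j != i]
        if works(without):
            offenders.append(opt)
    return offenders


if __name__ == "__main__":
    # Placeholder option names; only the middle one comes from this issue.
    opts = ["--placeholder-a", "--use-aggr-reg-reassign", "--placeholder-b"]
    # Stand-in for "BOLT the compiler with these options and build a package".
    demo_works = lambda chosen: "--use-aggr-reg-reassign" not in chosen
    print(find_offending_options(opts, demo_works))
```

This costs one BOLT run per option, which is usually acceptable for a short option list; if two options only fail in combination, the function returns an empty list and a pairwise search would be needed instead.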
- refine BOLT options according to a BOLT dev comment in llvm/llvm-project#123809 (comment) and my debugging efforts in that issue that revealed a problem with --use-aggr-reg-reassign on newer LLVM20git revisions which leads to broken Clang/LLD binaries
It looks like you found this set of options from my llvm script? I've found this before, |
Heh, you've caught me. :) Indeed, I took inspiration from your BOLT options a couple of weeks ago to optimize my build scripts further. By the way, I've BOLTed several other packages for Arch-compatible distros during the past month (PKGBUILDs are available in my repo). I hope these will provide an easy-to-use way for other people to catch these BOLT bugs and improve it further for everyone. |
@ms178 IIUC the original crash cannot be reproduced any longer? And this seems to be a BOLT miscompile? In that case, I think we should update the title + labels |
@fhahn It can still be reproduced, but only when my Clang + LLD binaries have been BOLTed with the mentioned BOLT configuration options. I cannot say what the root cause is: whether BOLT exposes a bug, or whether the defect is on the BOLT side and causes a miscompile. If you think that a BOLT bug is more likely, I can change the title of course. |
If it is only happening with BOLT, this seems to indicate that a BOLT miscompile is causing the crash, and the first step should be checking whether the changes BOLT makes are correct. |
I've updated the title and the top post with the relevant information that has come up in the meantime. As I lack access, I'd like to kindly ask someone to update the labels. |
Description:
I am encountering compiler errors while compiling various projects with BOLTed Clang/LLD binaries that were built with the `--use-aggr-reg-reassign` option, e.g. in the `asn1_compiler` component of the Linux kernel 6.13. There, the faulty binaries crash during the loop vectorization optimization pass. Without that option, a BOLTed Clang/LLD with the same revision worked fine for the same task.
Environment:
Steps to Reproduce:
Expected Behavior:
The `asn1_compiler` should compile successfully without any errors.
Actual Behavior:
The compilation crashes with a segmentation fault during the optimization of `scripts/asn1_compiler.c`.
Stack trace:
asn1_compiler-82f6f2.c.txt
asn1_compiler-82f6f2.sh.txt