[lld-macho] Fix branch extension thunk estimation logic #120529
Merged

Commits (6)
fe6369a  [lld-macho] Fix branch extension thunk estimation logic
bc5e342  uint32 => uint64 + extra comments
43949e0  Undo fixing unrelated formatting
e1408de  Fix typo + add better comments to code.
ec0c7ea  std::partition_point => llvm::partition_point
f8782e4  Address Feedback Nr.2
Conversation
Reviewer:
IIUC, the thunks used here are derived from thunkMap (see 1), and thunkMap is iterated above (lines 171-179). One should be able to figure out this value during the previous iteration and avoid the partition_point call. Something like: [the inline suggestion is not preserved in this extract; see the sketch after this comment]

Because of the dependency on isecVA, the code would need to move below the calculation of isecVA (more or less where the new code is added in this PR).

(It also looks like we might be double counting in maxPotentialThunks and existingForwardThunks, but there might be some not-easy-to-see dependency in how thunkMap is used and when estimateStubsInRangeVA is called that avoids that situation.)
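A minimal standalone sketch of the idea described above, counting already-placed forward thunks while iterating thunkMap instead of with a separate partition_point pass. The types, member names, and the direction of the VA comparison are assumptions for illustration, not the actual lld-macho code:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in types for illustration only; the real lld-macho ThunkInfo and
// input-section types are richer than this.
struct Section {
  uint64_t va = 0;
  uint64_t getVA() const { return va; }
};

struct ThunkInfo {
  Section *isec = nullptr;     // already-emitted thunk body, if any
  uint32_t callSiteCount = 0;  // total call sites targeting the symbol
  uint32_t callSitesUsed = 0;  // call sites processed so far
};

// Count, in a single pass over thunkMap, both the thunks that may still be
// created (maxPotentialThunks) and the thunks that were already placed at or
// below the current section VA (existingForwardThunks). Which way the VA
// comparison should go is exactly what the rest of the thread is unsure about.
static void tallyThunks(const std::map<std::string, ThunkInfo> &thunkMap,
                        uint64_t isecVA, uint64_t &maxPotentialThunks,
                        uint64_t &existingForwardThunks) {
  maxPotentialThunks = 0;
  existingForwardThunks = 0;
  for (const auto &[name, ti] : thunkMap) {
    if (ti.callSitesUsed < ti.callSiteCount)
      ++maxPotentialThunks;
    else if (ti.isec && ti.isec->getVA() <= isecVA)
      ++existingForwardThunks;
  }
}
```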
Author:
This is true, but I think this might be less performant. The for loop iterates over all thunks, and there can be many of them (100k+). That means adding a branch that executes 100k times. Computing the count with partition_point is O(log n), so for ~100k input thunks it takes <20 iterations and a few hundred executed instructions. Generally we favor more performant code for LLD, even if here it probably won't really be measurable. Or do we want this for easier code readability / understandability?

This is basically the issue to be fixed. maxPotentialThunks is intended to, and appears to, contain all the possible thunks, so adding in existingForwardThunks would appear to be double counting. But that's exactly the original bug: maxPotentialThunks does not include all the thunks, so we need to count all the thunks by adding existingForwardThunks.
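For reference, a standalone sketch of the kind of O(log n) lookup being defended here: counting already-emitted thunks at or above isecVA with a single binary search over their (ascending) addresses. The PR itself uses llvm::partition_point on lld's thunk list; the std::partition_point version below only illustrates why the cost is roughly 17 comparisons for ~100k thunks.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumes thunk addresses are kept in ascending order (thunks are emitted in
// address order), so the "forward" thunks form a suffix of the vector and a
// single O(log n) partition_point finds where that suffix begins.
static size_t countForwardThunks(const std::vector<uint64_t> &sortedThunkVAs,
                                 uint64_t isecVA) {
  auto firstForward = std::partition_point(
      sortedThunkVAs.begin(), sortedThunkVAs.end(),
      [isecVA](uint64_t va) { return va < isecVA; });
  return static_cast<size_t>(sortedThunkVAs.end() - firstForward);
}
```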
Reviewer:
I did not have an intuition for the total number of entries in thunkMap. If we are talking about 100k or numbers of that order, doing only ~16 comparisons is probably always going to be quicker.

My comment about double counting is because of the name maxPotentialThunks and the way thunkMap seems to be copied progressively into thunks, but never cleaned up. I don't pretend to understand the flow of the code (it is difficult to figure out what's going on and how things will happen at runtime), but the name maxPotentialThunks might not be the best if it was undercounting (originally). IIUC the code calls estimateStubsInRangeVA only once during finalize (which I think only happens once). At the point estimateStubsInRangeVA is called, part of thunkMap has already been pushed into thunks. The more we overcount, the higher stubsInRangeVA will be, and it might be that we are creating thunks toward the end of the text section where a simple jump would have been enough. I don't know how bad that size increase in the final binary is when we are overcounting. It seems to me that we were undercounting when ti.callSitesUsed == ti.callSiteCount (I imagine callSitesUsed cannot be higher than callSiteCount). It might be possible to get a better estimate during the thunkMap processing, for example by changing the if (ti.isec->getVA() <= isecVA) I was proposing above to else if (ti.isec->getVA() <= isecVA) (which I think was the case where the original code was undercounting).

Again, it is complicated code, I probably don't understand it correctly, and there are a couple of things that might make overcounting not as important as undercounting. Feel free to ignore my feedback.
Author:
I see. Indeed this is very tricky code. I should have clarified better: the thing being undercounted is the number of thunks needed to adjust the VA. maxPotentialThunks is correct, because it counts the maximum number of thunks that can be created in the future, which it does correctly. The issue is that the VA also needs to be adjusted by the existing thunks, which haven't been taken into consideration. The implicit problematic assumption is that maxPotentialThunks includes the existing thunks, but it actually doesn't.
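To make the distinction concrete, the estimate being discussed has roughly the following shape; the exact expression and names in lld/MachO/ConcatOutputSection.cpp may differ, so treat this as a sketch of the accounting only:

```cpp
#include <cstdint>

// Sketch of the accounting only, not the actual lld-macho code. The end of
// __text is pushed out both by thunks that may still be created
// (maxPotentialThunks) and by thunks that are already placed but not yet
// reflected in the estimate (existingForwardThunks); both must be added
// before deciding the VA from which call sites can reach the stubs directly.
static uint64_t estimateStubsInRangeVA(uint64_t isecEnd,
                                       uint64_t maxPotentialThunks,
                                       uint64_t existingForwardThunks,
                                       uint64_t thunkSize, uint64_t stubsSize,
                                       uint64_t forwardBranchRange) {
  uint64_t estimatedTextEnd =
      isecEnd + (maxPotentialThunks + existingForwardThunks) * thunkSize;
  return estimatedTextEnd + stubsSize - forwardBranchRange;
}
```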
Reviewer:
If I understand your answer correctly, then the case where if (ti.callSitesUsed < ti.callSiteCount) is not taken is the only one where the previous code was undercounting, so adding the right else if there (either ti.isec->getVA() <= isecVA or ti.isec->getVA() > isecVA, I am not sure which) should add to the estimate only those thunks that were forgotten before. Yes, the new code will be doing more checks than with partition_point (especially because, for the last input section, I would expect many thunks to match ti.callSitesUsed == ti.callSiteCount), but at the same time it will reduce the overcounting, which should improve the size of the final binary (by not unnecessarily creating more stubs because stubsInRangeVA is too high).
Author:
Hmm, not quite. From debugging, there was no opportunity to actually improve the estimate so that it works in all cases (without changing the fundamental algorithm). The already placed thunks were simply being ignored, basically forgotten to be taken into consideration. I am not sure your proposal will work; I don't think it will for all cases, as there are a few more variables in play.

I do think we can improve the algorithm a bit, but this is the direct fix for the current bug. I am hesitant to try to improve the underlying algorithm right now, as that can lead to more issues in unknown scenarios.