-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[LLVM][DWARF] Make some effort to avoid duplicates in .debug_ranges. #106614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-debuginfo Author: Kyle Huey (khuey) ChangesInlining and zero-cost abstractions tend to produce volumes of debug info with identical ranges. When built with full debugging information (the equivalent of -g2) librustc_driver.so has 2.1 million entries in .debug_ranges. But only 1.1 million of those entries are unique. While in principle all duplicates could be eliminated with a hashtable, checking to see if the new range is exactly identical to the previous range and skipping a new addition if it is is sufficient to eliminate 99.99% of the duplicates. This reduces the size of librustc_driver.so's .debug_ranges section by 35%, or the overall binary size a little more than 1%. Full diff: https://github.com/llvm/llvm-project/pull/106614.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp
index eab798c0da7843..cd1279d2021328 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp
@@ -121,7 +121,19 @@ void DwarfFile::addScopeLabel(LexicalScope *LS, DbgLabel *Label) {
std::pair<uint32_t, RangeSpanList *>
DwarfFile::addRange(const DwarfCompileUnit &CU, SmallVector<RangeSpan, 2> R) {
- CURangeLists.push_back(
- RangeSpanList{Asm->createTempSymbol("debug_ranges"), &CU, std::move(R)});
+ bool CanReuseLastRange = false;
+
+ if (!CURangeLists.empty()) {
+ auto Last = CURangeLists.back();
+ if (Last.CU == &CU && Last.Ranges == R) {
+ CanReuseLastRange = true;
+ }
+ }
+
+ if (!CanReuseLastRange) {
+ CURangeLists.push_back(RangeSpanList{Asm->createTempSymbol("debug_ranges"),
+ &CU, std::move(R)});
+ }
+
return std::make_pair(CURangeLists.size() - 1, &CURangeLists.back());
}
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfFile.h b/llvm/lib/CodeGen/AsmPrinter/DwarfFile.h
index f76858fc2f36a0..89aadccaac7f9f 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfFile.h
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfFile.h
@@ -37,6 +37,10 @@ class MDNode;
struct RangeSpan {
const MCSymbol *Begin;
const MCSymbol *End;
+
+ bool operator==(const RangeSpan& Other) const {
+ return Begin == Other.Begin && End == Other.End;
+ }
};
struct RangeSpanList {
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
I did consider writing a test for this but it appears that triggering the use of .debug_ranges requires complicated IR. I didn't see any way to force llc to put single contiguous ranges into .debug_ranges, which might be useful for testing things like this. Perhaps @dwblaikie could review this? |
Can you provide a reduced example where this occurs? |
Not too hard to get ranges to occur - cu ranges will happen with a single file with two functions complex with -ffunction-sections (it one of the functions is online, but complex at -O0 do it doesn't get online, it out into another section with an explicit section attribute) To get scope ranges inside a function - try unlocking a function with 2 calls to some external finding, and the caller has one call to an external function. Generate it, then manually reorder the resulting the function calls, so the callers call is between the two inclined from the caller. But none of those scenarios or others I'm aware of price the redundant description it sounds like you've seen/are trying to optimize here... |
This is IR from a pretty trivial Rust program that does some basic stuff with iterators. If you run |
OOh, I'm with you now. I thought you meant repetition within a range list, but I see you mean two range lists with identical entries - because one instruction remaining after inlining is 3 inlines deep, so each of the inlines share the same list of instructions (ie: f1 calls inline f2 calls inline f3 calls f4 - but f2 doesn't have any unique instructions, its own instructions are those in the inlined f3). Yeah, not too hard to reproduce - but requires a small amount of manual IR editing (or requires more quirks in the test case to force instruction reordering - honestly it's probably easier/clearer with the manual IR editing). eg:
Compile that with debug info and optimizations to IR (so all the inlining has already happened), then move the second
And, yes, it'd be nice if the range lists could be shared rather than duplicated. Not sure how I feel about special casing "just the last one" - yeah, I get that it's pretty effective/unlikely to miss much, but doesn't feel great/general/robust. That said, down at this level, we're missing the context that these lists basically won't be reused between functions - we could try to deduplicate only at the function level, so we wouldn't have a map full of range lists that would never be used again. The place to initialize/clear this list would probably be in DwarfCompileUnit::constructSubprogramScopeDIE - then it could be checked/inserted into in addScopeRangeList, maybe. (or maybe we'd want a version of that function that does the map lookup/etc and a version that doesn't (for use with CU ranges only... hmm, actually, I guess there's no reason they couldn't be reused too - if the CU had only one function, but that function used BB sections, then you could end up with a range list at the CU, subprogram, and inlined subroutine ranges that all could be shared)) Hrm :/ I guess then it'd still be a subprogram-local map, and maybe just a special case for the CU that could handle just that one case (does the CU contain a single range-located subprogram, if so, use that range list). So probably not worth trying to make that work - since it'd be such a special case and it likely wouldn't account for much debug info anyway. So I'd guess having a map that's initialized at constructSubprogramScopeDIE and cleared at the end of it would probably be adequate. Open to other ideas, though - perhaps other folks agree it's not worth the effort to build something that general when basically the only way this happens is as we're walking the tree (hmm, which way does this happen - on the way down the tree of scopes, or the way up? does that impact how effective the "last scope matches" technique is? perhaps not... ) You mentioned this technique eliminates "99.99% of the duplicates" - is that an actual number, or a rough estimate? Do you have an example where this technique doesn't catch a duplicate? |
Yes, apologies for the lack of clarity.
I agree that it's unsatisfying on some level but it is very effective. See below.
On the way down, though I don't think it matters.
It is an actual number, although I miscalculated slightly. The correct number is 99.9%. The 2.1M entries I mentioned is for the Stage 1 compiler (rustc built with the last version of Rust). The Stage 2 compiler (rustc built with the Stage 1 compiler) only has 1836763 entries in .debug_ranges. Of those, 1158605 (63%) are unique. Applying this patch reduces the Stage 2 compiler's .debug_ranges to 1159312, of which again 1158605 are unique. So the number of repeated ranges drops from 678158 to 707. So this technique gets roughly 99.9% of the duplicates. Of those 707 repeated ranges, 106 of them appear to be due to the linker removing dead code (i.e. every offset in them is very close to zero). I have not investigated any of the 601 remaining non-unique but legitimate-looking entries to see why they remain. Given how effective this patch is I'd be hesitant to do anything more complicated. |
Could you add a test case? (something like what I outlined/showed above)? & yes, the zeros are from linker deduplication and aren't actually duplicates and production time - there's an lld linker flag that can produce a newer standardized tombstone value (rather than 0) for these gc'd entities. Then you can differentiate those more easily from small-but-valid addresses. ( https://reviews.llvm.org/D83264 ) Could you pick one of the non-low-address examples and see why it's still duplicate? |
I've added a test. I investigated one of the non-low-address duplicate cases and found that it originated in the LLVM components of rustc. Configuring the rustc build to use a clang that has this PR to compile its LLVM components eliminated all of the non-low-address duplicates, leaving only the duplicates created by --gc-sections. |
If you're satisfied with that I should squash and edit the commit message before anything is merged. |
Sure, if you can update the commit message, happy to squash/merge after that. |
Inlining and zero-cost abstractions tend to produce volumes of debug info with identical ranges. When built with full debugging information (the equivalent of -g2) librustc_driver.so has 2.1 million entries in .debug_ranges. But only 1.1 million of those entries are unique. While in principle all duplicates could be eliminated with a hashtable, checking to see if the new range is exactly identical to the previous range and skipping a new addition if it is is sufficient to eliminate the duplicates. This reduces the size of librustc_driver.so's .debug_ranges section by 35%, or the overall binary size a little more than 1%.
7d1e8ef
to
f9dfee1
Compare
Done. |
Thanks! |
Thank you! |
Hi, this test is failing on the AIX bot, could you take a look?
Here is the backtrace:
|
I think the test needs an |
Inlining and zero-cost abstractions tend to produce volumes of debug info with identical ranges. When built with full debugging information (the equivalent of -g2) librustc_driver.so has 2.1 million entries in .debug_ranges. But only 1.1 million of those entries are unique. While in principle all duplicates could be eliminated with a hashtable, checking to see if the new range is exactly identical to the previous range and skipping a new addition if it is is sufficient to eliminate 99.99% of the duplicates. This reduces the size of librustc_driver.so's .debug_ranges section by 35%, or the overall binary size a little more than 1%.