
[LLVM][DWARF] Make some effort to avoid duplicates in .debug_ranges. #106614


Merged
merged 1 commit into from
Sep 4, 2024


khuey
Contributor

khuey commented Aug 29, 2024

Inlining and zero-cost abstractions tend to produce large volumes of debug info with identical ranges. When built with full debugging information (the equivalent of -g2), librustc_driver.so has 2.1 million entries in .debug_ranges, but only 1.1 million of those entries are unique. While in principle all duplicates could be eliminated with a hashtable, simply checking whether the new range is identical to the previous one, and skipping the addition if so, is sufficient to eliminate 99.99% of the duplicates. This reduces the size of librustc_driver.so's .debug_ranges section by 35%, or the overall binary size by a little more than 1%.
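A minimal standalone sketch of the "compare with the previous entry" check described above. It is not LLVM's code: RangeSpan, RangeSpanList and RangeListTable are hypothetical, simplified stand-ins in which integer label IDs replace MCSymbol pointers, an int identifies the compile unit, and a string stands in for the temp symbol each list is emitted under.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a range span: label IDs replace MCSymbol pointers,
// so two spans are equal only when both endpoints are the very same labels.
struct RangeSpan {
  int Begin;
  int End;
  bool operator==(const RangeSpan &Other) const {
    return Begin == Other.Begin && End == Other.End;
  }
};

// Simplified stand-in for a range list owned by one compile unit.
struct RangeSpanList {
  std::string Label; // label the list would be emitted under
  int CU;            // owning compile unit (an ID instead of a pointer)
  std::vector<RangeSpan> Ranges;
};

// Hypothetical container shaped loosely like DwarfFile::addRange.
class RangeListTable {
public:
  // Returns the index of the list describing R. If the most recently added
  // list belongs to the same CU and has identical spans, that list is reused
  // instead of appending a duplicate.
  std::size_t addRange(int CU, std::vector<RangeSpan> R) {
    assert(!R.empty() && "a range list needs at least one span");
    if (!Lists.empty()) {
      const RangeSpanList &Last = Lists.back(); // bind by reference, no copy
      if (Last.CU == CU && Last.Ranges == R)
        return Lists.size() - 1;
    }
    std::string Label = "debug_ranges" + std::to_string(Lists.size());
    Lists.push_back(RangeSpanList{std::move(Label), CU, std::move(R)});
    return Lists.size() - 1;
  }

  std::size_t size() const { return Lists.size(); }

private:
  std::vector<RangeSpanList> Lists;
};
```

One small design difference from the diff: auto Last = CURangeLists.back() copies the whole list (including its SmallVector) on every call, whereas binding a const reference avoids that copy. Because the check only looks one entry back, identical spans that belong to a different CU, or a duplicate separated from its twin by some other list, still get a new entry.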

@llvmbot
Member

llvmbot commented Aug 29, 2024

@llvm/pr-subscribers-debuginfo

Author: Kyle Huey (khuey)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/106614.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp (+14-2)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfFile.h (+4)
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp
index eab798c0da7843..cd1279d2021328 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp
@@ -121,7 +121,19 @@ void DwarfFile::addScopeLabel(LexicalScope *LS, DbgLabel *Label) {
 
 std::pair<uint32_t, RangeSpanList *>
 DwarfFile::addRange(const DwarfCompileUnit &CU, SmallVector<RangeSpan, 2> R) {
-  CURangeLists.push_back(
-      RangeSpanList{Asm->createTempSymbol("debug_ranges"), &CU, std::move(R)});
+  bool CanReuseLastRange = false;
+
+  if (!CURangeLists.empty()) {
+    auto Last = CURangeLists.back();
+    if (Last.CU == &CU && Last.Ranges == R) {
+      CanReuseLastRange = true;
+    }
+  }
+
+  if (!CanReuseLastRange) {
+    CURangeLists.push_back(RangeSpanList{Asm->createTempSymbol("debug_ranges"),
+                                         &CU, std::move(R)});
+  }
+
   return std::make_pair(CURangeLists.size() - 1, &CURangeLists.back());
 }
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfFile.h b/llvm/lib/CodeGen/AsmPrinter/DwarfFile.h
index f76858fc2f36a0..89aadccaac7f9f 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfFile.h
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfFile.h
@@ -37,6 +37,10 @@ class MDNode;
 struct RangeSpan {
   const MCSymbol *Begin;
   const MCSymbol *End;
+
+  bool operator==(const RangeSpan& Other) const {
+    return Begin == Other.Begin && End == Other.End;
+  }
 };
 
 struct RangeSpanList {


github-actions bot commented Aug 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@khuey
Contributor Author

khuey commented Aug 29, 2024

I did consider writing a test for this, but it appears that triggering the use of .debug_ranges requires complicated IR. I didn't see any way to force llc to put a single contiguous range into .debug_ranges, which might be useful for testing changes like this.

Perhaps @dwblaikie could review this?

@dwblaikie
Collaborator

Can you provide a reduced example where this occurs?

@dwblaikie
Collaborator

Not too hard to get ranges to occur: CU ranges will happen with a single file with two functions compiled with -ffunction-sections (or if one of the functions is inline but compiled at -O0 so it doesn't get inlined, or put into another section with an explicit section attribute).

To get scope ranges inside a function, try inlining a function with 2 calls to some external function into a caller that has one call to an external function. Generate the IR, then manually reorder the resulting function calls so the caller's call is between the two inlined from the callee.

But none of those scenarios, or others I'm aware of, produce the redundant description it sounds like you've seen/are trying to optimize here...

@khuey
Contributor Author

khuey commented Aug 29, 2024

This is IR from a pretty trivial Rust program that does some basic stuff with iterators.

ir.txt

If you run llc -O2 on this without this patch you'll see a number of repeated debug ranges for successive inlines (e.g. .Ldebug_ranges3-9 are all the same).

@dwblaikie
Copy link
Collaborator

OOh, I'm with you now. I thought you meant repetition within a range list, but I see you mean two range lists with identical entries - because one instruction remaining after inlining is 3 inlines deep, so each of the inlines share the same list of instructions (ie: f1 calls inline f2 calls inline f3 calls f4 - but f2 doesn't have any unique instructions, its own instructions are those in the inlined f3).

Yeah, not too hard to reproduce - but requires a small amount of manual IR editing (or requires more quirks in the test case to force instruction reordering - honestly it's probably easier/clearer with the manual IR editing).

eg:

void f1();
inline void f2() {
  f1();
  f1();
}
inline void f3() {
  f2();
}
void f4() {
  f3();
  f1();
}

Compile that with debug info and optimizations to IR (so all the inlining has already happened), then move the second f1 call after the 3rd, causing the inlining to be split into ranges.
The resulting DWARF looks something like this:

0x00000031:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x0000000000000011)
                DW_AT_frame_base        (DW_OP_reg7 RSP)
                DW_AT_call_all_calls    (true)
                DW_AT_linkage_name      ("_Z2f4v")
                DW_AT_name      ("f4")
                DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
                DW_AT_decl_line (9)
                DW_AT_external  (true)

0x0000003d:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x0000002c "_Z2f3v")
                  DW_AT_ranges  (indexed (0x0) rangelist = 0x00000014
                     [0x0000000000000001, 0x0000000000000006)
                     [0x000000000000000b, 0x0000000000000011))
                  DW_AT_call_file       ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
                  DW_AT_call_line       (10)
                  DW_AT_call_column     (3)

0x00000046:       DW_TAG_inlined_subroutine
                    DW_AT_abstract_origin       (0x00000027 "_Z2f2v")
                    DW_AT_ranges        (indexed (0x1) rangelist = 0x0000001b
                       [0x0000000000000001, 0x0000000000000006)
                       [0x000000000000000b, 0x0000000000000011))
                    DW_AT_call_file     ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
                    DW_AT_call_line     (7)
                    DW_AT_call_column   (3)

0x0000004f:       NULL

And, yes, it'd be nice if the range lists could be shared rather than duplicated.

Not sure how I feel about special casing "just the last one" - yeah, I get that it's pretty effective/unlikely to miss much, but doesn't feel great/general/robust.

That said, down at this level, we're missing the context that these lists basically won't be reused between functions - we could try to deduplicate only at the function level, so we wouldn't have a map full of range lists that would never be used again.

The place to initialize/clear this map would probably be DwarfCompileUnit::constructSubprogramScopeDIE; then it could be checked/inserted into in addScopeRangeList, maybe. (Or maybe we'd want one version of that function that does the map lookup etc. and one that doesn't, for use with CU ranges only... hmm, actually, I guess there's no reason CU range lists couldn't be reused too: if the CU had only one function, but that function used BB sections, then you could end up with a range list at the CU, subprogram, and inlined subroutine levels that could all be shared.)

Hrm :/ I guess then it'd still be a subprogram-local map, and maybe just a special case for the CU that could handle just that one case (does the CU contain a single range-located subprogram, if so, use that range list). So probably not worth trying to make that work - since it'd be such a special case and it likely wouldn't account for much debug info anyway.

So I'd guess having a map that's initialized at constructSubprogramScopeDIE and cleared at the end of it would probably be adequate.
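For comparison, a rough sketch of the alternative suggested above: a map from range list to index that only lives while one subprogram is being built. This is not what was merged. SubprogramRangeDedup, beginSubprogram, endSubprogram and this addScopeRangeList are hypothetical stand-ins for hooks in DwarfCompileUnit::constructSubprogramScopeDIE and addScopeRangeList, and integer label IDs replace MCSymbol pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct RangeSpan {
  int Begin;
  int End;
  // Ordering so that whole range lists can be used as std::map keys.
  bool operator<(const RangeSpan &Other) const {
    return Begin != Other.Begin ? Begin < Other.Begin : End < Other.End;
  }
};

using RangeList = std::vector<RangeSpan>;

// Hypothetical per-subprogram deduplication of range lists.
class SubprogramRangeDedup {
public:
  // Would run where constructSubprogramScopeDIE starts...
  void beginSubprogram() { Seen.clear(); }
  // ...and where it finishes, so the map never outlives one function.
  void endSubprogram() { Seen.clear(); }

  // Returns the index of an identical list already added for this
  // subprogram, or appends R as a new list and returns its index.
  std::size_t addScopeRangeList(const RangeList &R) {
    assert(!R.empty() && "a range list needs at least one span");
    auto Result = Seen.try_emplace(R, Lists.size());
    if (Result.second)
      Lists.push_back(R);
    return Result.first->second;
  }

  std::size_t numLists() const { return Lists.size(); }

private:
  std::vector<RangeList> Lists;          // every list that would be emitted
  std::map<RangeList, std::size_t> Seen; // lists seen in this subprogram
};
```

Unlike the last-entry check, this also catches non-adjacent duplicates within a function, at the cost of a map lookup per list and a key copy for each new one; clearing it per subprogram keeps it from filling up with lists that will never be looked up again.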

Open to other ideas, though - perhaps other folks agree it's not worth the effort to build something that general when basically the only way this happens is as we're walking the tree (hmm, which way does this happen - on the way down the tree of scopes, or the way up? does that impact how effective the "last scope matches" technique is? perhaps not... )

You mentioned this technique eliminates "99.99% of the duplicates" - is that an actual number, or a rough estimate? Do you have an example where this technique doesn't catch a duplicate?

@khuey
Contributor Author

khuey commented Aug 30, 2024

> OOh, I'm with you now. I thought you meant repetition within a range list, but I see you mean two range lists with identical entries

Yes, apologies for the lack of clarity.

> Not sure how I feel about special casing "just the last one" - yeah, I get that it's pretty effective/unlikely to miss much, but doesn't feel great/general/robust.

I agree that it's unsatisfying on some level but it is very effective. See below.

> hmm, which way does this happen - on the way down the tree of scopes, or the way up? does that impact how effective the "last scope matches" technique is? perhaps not...

On the way down, though I don't think it matters.
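A small sketch of visiting scopes parent-first ("on the way down") with the last-entry check, applied to a tree shaped like the f4/f3/f2 example earlier. The Scope type and assignRangeLists function are hypothetical, and the sketch assumes, as the DWARF dump above suggests, that only scopes with more than one span get DW_AT_ranges (a single contiguous span is described with low/high PC instead). An inlined scope with no instructions of its own is visited immediately after its parent, so the two identical lists end up adjacent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RangeSpan {
  int Begin;
  int End;
  bool operator==(const RangeSpan &Other) const {
    return Begin == Other.Begin && End == Other.End;
  }
};

// Hypothetical scope node: the address spans it covers plus nested scopes.
struct Scope {
  std::vector<RangeSpan> Ranges;
  std::vector<Scope> Children;
};

// Walks scopes parent-first and records, for each scope that needs
// DW_AT_ranges (more than one span), the index of the range list it uses.
// A list identical to the most recently added one is reused.
void assignRangeLists(const Scope &S,
                      std::vector<std::vector<RangeSpan>> &Lists,
                      std::vector<std::size_t> &Assigned) {
  assert(!S.Ranges.empty() && "every scope covers at least one span");
  if (S.Ranges.size() > 1) {
    if (Lists.empty() || Lists.back() != S.Ranges)
      Lists.push_back(S.Ranges);
    Assigned.push_back(Lists.size() - 1);
  }
  for (const Scope &Child : S.Children)
    assignRangeLists(Child, Lists, Assigned);
}
```

With f4 covering [0, 17) and both inlined scopes covering [1, 6) and [11, 17), f4's single span is skipped, f3 gets list 0, and f2 reuses list 0.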

> You mentioned this technique eliminates "99.99% of the duplicates" - is that an actual number, or a rough estimate? Do you have an example where this technique doesn't catch a duplicate?

It is an actual number, although I miscalculated slightly: the correct number is 99.9%. The 2.1M entries I mentioned are for the Stage 1 compiler (rustc built with the last version of Rust). The Stage 2 compiler (rustc built with the Stage 1 compiler) has only 1836763 entries in .debug_ranges. Of those, 1158605 (63%) are unique. Applying this patch reduces the Stage 2 compiler's .debug_ranges to 1159312 entries, of which again 1158605 are unique. The number of repeated ranges therefore drops from 678158 to 707, so this technique eliminates roughly 99.9% of the duplicates.
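A quick compile-time check of the arithmetic in the comment above, using only the Stage 2 figures it quotes; the constant and function names are illustrative.

```cpp
// Stage 2 rustc .debug_ranges figures quoted above.
constexpr long EntriesBefore = 1836763; // without the patch
constexpr long UniqueEntries = 1158605; // same count with and without it
constexpr long EntriesAfter = 1159312;  // with the patch

constexpr long DuplicatesBefore = EntriesBefore - UniqueEntries;
constexpr long DuplicatesAfter = EntriesAfter - UniqueEntries;

// Share of entries that were unique before the patch.
constexpr double uniqueFraction() {
  return static_cast<double>(UniqueEntries) / EntriesBefore;
}

// Share of the original duplicates that the patch removes.
constexpr double duplicatesEliminated() {
  return static_cast<double>(DuplicatesBefore - DuplicatesAfter) /
         DuplicatesBefore;
}

static_assert(DuplicatesBefore == 678158, "matches the quoted count");
static_assert(DuplicatesAfter == 707, "matches the quoted count");
```

This works out to about 63.1% unique entries before the patch and about 99.896% of duplicates removed, consistent with the rounded 63% and 99.9%.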

Of those 707 repeated ranges, 106 of them appear to be due to the linker removing dead code (i.e. every offset in them is very close to zero). I have not investigated any of the 601 remaining non-unique but legitimate-looking entries to see why they remain.
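A sketch of how the leftover duplicates might be classified with the "every offset is very close to zero" heuristic, assuming the range lists have already been read back from the linked binary as absolute [begin, end) address pairs. The names and the 0x1000 threshold are hypothetical; as dwblaikie points out in reply, lld can instead write a distinct tombstone value for discarded code, which would make such entries easier to tell apart from small but valid addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// A resolved address range [Begin, End) read back from the linked binary.
using AddrRange = std::pair<std::uint64_t, std::uint64_t>;
using RangeList = std::vector<AddrRange>;

// Heuristic: a list whose every address is very close to zero most likely
// describes code the linker discarded. The threshold is an arbitrary choice.
bool looksLinkerDiscarded(const RangeList &L, std::uint64_t Threshold = 0x1000) {
  assert(!L.empty() && "a range list needs at least one span");
  return std::all_of(L.begin(), L.end(), [Threshold](const AddrRange &R) {
    return R.first < Threshold && R.second < Threshold;
  });
}

struct DuplicateStats {
  std::size_t Repeated = 0;        // lists identical to an earlier list
  std::size_t LikelyDiscarded = 0; // of those, ones that look GC'd
};

// Counts every occurrence after the first of each distinct list.
DuplicateStats countDuplicates(const std::vector<RangeList> &Lists) {
  std::map<RangeList, std::size_t> Occurrences;
  DuplicateStats Stats;
  for (const RangeList &L : Lists) {
    if (++Occurrences[L] > 1) {
      ++Stats.Repeated;
      if (looksLinkerDiscarded(L))
        ++Stats.LikelyDiscarded;
    }
  }
  return Stats;
}
```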

Given how effective this patch is I'd be hesitant to do anything more complicated.

@dwblaikie
Collaborator

Could you add a test case (something like what I outlined/showed above)?

And yes, the zeros are from linker deduplication and aren't actually duplicates at production time. There's an lld linker flag that can produce a newer, standardized tombstone value (rather than 0) for these GC'd entities, so you can distinguish them more easily from small-but-valid addresses ( https://reviews.llvm.org/D83264 ).

Could you pick one of the non-low-address examples and see why it's still duplicate?

@khuey
Contributor Author

khuey commented Sep 4, 2024

I've added a test.

I investigated one of the non-low-address duplicate cases and found that it originated in the LLVM components of rustc. Configuring the rustc build to use a clang that has this PR to compile its LLVM components eliminated all of the non-low-address duplicates, leaving only the duplicates created by --gc-sections.

@khuey
Contributor Author

khuey commented Sep 4, 2024

If you're satisfied with that, I should squash and edit the commit message before anything is merged.

@dwblaikie
Collaborator

Sure, if you can update the commit message, happy to squash/merge after that.

Inlining and zero-cost abstractions tend to produce volumes of debug info with
identical ranges. When built with full debugging information (the equivalent of
-g2) librustc_driver.so has 2.1 million entries in .debug_ranges. But only 1.1
million of those entries are unique. While in principle all duplicates could be
eliminated with a hashtable, checking to see if the new range is exactly
identical to the previous range and skipping a new addition if it is is
sufficient to eliminate the duplicates. This reduces the size of
librustc_driver.so's .debug_ranges section by 35%, or the overall binary size a
little more than 1%.

khuey force-pushed the deduplicate-debug-ranges branch from 7d1e8ef to f9dfee1 on September 4, 2024 17:08
@khuey
Contributor Author

khuey commented Sep 4, 2024

Done.

dwblaikie merged commit a43137c into llvm:main on Sep 4, 2024
6 of 7 checks passed
@khuey
Contributor Author

khuey commented Sep 4, 2024

Thanks!

@dwblaikie
Collaborator

Thanks!

Thank you!

@jakeegan
Member

jakeegan commented Sep 5, 2024

Hi, this test is failing on the AIX bot, could you take a look?

Assertion failed: Section && "Cannot switch to a null section!", file  /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/lib/MC/MCStreamer.cpp, line 1260, virtual void llvm::MCStreamer::switchSection(MCSection *, uint32_t)()

https://lab.llvm.org/buildbot/#/builders/64/builds/862/steps/6/logs/FAIL__LLVM__debug-ranges-duplication_ll

Here is the backtrace:

#0  0x09000000005cdfe4 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5_64.o)
#1  0x09000000005cd808 in _p_raise () from /usr/lib/libpthreads.a(shr_xpg5_64.o)
#2  0x0900000000040e8c in raise () from /usr/lib/libc.a(shr_64.o)
#3  0x090000000005efbc in abort () from /usr/lib/libc.a(shr_64.o)
#4  0x09000000000f3b3c in __assert_c99 () from /usr/lib/libc.a(shr_64.o)
#5  0x000000010059c5c0 in llvm::MCStreamer::switchSection(llvm::MCSection*, unsigned int) ()
#6  0x000000010749d3b4 in llvm::DwarfDebug::emitDebugRangesImpl (this=0x12108ce10, Holder=..., Section=<optimized out>)
    at llvm-project/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3160
#7  0x000000010748bb2c in llvm::DwarfDebug::emitDebugRanges (this=0x12108ce10)
    at llvm-project/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3178
#8  llvm::DwarfDebug::endModule (this=0x12108ce10) at llvm-project/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1456
#9  0x00000001074c5010 in llvm::AsmPrinter::doFinalization (this=0x121081af0, M=...)
    at llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:2449
#10 0x000000010759c304 in (anonymous namespace)::PPCAIXAsmPrinter::doFinalization (this=0x121081af0, M=...)
    at llvm-project/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp:3208
#11 0x00000001001da1a4 in llvm::FPPassManager::doFinalization (this=0x121074a70, M=...)
    at llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1472
#12 0x0000000106e1d654 in (anonymous namespace)::MPPassManager::runOnModule (this=0x121065790, M=...)
    at llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1559
#13 llvm::legacy::PassManagerImpl::run (this=<optimized out>, M=...)
    at llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541
#14 0x0000000106e1ced4 in llvm::legacy::PassManager::run (this=<optimized out>, M=...)
    at llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1650
#15 0x00000001061f826c in (anonymous namespace)::EmitAssemblyHelper::RunCodegenPipeline (this=<optimized out>, Action=<optimized out>, 
    OS=..., DwoOS=...) at llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1162
#16 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly (this=<optimized out>, Action=<optimized out>, OS=..., BC=<optimized out>) at llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1185
#17 clang::EmitBackendOutput (Diags=..., HeaderOpts=..., CGOpts=..., TOpts=..., LOpts=..., TDesc=..., M=0x12104e330, 
    Action=<optimized out>, VFS=..., OS=..., BC=<optimized out>)
    at llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1347
#18 0x000000010855c624 in clang::CodeGenAction::ExecuteAction (this=<optimized out>)
    at llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:1222
#19 0x00000001058c3348 in clang::FrontendAction::Execute (this=0x121046850)
    at llvm-project/clang/lib/Frontend/FrontendAction.cpp:1078
#20 0x00000001057b22c8 in clang::CompilerInstance::ExecuteAction (this=0x121042cb0, Act=...)
    at llvm-project/clang/lib/Frontend/CompilerInstance.cpp:1061
#21 0x000000010788af50 in clang::ExecuteCompilerInvocation (Clang=0x121042cb0)
    at llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:280
#22 0x000000010339474c in cc1_main (Argv=..., Argv0=<optimized out>, MainAddr=<optimized out>)
    at llvm-project/clang/tools/driver/cc1_main.cpp:285
#23 0x0000000103392070 in ExecuteCC1Tool (ArgV=..., ToolContext=...)
    at llvm-project/clang/tools/driver/driver.cpp:215
#24 0x00000001091810bc in clang_main(int, char**, llvm::ToolContext const&)::$_0::operator()(llvm::SmallVectorImpl<char const*>&) const
    (this=0xffffffffffffffff, ArgV=...) at llvm-project/clang/tools/driver/driver.cpp:355
#25 llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::callback_fn<clang_main(int, char**, llvm::ToolContext const&)::$_0>(long, llvm::SmallVectorImpl<char const*>&) (callable=-1, params=...)
    at llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45
#26 0x0000000107991bfc in llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::operator()(llvm::SmallVectorImpl<char const*>&) const (this=<optimized out>, params=...) at llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68
#27 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__1::optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const::$_0::operator()() const (this=0xfffffffffffdb80)
    at llvm-project/clang/lib/Driver/Job.cpp:440
#28 llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__1::optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const::$_0>(long) (
    callable=1152921504606837632) at llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45
#29 0x0000000103ad1424 in llvm::function_ref<void ()>::operator()() const (this=<optimized out>)
    at llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68
#30 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (this=<optimized out>, Fn=...)
    at llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:426
#31 0x0000000107991a30 in clang::driver::CC1Command::Execute (this=0xfffffffffffdbb0, Redirects=..., ErrMsg=<optimized out>, 
    ExecutionFailed=<optimized out>) at llvm-project/clang/lib/Driver/Job.cpp:440
#32 0x000000010917997c in clang::driver::Compilation::ExecuteCommand (this=0x1210429d0, C=..., FailingCommand=warning: (Internal error: pc 0x0 in read in CU, but not in symtab.)
warning: (Error: pc 0x0 in address map, but not in symtab.)
@0xfffffffffffe1b8: 0x0, 
    LogOnly=<optimized out>) at llvm-project/clang/lib/Driver/Compilation.cpp:199
#33 0x00000001091791bc in clang::driver::Compilation::ExecuteJobs (this=0x1210429d0, Jobs=..., FailingCommands=..., 
    LogOnly=<optimized out>) at llvm-project/clang/lib/Driver/Compilation.cpp:253
#34 0x0000000109181918 in clang::driver::Driver::ExecuteCompilation (this=0xfffffffffffe4b8, C=..., FailingCommands=...)
    at llvm-project/clang/lib/Driver/Driver.cpp:1943
#35 0x00000001000168ec in clang_main (Argc=<optimized out>, Argv=<optimized out>, ToolContext=...)
    at llvm-project/clang/tools/driver/driver.cpp:391
#36 0x00000001000009fc in main (argc=5, argv=0xffffffffffff8b0)
    at build/tools/clang/tools/driver/clang-driver.cpp:17

@khuey
Contributor Author

khuey commented Sep 5, 2024

I think the test needs the following line at the top, since AIX doesn't support DWARF 5 (specifically, the XCOFF code doesn't have any provision for creating the new DWARF 5 sections, such as .debug_rnglists):

; XFAIL: target={{.*}}-aix{{.*}}

khuey deleted the deduplicate-debug-ranges branch on October 4, 2024 23:44

4 participants