Skip to content

[memprof] Undrift MemProfRecord #120138

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 18, 2024

Conversation

kazutakahirata
Copy link
Contributor

@kazutakahirata kazutakahirata commented Dec 16, 2024

This patch undrifts source locations in MemProfRecord before readMemprof
starts the matching process.

The thoery of operation is as follows:

  1. Collect the lists of direct calls, one from the IR and the other
    from the profile.

  2. Compute the correspondence (called undrift map in the patch)
    between the two lists with longestCommonSequence.

  3. Apply the undrift map just before readMemprof consumes
    MemProfRecord.

The new function gated by a flag that is off by default.

This patch undrifts source locations in MemProfRecord before readMemprof
starts the matching process.

The thoery of operation is as follows:

1. Collect the lists of direct calls, one from the IR and the other
   from the profile.

2. Compute the correspondence (called undrift map in the patch)
   between the two lists with longestCommonSequence.

3. Apply the undrift map just before readMemprof consumes
   MemProfRecord.

The new function gated by a flag that is off by default.

The test case is identical to recently added
memprof_annotate_yaml.test except that the source location in the
profile has been modified.
@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels Dec 16, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 16, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-pgo

Author: Kazu Hirata (kazutakahirata)

Changes

This patch undrifts source locations in MemProfRecord before readMemprof
starts the matching process.

The thoery of operation is as follows:

  1. Collect the lists of direct calls, one from the IR and the other
    from the profile.

  2. Compute the correspondence (called undrift map in the patch)
    between the two lists with longestCommonSequence.

  3. Apply the undrift map just before readMemprof consumes
    MemProfRecord.

The new function gated by a flag that is off by default.

The test case is identical to recently added
memprof_annotate_yaml.test except that the source location in the
profile has been modified.


Full diff: https://github.com/llvm/llvm-project/pull/120138.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/MemProfiler.cpp (+45-2)
  • (added) llvm/test/Transforms/PGOProfile/memprof-undrift.test (+47)
diff --git a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
index c980869a1c0d82..b416ef30f406bb 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
@@ -171,6 +171,10 @@ static cl::opt<std::string>
                                  cl::desc("The default memprof options"),
                                  cl::Hidden, cl::init(""));
 
+static cl::opt<bool> UndriftProfile("memprof-undrift-profile",
+                                    cl::desc("Undrift MemProf profile"),
+                                    cl::init(false), cl::Hidden);
+
 extern cl::opt<bool> MemProfReportHintedSizes;
 
 // Instrumentation statistics
@@ -907,10 +911,38 @@ memprof::computeUndriftMap(Module &M, IndexedInstrProfReader *MemProfReader,
   return UndriftMaps;
 }
 
+// Given a MemProfRecord, undrift all the source locations present in the
+// record in place.
+static void
+undriftMemProfRecord(const DenseMap<uint64_t, LocToLocMap> &UndriftMaps,
+                     memprof::MemProfRecord &MemProfRec) {
+  // Undrift a call stack in place.
+  auto UndriftCallStack = [&](std::vector<Frame> &CallStack) {
+    for (auto &F : CallStack) {
+      auto I = UndriftMaps.find(F.Function);
+      if (I == UndriftMaps.end())
+        continue;
+      auto J = I->second.find(LineLocation(F.LineOffset, F.Column));
+      if (J == I->second.end())
+        continue;
+      auto &NewLoc = J->second;
+      F.LineOffset = NewLoc.LineOffset;
+      F.Column = NewLoc.Column;
+    }
+  };
+
+  for (auto &AS : MemProfRec.AllocSites)
+    UndriftCallStack(AS.CallStack);
+
+  for (auto &CS : MemProfRec.CallSites)
+    UndriftCallStack(CS);
+}
+
 static void
 readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
             const TargetLibraryInfo &TLI,
-            std::map<uint64_t, AllocMatchInfo> &FullStackIdToAllocMatchInfo) {
+            std::map<uint64_t, AllocMatchInfo> &FullStackIdToAllocMatchInfo,
+            DenseMap<uint64_t, LocToLocMap> &UndriftMaps) {
   auto &Ctx = M.getContext();
   // Previously we used getIRPGOFuncName() here. If F is local linkage,
   // getIRPGOFuncName() returns FuncName with prefix 'FileName;'. But
@@ -958,6 +990,11 @@ readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
 
   NumOfMemProfFunc++;
 
+  // If requested, undrfit MemProfRecord so that the source locations in it
+  // match those in the IR.
+  if (UndriftProfile)
+    undriftMemProfRecord(UndriftMaps, *MemProfRec);
+
   // Detect if there are non-zero column numbers in the profile. If not,
   // treat all column numbers as 0 when matching (i.e. ignore any non-zero
   // columns in the IR). The profiled binary might have been built with
@@ -1176,6 +1213,11 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
 
   auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
 
+  TargetLibraryInfo &TLI = FAM.getResult<TargetLibraryAnalysis>(*M.begin());
+  DenseMap<uint64_t, LocToLocMap> UndriftMaps;
+  if (UndriftProfile)
+    UndriftMaps = computeUndriftMap(M, MemProfReader.get(), TLI);
+
   // Map from the stack has of each allocation context in the function profiles
   // to the total profiled size (bytes), allocation type, and whether we matched
   // it to an allocation in the IR.
@@ -1186,7 +1228,8 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
       continue;
 
     const TargetLibraryInfo &TLI = FAM.getResult<TargetLibraryAnalysis>(F);
-    readMemprof(M, F, MemProfReader.get(), TLI, FullStackIdToAllocMatchInfo);
+    readMemprof(M, F, MemProfReader.get(), TLI, FullStackIdToAllocMatchInfo,
+                UndriftMaps);
   }
 
   if (ClPrintMemProfMatchInfo) {
diff --git a/llvm/test/Transforms/PGOProfile/memprof-undrift.test b/llvm/test/Transforms/PGOProfile/memprof-undrift.test
new file mode 100644
index 00000000000000..db3c48db110b58
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/memprof-undrift.test
@@ -0,0 +1,47 @@
+; REQUIRES: x86_64-linux
+
+; Make sure that we can undrift the MemProf profile and annotate the IR
+; accordingly.
+
+; RUN: split-file %s %t
+; RUN: llvm-profdata merge %t/memprof_annotate_yaml.yaml -o %t/memprof_annotate_yaml.memprofdata
+; RUN: opt < %t/memprof_annotate_yaml.ll -passes='memprof-use<profile-filename=%t/memprof_annotate_yaml.memprofdata>' -memprof-undrift-profile -S 2>&1 | FileCheck %s
+
+;--- memprof_annotate_yaml.yaml
+---
+HeapProfileRecords:
+  - GUID:            _Z3foov
+    AllocSites:
+      - Callstack:
+          - { Function: _Z3foov, LineOffset: 3, Column: 30, IsInlineFrame: false }
+          - { Function: main, LineOffset: 2, Column: 5, IsInlineFrame: false }
+        MemInfoBlock:
+          # With these numbers, llvm::memprof::getAllocType will determine that
+          # the call to new is cold.  See MemoryProfileInfo.cpp for details.
+          TotalSize:                  400
+          AllocCount:                 1
+          TotalLifetimeAccessDensity: 1
+          TotalLifetime:              1000000
+    CallSites:       []
+...
+;--- memprof_annotate_yaml.ll
+define dso_local ptr @_Z3foov() !dbg !4 {
+entry:
+  %call = call ptr @_Znam(i64 4) #0, !dbg !5
+; CHECK: call ptr @_Znam(i64 4) #[[ATTR:[0-9]+]],
+  ret ptr %call
+}
+
+declare ptr @_Znam(i64)
+
+attributes #0 = { builtin allocsize(0) }
+; CHECK: attributes #[[ATTR]] = {{.*}} "memprof"="cold"
+
+!llvm.module.flags = !{!2, !3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1)
+!1 = !DIFile(filename: "t", directory: "/")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !1, file: !1, line: 1, unit: !0)
+!5 = !DILocation(line: 1, column: 22, scope: !4)

@kazutakahirata
Copy link
Contributor Author

I've overhauled the tests. Please take a look again. Thanks!

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kazutakahirata kazutakahirata merged commit ac8a9f8 into llvm:main Dec 18, 2024
8 checks passed
@kazutakahirata kazutakahirata deleted the memprof_undrift_basic branch December 18, 2024 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants