[clang] Canonicalize absolute paths in dependency file #117458

Merged
MaskRay merged 1 commit into llvm:main from the path-abs branch on Jan 1, 2025

Conversation

xtexx
Contributor

@xtexx xtexx commented Nov 24, 2024

This fixes #117438.

If paths in the dependency file are not absolute, make (or ninja) will canonicalize them.
Because their canonicalization does not expand symbolic links (for I/O performance reasons), leaving a non-absolute path in the dependency file may lead to unexpected canonicalization.
For example, given '/a/../b', where '/a' is a symlink to '/c/d', the path should resolve to '/c/b', but make (and ninja) canonicalize it to '/b' and then fail with a file-not-found error.
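
The difference between the two kinds of canonicalization can be sketched with C++17 `std::filesystem`. This is only an illustration on a POSIX system, not the code that make, ninja, or clang actually use. The assumption is that the purely textual `lexically_normal()` behaves like the symlink-unaware canonicalization described above, while `weakly_canonical()` consults the file system and follows symlinks. The helper names `lexicalCanonical` and `resolvedCanonical` are hypothetical.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Collapse "." and ".." textually, without ever touching the file system.
// A symlink in the path is treated like an ordinary directory name.
std::string lexicalCanonical(const fs::path &P) {
  return P.lexically_normal().generic_string();
}

// Resolve the path against the real file system, following symlinks
// before ".." is applied.
std::string resolvedCanonical(const fs::path &P) {
  return fs::weakly_canonical(P).generic_string();
}
```

With `/a` being a symlink to `/c/d`, `lexicalCanonical("/a/../b")` yields `/b`, whereas `resolvedCanonical("/a/../b")` would yield `/c/b`.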

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Nov 24, 2024
@llvmbot
Member

llvmbot commented Nov 24, 2024

@llvm/pr-subscribers-clang

Author: xtex (xtexChooser)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/117458.diff

2 Files Affected:

  • (modified) clang/include/clang/Frontend/Utils.h (+1)
  • (modified) clang/lib/Frontend/DependencyFile.cpp (+12-4)
diff --git a/clang/include/clang/Frontend/Utils.h b/clang/include/clang/Frontend/Utils.h
index 604e42067a3f1e..8ed17179c9824b 100644
--- a/clang/include/clang/Frontend/Utils.h
+++ b/clang/include/clang/Frontend/Utils.h
@@ -120,6 +120,7 @@ class DependencyFileGenerator : public DependencyCollector {
 private:
   void outputDependencyFile(DiagnosticsEngine &Diags);
 
+  llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS;
   std::string OutputFile;
   std::vector<std::string> Targets;
   bool IncludeSystemHeaders;
diff --git a/clang/lib/Frontend/DependencyFile.cpp b/clang/lib/Frontend/DependencyFile.cpp
index 528eae2c5283ea..ce7183b47e67c2 100644
--- a/clang/lib/Frontend/DependencyFile.cpp
+++ b/clang/lib/Frontend/DependencyFile.cpp
@@ -10,11 +10,11 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "clang/Frontend/Utils.h"
 #include "clang/Basic/FileManager.h"
 #include "clang/Basic/SourceManager.h"
 #include "clang/Frontend/DependencyOutputOptions.h"
 #include "clang/Frontend/FrontendDiagnostic.h"
+#include "clang/Frontend/Utils.h"
 #include "clang/Lex/DirectoryLookup.h"
 #include "clang/Lex/ModuleMap.h"
 #include "clang/Lex/PPCallbacks.h"
@@ -23,6 +23,7 @@
 #include "llvm/ADT/StringSet.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Support/VirtualFileSystem.h"
 #include "llvm/Support/raw_ostream.h"
 #include <optional>
 
@@ -236,6 +237,7 @@ void DependencyFileGenerator::attachToPreprocessor(Preprocessor &PP) {
     PP.SetSuppressIncludeNotFoundError(true);
 
   DependencyCollector::attachToPreprocessor(PP);
+  FS = PP.getFileManager().getVirtualFileSystemPtr();
 }
 
 bool DependencyFileGenerator::sawDependency(StringRef Filename, bool FromModule,
@@ -312,11 +314,17 @@ void DependencyFileGenerator::finishedMainFile(DiagnosticsEngine &Diags) {
 /// https://msdn.microsoft.com/en-us/library/dd9y37ha.aspx for NMake info,
 /// https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
 /// for Windows file-naming info.
-static void PrintFilename(raw_ostream &OS, StringRef Filename,
+static void PrintFilename(raw_ostream &OS, llvm::vfs::FileSystem *FS,
+                          StringRef Filename,
                           DependencyOutputFormat OutputFormat) {
   // Convert filename to platform native path
   llvm::SmallString<256> NativePath;
   llvm::sys::path::native(Filename.str(), NativePath);
+  // Make path absolute. Make and Ninja canonicalize paths without checking for
+  // symbolic links in the path, for performance concerns.
+  // If there is something like `/bin/../lib64` -> `/usr/lib64`
+  // (where `/bin` links to `/usr/bin`), Make will see them as `/lib64`.
+  FS->makeAbsolute(NativePath);
 
   if (OutputFormat == DependencyOutputFormat::NMake) {
     // Add quotes if needed. These are the characters listed as "special" to
@@ -400,7 +408,7 @@ void DependencyFileGenerator::outputDependencyFile(llvm::raw_ostream &OS) {
       Columns = 2;
     }
     OS << ' ';
-    PrintFilename(OS, File, OutputFormat);
+    PrintFilename(OS, FS.get(), File, OutputFormat);
     Columns += N + 1;
   }
   OS << '\n';
@@ -411,7 +419,7 @@ void DependencyFileGenerator::outputDependencyFile(llvm::raw_ostream &OS) {
     for (auto I = Files.begin(), E = Files.end(); I != E; ++I) {
       if (Index++ == InputFileIndex)
         continue;
-      PrintFilename(OS, *I, OutputFormat);
+      PrintFilename(OS, FS.get(), *I, OutputFormat);
       OS << ":\n";
     }
   }

@xtexx xtexx force-pushed the path-abs branch 2 times, most recently from eb468c6 to 3a9de0b Compare November 24, 2024 02:47
@xtexx
Contributor Author

xtexx commented Dec 29, 2024

cc @MaskRay

@xtexx xtexx force-pushed the path-abs branch 9 times, most recently from 79d2ed6 to 057dfa5 Compare December 31, 2024 13:53
                          DependencyOutputFormat OutputFormat) {
  // Convert filename to platform native path
  llvm::SmallString<256> NativePath;
  llvm::sys::path::native(Filename.str(), NativePath);
  // Resolve absolute path. Make and Ninja canonicalize paths
Member

TIL the Make behavior :)

@madscientist

This fixes llvm#117438.

If paths in the dependency file are not absolute, make (or ninja) will
canonicalize them.
Because their canonicalization does not expand symbolic links (for I/O
performance reasons), leaving a non-absolute path in the dependency
file may lead to unexpected canonicalization.
For example, given '/a/../b', where '/a' is a symlink to '/c/d', the
path should resolve to '/c/b', but make (and ninja) canonicalize it to
'/b' and then fail with a file-not-found error.

Signed-off-by: Bingwu Zhang <[email protected]>
@MaskRay
Member

MaskRay commented Jan 1, 2025

There is a typo in the subject. Absoultify => Absolutify. That said, something like "canonicalize absolute paths" seems more appropriate with respect to the changed semantics.

You just need to change the PR title/description, as they are used by the default commit message. The git commit messages are ignored.

@MaskRay MaskRay self-requested a review January 1, 2025 00:48
@xtexx xtexx changed the title [clang] Absoultify paths in dependency file output [clang] Canonicalize absolute paths in dependency file output Jan 1, 2025
@xtexx xtexx changed the title [clang] Canonicalize absolute paths in dependency file output [clang] Canonicalize absolute paths in dependency file Jan 1, 2025
@xtexx
Contributor Author

xtexx commented Jan 1, 2025

Done

@MaskRay MaskRay merged commit ca2ab74 into llvm:main Jan 1, 2025
7 checks passed

github-actions bot commented Jan 1, 2025

@xtexChooser Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@kadircet
Member

kadircet commented Jan 3, 2025

It's hard to say what the right or wrong behavior is when it comes to symlink handling, but I am having a hard time understanding what kind of applications can benefit from this new behavior. AFAICT most workflows that involve dependency files use them for "caching" purposes, and some use them for shipping dependencies to a remote build environment.

As some lit test changes also demonstrate, after this change a symlink like b/header.h -> a/header.h won't be captured in the dep output. So if the user modifies b/header.h, build systems won't invalidate their caches, or a remote build system that sets up a directory layout will fail compilations due to the missing header b/header.h (as that's how clang will try to retrieve this file).

Hence I think clang's previous behavior, which preserved symlinks, was actually the only desired behavior for current applications. AFAICT there's no way to correctly capture all of these dependencies if we resolve symlinks. I'd argue that your issue is with the downstream consumers of these deps files and needs to be addressed at their layer.

If you think I misunderstood the situation around what's broken in the downstream users, can you please clarify why you think the new behavior is "correct"? Otherwise I think we should revert this change to make sure we don't cause breakages that won't be detected until a new release.

@xtexx
Contributor Author

xtexx commented Jan 4, 2025

What you said is also a problem. Maybe we should print both the raw path and the resolved path?

If the path is not resolved, Make and Ninja will canonicalize it without resolving directory symbolic links in the path, which may lead to a broken path, as in the example I gave above.

It is also true that these build tools have their own performance concerns, since canonicalizing a large number of paths may be slow. Doing this in LLVM makes the process a one-time job (as long as the file is not modified). Thus, I think we need another change to print the original, non-canonicalized path too.

@xtexx xtexx deleted the path-abs branch January 4, 2025 10:05
@xtexx
Contributor Author

xtexx commented Jan 4, 2025

Ahh, wait: if the original path is also printed, Make may treat it as broken. I don't know what to do now, oops.

Another solution is to find all symbolic links in the path and print them separately.
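
One possible reading of that last idea, sketched with C++17 `std::filesystem`: walk the path component by component and collect every prefix that is itself a symbolic link, so that each link could be listed as its own dependency. This is not code from the PR or from clang; the function name `symlinkPrefixes` is hypothetical, and the sketch does not look for further links inside the link targets.

```cpp
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Return every prefix of P that is a symbolic link, in path order.
// Prefixes that do not exist or cannot be inspected are skipped.
std::vector<fs::path> symlinkPrefixes(const fs::path &P) {
  std::vector<fs::path> Links;
  fs::path Prefix;
  for (const fs::path &Component : P) {
    Prefix /= Component;
    std::error_code EC;
    // is_symlink() inspects the last component itself without following it.
    if (fs::is_symlink(Prefix, EC))
      Links.push_back(Prefix);
  }
  return Links;
}
```

For the earlier example, where `/a` is a symlink to `/c/d`, calling `symlinkPrefixes("/a/../b")` would report `/a` as the only link on the path.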

Development

Successfully merging this pull request may close these issues.

[clang] clang -M should print expanded paths
4 participants