Skip to content

[llvm][cas] Fix thread safety issues with MappedFileRegionBumpPtr #8625

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

benlangmuir
Copy link

  • Explanation: Opening and closing the CAS uses file locks to ensure only one process and thread will resize the CAS files. We already knew of a potential hole if multiple libclangs are loaded into a single process and open/close the same CAS on different threads. We recently discovered another thread-safety issue with a race between open/close within a single process. This fixes both issues by replacing fcntl F_SETLK, which is not thread safe, with flock, which is.
  • Scope: Fixes two races in caching builds. We are not 100% sure if these races were happening in practice, but stress testing shows they can cause a crash and we think they can lead to a corrupted CAS in the right circumstances.
  • Issue: rdar://126882843
  • Original PR: [llvm][cas] Fix thread safety issues with MappedFileRegionBumpPtr #8621
  • Risk: Impacts caching builds. Change is in core CAS file locking, so if my changes are buggy it could blow up in all kinds of nasty ways (crashes, corrupted CAS). But it's a fairly clean swap of one implementation for another using well-known primitives, so we don't anticipate any issues.
  • Testing: Added a regression test based on stress-testing our file locks. I also manually ran this test for longer periods across multiple threads and processes. I also manually tested that flock and fcntl interoperate correctly, as is stated in the man page, to ensure this will not break when there are processes using the old CAS.
  • Reviewer: @cachemeifyoucan

llvm::sys::fs::lockFile and friends, which are implemented with fcntl
F_SETLK, are only safe to use across processes and not across threads.
We were attempting to workaround that using createShared, which tries to
unique the mapped file per process, but (a) that only works if you have
a single copy of llvm in your process, which may not always be true, and
(b) we had a race in the implementation between closing one CAS and
opening a new one at the same path.  Replace these with an flock-based
implementation. Since this is not available on all platforms LLVM
supports we use a separate implementation for the on-disk CAS rather
than change llvm::sys::fs::lockFile.

rdar://126882843
(cherry picked from commit 15a76e0)
@benlangmuir
Copy link
Author

@swift-ci please test

@benlangmuir
Copy link
Author

macOS test failure is unrelated; fixed by swiftlang/sourcekit-lsp#1188 on main

@benlangmuir
Copy link
Author

@swift-ci please test macOS

1 similar comment
@benlangmuir
Copy link
Author

@swift-ci please test macOS

@fredriss fredriss merged commit 90d3f58 into swiftlang:swift/release/6.0 Apr 25, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants