Skip to content

[clang] Make LazyOffsetPtr more portable #112927

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 18, 2024
Merged

Conversation

jrtc27
Copy link
Collaborator

@jrtc27 jrtc27 commented Oct 18, 2024

LazyOffsetPtr currently relies on uint64_t being able to store a pointer
and, unless sizeof(uint64_t) == sizeof(void *), little endianness, since
getAddressOfPointer reinterprets the memory as a pointer. This also
doesn't properly respect the C++ object model.

As removing getAddressOfPointer would have wide-reaching implications,
improve the implementation to account for these problems by using
placement new and a suitably sized-and-aligned buffer, "right"-aligning
the objects on big-endian platforms so the LSBs are in the same place
for use as the discriminator.

Fixes: bc73ef0
Fixes: #111993

LazyOffsetPtr currently relies on uint64_t being able to store a pointer
and, unless sizeof(uint64_t) == sizeof(void *), little endianness, since
getAddressOfPointer reinterprets the memory as a pointer. This also
doesn't properly respect the C++ object model.

As removing getAddressOfPointer would have wide-reaching implications,
improve the implementation to account for these problems by using
placement new and a suitably sized-and-aligned buffer, "right"-aligning
the objects on big-endian platforms so the LSBs are in the same place
for use as the discriminator.

Fixes: bc73ef0
Fixes: llvm#111993
@jrtc27 jrtc27 requested a review from zygoloid October 18, 2024 15:59
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Oct 18, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 18, 2024

@llvm/pr-subscribers-clang

Author: Jessica Clarke (jrtc27)

Changes

LazyOffsetPtr currently relies on uint64_t being able to store a pointer
and, unless sizeof(uint64_t) == sizeof(void *), little endianness, since
getAddressOfPointer reinterprets the memory as a pointer. This also
doesn't properly respect the C++ object model.

As removing getAddressOfPointer would have wide-reaching implications,
improve the implementation to account for these problems by using
placement new and a suitably sized-and-aligned buffer, "right"-aligning
the objects on big-endian platforms so the LSBs are in the same place
for use as the discriminator.

Fixes: bc73ef0
Fixes: #111993


Full diff: https://github.com/llvm/llvm-project/pull/112927.diff

1 Files Affected:

  • (modified) clang/include/clang/AST/ExternalASTSource.h (+35-13)
diff --git a/clang/include/clang/AST/ExternalASTSource.h b/clang/include/clang/AST/ExternalASTSource.h
index 385c32edbae0fd..caf37144d5eb73 100644
--- a/clang/include/clang/AST/ExternalASTSource.h
+++ b/clang/include/clang/AST/ExternalASTSource.h
@@ -25,10 +25,12 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/iterator.h"
 #include "llvm/Support/PointerLikeTypeTraits.h"
+#include <algorithm>
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
 #include <iterator>
+#include <new>
 #include <optional>
 #include <utility>
 
@@ -326,29 +328,49 @@ struct LazyOffsetPtr {
   ///
   /// If the low bit is clear, a pointer to the AST node. If the low
   /// bit is set, the upper 63 bits are the offset.
-  mutable uint64_t Ptr = 0;
+  static constexpr size_t DataSize = std::max(sizeof(uint64_t), sizeof(T *));
+  alignas(uint64_t) alignas(T *) mutable unsigned char Data[DataSize] = {};
+
+  unsigned char GetLSB() const {
+    return Data[llvm::sys::IsBigEndianHost ? DataSize - 1 : 0];
+  }
+
+  template <typename U> U &As(bool New) const {
+    unsigned char *Obj =
+        Data + (llvm::sys::IsBigEndianHost ? DataSize - sizeof(U) : 0);
+    if (New)
+      return *new (Obj) U;
+    return *std::launder(reinterpret_cast<U *>(Obj));
+  }
+
+  T *&GetPtr() const { return As<T *>(false); }
+  uint64_t &GetU64() const { return As<uint64_t>(false); }
+  void SetPtr(T *Ptr) const { As<T *>(true) = Ptr; }
+  void SetU64(uint64_t U64) const { As<uint64_t>(true) = U64; }
 
 public:
   LazyOffsetPtr() = default;
-  explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast<uint64_t>(Ptr)) {}
+  explicit LazyOffsetPtr(T *Ptr) : Data() { SetPtr(Ptr); }
 
-  explicit LazyOffsetPtr(uint64_t Offset) : Ptr((Offset << 1) | 0x01) {
+  explicit LazyOffsetPtr(uint64_t Offset) : Data() {
     assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits");
     if (Offset == 0)
-      Ptr = 0;
+      SetPtr(NULL);
+    else
+      SetU64((Offset << 1) | 0x01);
   }
 
   LazyOffsetPtr &operator=(T *Ptr) {
-    this->Ptr = reinterpret_cast<uint64_t>(Ptr);
+    SetPtr(Ptr);
     return *this;
   }
 
   LazyOffsetPtr &operator=(uint64_t Offset) {
     assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits");
     if (Offset == 0)
-      Ptr = 0;
+      SetPtr(NULL);
     else
-      Ptr = (Offset << 1) | 0x01;
+      SetU64((Offset << 1) | 0x01);
 
     return *this;
   }
@@ -356,15 +378,15 @@ struct LazyOffsetPtr {
   /// Whether this pointer is non-NULL.
   ///
   /// This operation does not require the AST node to be deserialized.
-  explicit operator bool() const { return Ptr != 0; }
+  explicit operator bool() const { return isOffset() || GetPtr() != NULL; }
 
   /// Whether this pointer is non-NULL.
   ///
   /// This operation does not require the AST node to be deserialized.
-  bool isValid() const { return Ptr != 0; }
+  bool isValid() const { return isOffset() || GetPtr() != NULL; }
 
   /// Whether this pointer is currently stored as an offset.
-  bool isOffset() const { return Ptr & 0x01; }
+  bool isOffset() const { return GetLSB() & 0x01; }
 
   /// Retrieve the pointer to the AST node that this lazy pointer points to.
   ///
@@ -375,9 +397,9 @@ struct LazyOffsetPtr {
     if (isOffset()) {
       assert(Source &&
              "Cannot deserialize a lazy pointer without an AST source");
-      Ptr = reinterpret_cast<uint64_t>((Source->*Get)(OffsT(Ptr >> 1)));
+      SetPtr((Source->*Get)(OffsT(GetU64() >> 1)));
     }
-    return reinterpret_cast<T*>(Ptr);
+    return GetPtr();
   }
 
   /// Retrieve the address of the AST node pointer. Deserializes the pointee if
@@ -385,7 +407,7 @@ struct LazyOffsetPtr {
   T **getAddressOfPointer(ExternalASTSource *Source) const {
     // Ensure the integer is in pointer form.
     (void)get(Source);
-    return reinterpret_cast<T**>(&Ptr);
+    return &GetPtr();
   }
 };
 

Copy link
Collaborator

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One style nit, otherwise looks good, thanks!

@jrtc27
Copy link
Collaborator Author

jrtc27 commented Oct 18, 2024

One style nit, otherwise looks good, thanks!

I'm not sure "good" is quite the right word given the horrors this interface is hiding, but thanks for the review!

@jrtc27
Copy link
Collaborator Author

jrtc27 commented Oct 18, 2024

@awilfox Could you please verify if this fixes the bug for you?

@ernsteiswuerfel
Copy link

I can confirm it fixes issue #86460 for me just as awilfox original PR.

@awilfox
Copy link

awilfox commented Oct 18, 2024

I'll run a full test on the ppc32 builder now and report back. Thanks for this work.

@awilfox
Copy link

awilfox commented Oct 18, 2024

Yes, this passes the full Clang test suite on 32-bit PowerPC. Good to go.

@jrtc27
Copy link
Collaborator Author

jrtc27 commented Oct 18, 2024

Thanks both.

@jrtc27 jrtc27 merged commit 7619699 into llvm:main Oct 18, 2024
6 of 8 checks passed
@jrtc27 jrtc27 deleted the LazyOffsetPtr-portable branch October 18, 2024 20:55
@thesamesam
Copy link
Member

thesamesam commented Oct 19, 2024

I appreciate it's not exactly a straightforward change given the discussion it provoked, but any chance you'd be okay with a backport to 19? Clang isn't really usable on ppc32 without this. We could pull it in just for ppc32 downstream but we generally dislike doing conditional patching. WDYT?

cc @mgorny

@glaubitz
Copy link
Contributor

I appreciate it's not exactly a straightforward change given the discussion it provoked, but any chance you'd be okay with a backport to 19? Clang isn't really usable on ppc32 without this. We could pull it in just for ppc32 downstream but we generally dislike doing conditional patching. WDYT?

cc @mgorny

I agree. It would be great if this fix could be backported.

@mgorny
Copy link
Member

mgorny commented Oct 19, 2024

Doesn't it change ABI, though?

@zygoloid
Copy link
Collaborator

Doesn't it change ABI, though?

Only on platforms where it didn't work at all before.

tru pushed a commit to llvmbot/llvm-project that referenced this pull request Oct 29, 2024
LazyOffsetPtr currently relies on uint64_t being able to store a pointer
and, unless sizeof(uint64_t) == sizeof(void *), little endianness, since
getAddressOfPointer reinterprets the memory as a pointer. This also
doesn't properly respect the C++ object model.

As removing getAddressOfPointer would have wide-reaching implications,
improve the implementation to account for these problems by using
placement new and a suitably sized-and-aligned buffer, "right"-aligning
the objects on big-endian platforms so the LSBs are in the same place
for use as the discriminator.

Fixes: bc73ef0
Fixes: llvm#111993
(cherry picked from commit 7619699)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Clang] Crash when building trivial C++ code on PowerPC (32-bit)
8 participants