Commit d8ec452

[serialization] no transitive decl change (#92083)
Following #86912.

The motivation of the patch series is that, for a module interface unit `X`, when the modules that `X` depends on change, we hope the BMI of `X` won't change if those changes are not relevant to `X`. For this specific patch, we hope the BMI of `X` won't change if the changes are only to irrelevant declarations.

**However**, I found the patch itself is not very useful in practice, since adding or removing declarations changes the state of identifiers and types in most cases.

That said, for the simplest example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the newly introduced declaration `int getA(int)` doesn't introduce new identifiers or types, and the BMI of `onlyUseB` can stay unchanged.

While this doesn't look great, the patch should be the base of the patch to erase the transitive change for identifiers and types, since I don't know how we can introduce new types and identifiers without introducing new declarations. Given how tight the relationship between declarations, types and identifiers is, I think we can only reach the ideal state after we finish the series for all three entities.

The design of the patch is similar to #86912, which extends the 32-bit DeclID to 64 bits and uses the higher bits to store the module file index and the lower bits to store the local Decl ID.

A slight difference is that we only use 48 bits to store the new DeclID, since we try to use the higher 16 bits to store the module ID in the prefix of the Decl class. Previously, we used 32 bits to store the module ID and 32 bits to store the DeclID. I don't want to allocate additional space, so I tried to keep the prefix at the same 64 bits. A potentially interesting thing here is the relationship between the module ID and the module file index. I feel we can get the module file index from the module ID, but I didn't prove it or implement it, since I want to keep the patch itself as small as possible. We can do it in the future if we want.

Another change in the patch is the new concept of a Decl Index, which means the index into the very big array `DeclsLoaded` in ASTReader. Previously, the index of a loaded declaration was simply the Decl ID minus PREDEFINED_DECL_NUMs, so in some places the two were used ambiguously. This patch tries to split these two concepts.

As with #86912, the change will increase the on-disk PCM file sizes. Since declaration IDs may be the most numerous IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change brings a 6.6% increase in the on-disk PCM size. No compile-time performance regression was observed. Given the benefits in the motivating example, I think the cost is worthwhile.
1 parent b7e472c commit d8ec452

12 files changed

+302
-154
lines changed

clang/include/clang/AST/DeclBase.h

Lines changed: 3 additions & 14 deletions
@@ -701,10 +701,7 @@ class alignas(8) Decl {
 
   /// Set the owning module ID. This may only be called for
   /// deserialized Decls.
-  void setOwningModuleID(unsigned ID) {
-    assert(isFromASTFile() && "Only works on a deserialized declaration");
-    *((unsigned*)this - 2) = ID;
-  }
+  void setOwningModuleID(unsigned ID);
 
 public:
   /// Determine the availability of the given declaration.
@@ -777,19 +774,11 @@ class alignas(8) Decl {
 
   /// Retrieve the global declaration ID associated with this
   /// declaration, which specifies where this Decl was loaded from.
-  GlobalDeclID getGlobalID() const {
-    if (isFromASTFile())
-      return (*((const GlobalDeclID *)this - 1));
-    return GlobalDeclID();
-  }
+  GlobalDeclID getGlobalID() const;
 
   /// Retrieve the global ID of the module that owns this particular
   /// declaration.
-  unsigned getOwningModuleID() const {
-    if (isFromASTFile())
-      return *((const unsigned*)this - 2);
-    return 0;
-  }
+  unsigned getOwningModuleID() const;
 
 private:
   Module *getOwningModuleSlow() const;

clang/include/clang/AST/DeclID.h

Lines changed: 17 additions & 1 deletion
@@ -19,6 +19,8 @@
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/ADT/iterator.h"
 
+#include <climits>
+
 namespace clang {
 
 /// Predefined declaration IDs.
@@ -107,12 +109,16 @@ class DeclIDBase {
   ///
   /// DeclID should only be used directly in serialization. All other users
   /// should use LocalDeclID or GlobalDeclID.
-  using DeclID = uint32_t;
+  using DeclID = uint64_t;
 
 protected:
   DeclIDBase() : ID(PREDEF_DECL_NULL_ID) {}
   explicit DeclIDBase(DeclID ID) : ID(ID) {}
 
+  explicit DeclIDBase(unsigned LocalID, unsigned ModuleFileIndex) {
+    ID = (DeclID)LocalID | ((DeclID)ModuleFileIndex << 32);
+  }
+
 public:
   DeclID get() const { return ID; }
 
@@ -124,6 +130,10 @@ class DeclIDBase {
 
   bool isInvalid() const { return ID == PREDEF_DECL_NULL_ID; }
 
+  unsigned getModuleFileIndex() const { return ID >> 32; }
+
+  unsigned getLocalDeclIndex() const;
+
   friend bool operator==(const DeclIDBase &LHS, const DeclIDBase &RHS) {
     return LHS.ID == RHS.ID;
   }
@@ -156,6 +166,9 @@ class LocalDeclID : public DeclIDBase {
   LocalDeclID(PredefinedDeclIDs ID) : Base(ID) {}
   explicit LocalDeclID(DeclID ID) : Base(ID) {}
 
+  explicit LocalDeclID(unsigned LocalID, unsigned ModuleFileIndex)
+      : Base(LocalID, ModuleFileIndex) {}
+
   LocalDeclID &operator++() {
     ++ID;
     return *this;
@@ -175,6 +188,9 @@ class GlobalDeclID : public DeclIDBase {
   GlobalDeclID() : Base() {}
   explicit GlobalDeclID(DeclID ID) : Base(ID) {}
 
+  explicit GlobalDeclID(unsigned LocalID, unsigned ModuleFileIndex)
+      : Base(LocalID, ModuleFileIndex) {}
+
   // For DeclIDIterator<GlobalDeclID> to be able to convert a GlobalDeclID
   // to a LocalDeclID.
   explicit operator LocalDeclID() const { return LocalDeclID(this->ID); }
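
The ID layout introduced in `DeclIDBase` can be illustrated with a small standalone sketch. It does not use the Clang headers; `makeDeclID`, `moduleFileIndexOf` and `localDeclIndexOf` are hypothetical free functions that only mirror the arithmetic of the new two-argument constructor, `getModuleFileIndex()` and `getLocalDeclIndex()`.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for clang's DeclID after this patch: a plain 64-bit integer.
using DeclID = std::uint64_t;

// Pack a local declaration index (low 32 bits) together with a module file
// index (high 32 bits), as the new DeclIDBase constructor does.
constexpr DeclID makeDeclID(std::uint32_t LocalID,
                            std::uint32_t ModuleFileIndex) {
  return static_cast<DeclID>(LocalID) |
         (static_cast<DeclID>(ModuleFileIndex) << 32);
}

// Recover the module file index from the high 32 bits.
constexpr std::uint32_t moduleFileIndexOf(DeclID ID) {
  return static_cast<std::uint32_t>(ID >> 32);
}

// Recover the local declaration index from the low 32 bits.
constexpr std::uint32_t localDeclIndexOf(DeclID ID) {
  return static_cast<std::uint32_t>(ID & 0xFFFFFFFFull);
}

static_assert(moduleFileIndexOf(makeDeclID(7, 3)) == 3, "high half round-trips");
static_assert(localDeclIndexOf(makeDeclID(7, 3)) == 7, "low half round-trips");
```

Because the module file index lives in the ID itself, the owning file can be read off the ID directly, which appears to be why the `GlobalDeclMap` range map could be dropped from `ASTReader`.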

clang/include/clang/Serialization/ASTBitCodes.h

Lines changed: 25 additions & 6 deletions
@@ -255,6 +255,12 @@ class DeclOffset {
   }
 };
 
+// The unaligned decl ID used in the Blobs of bistreams.
+using unaligned_decl_id_t =
+    llvm::support::detail::packed_endian_specific_integral<
+        serialization::DeclID, llvm::endianness::native,
+        llvm::support::unaligned>;
+
 /// The number of predefined preprocessed entity IDs.
 const unsigned int NUM_PREDEF_PP_ENTITY_IDS = 1;
 
@@ -1979,33 +1985,46 @@ enum CleanupObjectKind { COK_Block, COK_CompoundLiteral };
 
 /// Describes the categories of an Objective-C class.
 struct ObjCCategoriesInfo {
-  // The ID of the definition
-  LocalDeclID DefinitionID;
+  // The ID of the definition. Use unaligned_decl_id_t to keep
+  // ObjCCategoriesInfo 32-bit aligned.
+  unaligned_decl_id_t DefinitionID;
 
   // Offset into the array of category lists.
   unsigned Offset;
 
+  ObjCCategoriesInfo() = default;
+  ObjCCategoriesInfo(LocalDeclID ID, unsigned Offset)
+      : DefinitionID(ID.get()), Offset(Offset) {}
+
+  LocalDeclID getDefinitionID() const {
+    return LocalDeclID(DefinitionID);
+  }
+
   friend bool operator<(const ObjCCategoriesInfo &X,
                         const ObjCCategoriesInfo &Y) {
-    return X.DefinitionID < Y.DefinitionID;
+    return X.getDefinitionID() < Y.getDefinitionID();
   }
 
   friend bool operator>(const ObjCCategoriesInfo &X,
                         const ObjCCategoriesInfo &Y) {
-    return X.DefinitionID > Y.DefinitionID;
+    return X.getDefinitionID() > Y.getDefinitionID();
   }
 
   friend bool operator<=(const ObjCCategoriesInfo &X,
                          const ObjCCategoriesInfo &Y) {
-    return X.DefinitionID <= Y.DefinitionID;
+    return X.getDefinitionID() <= Y.getDefinitionID();
   }
 
   friend bool operator>=(const ObjCCategoriesInfo &X,
                          const ObjCCategoriesInfo &Y) {
-    return X.DefinitionID >= Y.DefinitionID;
+    return X.getDefinitionID() >= Y.getDefinitionID();
   }
 };
 
+static_assert(alignof(ObjCCategoriesInfo) <= 4);
+static_assert(std::is_standard_layout_v<ObjCCategoriesInfo> &&
+              std::is_trivial_v<ObjCCategoriesInfo>);
+
 /// A key used when looking up entities by \ref DeclarationName.
 ///
 /// Different \ref DeclarationNames are mapped to different keys, but the
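
With `DeclID` widened to 64 bits, IDs stored in bitstream blobs are no longer guaranteed to sit on an 8-byte boundary, which is presumably why the diff reads them through `unaligned_decl_id_t`. The sketch below shows the general technique in plain C++ with `std::memcpy`; it is not LLVM's implementation, and `readUnalignedU64` / `writeUnalignedU64` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a 64-bit value in native byte order from an address that may not be
// 8-byte aligned. std::memcpy avoids the undefined behaviour of dereferencing
// a misaligned std::uint64_t pointer.
inline std::uint64_t readUnalignedU64(const unsigned char *P) {
  std::uint64_t Value;
  std::memcpy(&Value, P, sizeof(Value));
  return Value;
}

// Write a 64-bit value in native byte order to a possibly misaligned address.
inline void writeUnalignedU64(unsigned char *P, std::uint64_t Value) {
  std::memcpy(P, &Value, sizeof(Value));
}
```

A value written at an odd offset such as `Buf + 3` reads back unchanged, which a cast to `const std::uint64_t *` would not guarantee on every target.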

clang/include/clang/Serialization/ASTReader.h

Lines changed: 16 additions & 20 deletions
@@ -504,12 +504,6 @@ class ASTReader
   /// = I + 1 has already been loaded.
   llvm::PagedVector<Decl *> DeclsLoaded;
 
-  using GlobalDeclMapType = ContinuousRangeMap<GlobalDeclID, ModuleFile *, 4>;
-
-  /// Mapping from global declaration IDs to the module in which the
-  /// declaration resides.
-  GlobalDeclMapType GlobalDeclMap;
-
   using FileOffset = std::pair<ModuleFile *, uint64_t>;
   using FileOffsetsTy = SmallVector<FileOffset, 2>;
   using DeclUpdateOffsetsMap = llvm::DenseMap<GlobalDeclID, FileOffsetsTy>;
@@ -592,10 +586,11 @@ class ASTReader
 
   struct FileDeclsInfo {
     ModuleFile *Mod = nullptr;
-    ArrayRef<LocalDeclID> Decls;
+    ArrayRef<serialization::unaligned_decl_id_t> Decls;
 
     FileDeclsInfo() = default;
-    FileDeclsInfo(ModuleFile *Mod, ArrayRef<LocalDeclID> Decls)
+    FileDeclsInfo(ModuleFile *Mod,
+                  ArrayRef<serialization::unaligned_decl_id_t> Decls)
         : Mod(Mod), Decls(Decls) {}
   };
 
@@ -604,11 +599,7 @@ class ASTReader
 
   /// An array of lexical contents of a declaration context, as a sequence of
   /// Decl::Kind, DeclID pairs.
-  using unaligned_decl_id_t =
-      llvm::support::detail::packed_endian_specific_integral<
-          serialization::DeclID, llvm::endianness::native,
-          llvm::support::unaligned>;
-  using LexicalContents = ArrayRef<unaligned_decl_id_t>;
+  using LexicalContents = ArrayRef<serialization::unaligned_decl_id_t>;
 
   /// Map from a DeclContext to its lexical contents.
   llvm::DenseMap<const DeclContext*, std::pair<ModuleFile*, LexicalContents>>
@@ -1489,22 +1480,23 @@ class ASTReader
                             unsigned ClientLoadCapabilities);
 
 public:
-  class ModuleDeclIterator : public llvm::iterator_adaptor_base<
-                                 ModuleDeclIterator, const LocalDeclID *,
-                                 std::random_access_iterator_tag, const Decl *,
-                                 ptrdiff_t, const Decl *, const Decl *> {
+  class ModuleDeclIterator
+      : public llvm::iterator_adaptor_base<
+            ModuleDeclIterator, const serialization::unaligned_decl_id_t *,
+            std::random_access_iterator_tag, const Decl *, ptrdiff_t,
+            const Decl *, const Decl *> {
     ASTReader *Reader = nullptr;
     ModuleFile *Mod = nullptr;
 
   public:
     ModuleDeclIterator() : iterator_adaptor_base(nullptr) {}
 
     ModuleDeclIterator(ASTReader *Reader, ModuleFile *Mod,
-                       const LocalDeclID *Pos)
+                       const serialization::unaligned_decl_id_t *Pos)
         : iterator_adaptor_base(Pos), Reader(Reader), Mod(Mod) {}
 
     value_type operator*() const {
-      return Reader->GetDecl(Reader->getGlobalDeclID(*Mod, *I));
+      return Reader->GetDecl(Reader->getGlobalDeclID(*Mod, (LocalDeclID)*I));
     }
 
     value_type operator->() const { return **this; }
@@ -1544,6 +1536,9 @@ class ASTReader
              StringRef Arg2 = StringRef(), StringRef Arg3 = StringRef()) const;
   void Error(llvm::Error &&Err) const;
 
+  /// Translate a \param GlobalDeclID to the index of DeclsLoaded array.
+  unsigned translateGlobalDeclIDToIndex(GlobalDeclID ID) const;
+
 public:
   /// Load the AST file and validate its contents against the given
   /// Preprocessor.
@@ -1915,7 +1910,8 @@ class ASTReader
 
   /// Retrieve the module file that owns the given declaration, or NULL
   /// if the declaration is not from a module file.
-  ModuleFile *getOwningModuleFile(const Decl *D);
+  ModuleFile *getOwningModuleFile(const Decl *D) const;
+  ModuleFile *getOwningModuleFile(GlobalDeclID ID) const;
 
   /// Returns the source location for the decl \p ID.
   SourceLocation getSourceLocationForDeclID(GlobalDeclID ID);
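
The body of `translateGlobalDeclIDToIndex` is not part of this excerpt, so the following is only a guess at how a Decl ID might be separated from a Decl Index, not the actual ASTReader code. It assumes that module file index 0 denotes predefined declarations, that module file `i` (1-based) is the `i - 1`-th loaded file, and that each file records a base index as `ModuleFile::BaseDeclIndex` suggests. `ModuleFileInfo` and `translateToIndex` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ModuleFile: only the field this sketch needs.
struct ModuleFileInfo {
  unsigned BaseDeclIndex; // where this file's decls start in the loaded array
};

// Map a 64-bit global decl ID to a position in one flat "loaded decls" array.
// Assumption: module file index 0 means "predefined, not from a module file",
// and module file i (1-based) is Files[i - 1].
inline unsigned translateToIndex(std::uint64_t GlobalID,
                                 const std::vector<ModuleFileInfo> &Files) {
  unsigned ModuleFileIndex = static_cast<unsigned>(GlobalID >> 32);
  unsigned LocalIndex = static_cast<unsigned>(GlobalID & 0xFFFFFFFFull);
  if (ModuleFileIndex == 0)
    return LocalIndex; // predefined IDs index the array directly in this sketch
  assert(ModuleFileIndex <= Files.size() && "unknown module file");
  return Files[ModuleFileIndex - 1].BaseDeclIndex + LocalIndex;
}
```

The point of the split is that the ID stays stable when unrelated module files change, while the index depends only on the load order within the current reader.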

clang/include/clang/Serialization/ModuleFile.h

Lines changed: 3 additions & 15 deletions
@@ -454,23 +454,11 @@ class ModuleFile {
   /// by the declaration ID (-1).
   const DeclOffset *DeclOffsets = nullptr;
 
-  /// Base declaration ID for declarations local to this module.
-  serialization::DeclID BaseDeclID = 0;
-
-  /// Remapping table for declaration IDs in this module.
-  ContinuousRangeMap<serialization::DeclID, int, 2> DeclRemap;
-
-  /// Mapping from the module files that this module file depends on
-  /// to the base declaration ID for that module as it is understood within this
-  /// module.
-  ///
-  /// This is effectively a reverse global-to-local mapping for declaration
-  /// IDs, so that we can interpret a true global ID (for this translation unit)
-  /// as a local ID (for this module file).
-  llvm::DenseMap<ModuleFile *, serialization::DeclID> GlobalToLocalDeclIDs;
+  /// Base declaration index in ASTReader for declarations local to this module.
+  unsigned BaseDeclIndex = 0;
 
   /// Array of file-level DeclIDs sorted by file.
-  const LocalDeclID *FileSortedDecls = nullptr;
+  const serialization::unaligned_decl_id_t *FileSortedDecls = nullptr;
   unsigned NumFileSortedDecls = 0;
 
   /// Array of category list location information within this

clang/include/clang/Serialization/ModuleManager.h

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ namespace serialization {
 /// Manages the set of modules loaded by an AST reader.
 class ModuleManager {
   /// The chain of AST files, in the order in which we started to load
-  /// them (this order isn't really useful for anything).
+  /// them.
   SmallVector<std::unique_ptr<ModuleFile>, 2> Chain;
 
   /// The chain of non-module PCH files. The first entry is the one named

clang/lib/AST/DeclBase.cpp

Lines changed: 32 additions & 7 deletions
@@ -74,18 +74,17 @@ void *Decl::operator new(std::size_t Size, const ASTContext &Context,
                          GlobalDeclID ID, std::size_t Extra) {
   // Allocate an extra 8 bytes worth of storage, which ensures that the
   // resulting pointer will still be 8-byte aligned.
-  static_assert(sizeof(unsigned) * 2 >= alignof(Decl),
-                "Decl won't be misaligned");
+  static_assert(sizeof(uint64_t) >= alignof(Decl), "Decl won't be misaligned");
   void *Start = Context.Allocate(Size + Extra + 8);
   void *Result = (char*)Start + 8;
 
-  unsigned *PrefixPtr = (unsigned *)Result - 2;
+  uint64_t *PrefixPtr = (uint64_t *)Result - 1;
 
-  // Zero out the first 4 bytes; this is used to store the owning module ID.
-  PrefixPtr[0] = 0;
+  *PrefixPtr = ID.get();
 
-  // Store the global declaration ID in the second 4 bytes.
-  PrefixPtr[1] = ID.get();
+  // We leave the upper 16 bits to store the module IDs. 48 bits should be
+  // sufficient to store a declaration ID.
+  assert(*PrefixPtr < llvm::maskTrailingOnes<uint64_t>(48));
 
   return Result;
 }
@@ -111,6 +110,28 @@ void *Decl::operator new(std::size_t Size, const ASTContext &Ctx,
   return ::operator new(Size + Extra, Ctx);
 }
 
+GlobalDeclID Decl::getGlobalID() const {
+  if (!isFromASTFile())
+    return GlobalDeclID();
+  // See the comments in `Decl::operator new` for details.
+  uint64_t ID = *((const uint64_t *)this - 1);
+  return GlobalDeclID(ID & llvm::maskTrailingOnes<uint64_t>(48));
+}
+
+unsigned Decl::getOwningModuleID() const {
+  if (!isFromASTFile())
+    return 0;
+
+  uint64_t ID = *((const uint64_t *)this - 1);
+  return ID >> 48;
+}
+
+void Decl::setOwningModuleID(unsigned ID) {
+  assert(isFromASTFile() && "Only works on a deserialized declaration");
+  uint64_t *IDAddress = (uint64_t *)this - 1;
+  *IDAddress |= (uint64_t)ID << 48;
+}
+
 Module *Decl::getOwningModuleSlow() const {
   assert(isFromASTFile() && "Not from AST file?");
   return getASTContext().getExternalSource()->getModule(getOwningModuleID());
@@ -2164,3 +2185,7 @@ DependentDiagnostic *DependentDiagnostic::Create(ASTContext &C,
 
   return DD;
 }
+
+unsigned DeclIDBase::getLocalDeclIndex() const {
+  return ID & llvm::maskTrailingOnes<DeclID>(32);
+}
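
The 8-byte prefix in front of a deserialized `Decl` now packs two fields into one word. A minimal standalone sketch of that packing follows, assuming the same split as the diff (low 48 bits for the global decl ID, high 16 bits for the owning module ID). `DeclPrefix` and the free functions are hypothetical names for illustration; the real code reads and writes the word through pointer arithmetic on `this`.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low 48 bits set, the equivalent of
// llvm::maskTrailingOnes<uint64_t>(48).
constexpr std::uint64_t DeclIDMask = (std::uint64_t{1} << 48) - 1;

// Hypothetical model of the 8-byte prefix word stored before a Decl.
struct DeclPrefix {
  std::uint64_t Word;
};

// Store the global decl ID; the upper 16 bits start out as zero.
inline DeclPrefix makePrefix(std::uint64_t GlobalID) {
  assert(GlobalID < DeclIDMask && "decl ID must fit in 48 bits");
  DeclPrefix P;
  P.Word = GlobalID;
  return P;
}

// OR the module ID into the upper 16 bits. Like the patch, this relies on
// those bits still being zero, so it is meant to be called once per Decl.
inline void setOwningModuleID(DeclPrefix &P, unsigned ModuleID) {
  assert(ModuleID <= 0xFFFFu && "module ID must fit in 16 bits");
  P.Word |= static_cast<std::uint64_t>(ModuleID) << 48;
}

inline std::uint64_t globalID(const DeclPrefix &P) {
  return P.Word & DeclIDMask;
}

inline unsigned owningModuleID(const DeclPrefix &P) {
  return static_cast<unsigned>(P.Word >> 48);
}
```

One consequence of this layout is that, of the 48 bits available to the ID, 32 hold the local index, leaving 16 bits for the module file index.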
