Skip to content

New binary swiftdeps format #32131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 11, 2020

Conversation

slavapestov
Copy link
Contributor

It should be faster to read and write than YAML.

@slavapestov slavapestov force-pushed the binary-swiftdeps-format branch 6 times, most recently from 9fe1c9f to d75b4df Compare June 10, 2020 19:36
@slavapestov slavapestov marked this pull request as ready for review June 10, 2020 19:37
@slavapestov slavapestov force-pushed the binary-swiftdeps-format branch from d75b4df to b07a489 Compare June 10, 2020 20:20
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

@slavapestov slavapestov force-pushed the binary-swiftdeps-format branch from b07a489 to e25c9e3 Compare June 10, 2020 21:47
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

1 similar comment
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test macOS

@slavapestov slavapestov force-pushed the binary-swiftdeps-format branch from e25c9e3 to 52b0439 Compare June 10, 2020 22:45
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

2 similar comments
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test macOS

Copy link
Contributor

@davidungar davidungar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good stuff! Reading it this way, imagining someone maintaining it in a few years, I think it could really use some more explanation.

I don't know if it's reasonable to try to centralize the invariants across writer and reader, but it would be more robust if so.

The bitstream code is surprising hard to read, as an ignorant reader. Not your fault at all, but anything you can think of that would help or factor it would benefit a maintainer.

I'm going to request changes: the one change that I feel is most important would be an explanation of the overall design, or a pointer to such. The rest, I'm happy to leave to your judgement.

@@ -632,6 +632,11 @@ class DepGraphNode {

const DependencyKey &getKey() const { return key; }

/// Only used when the driver is reading a SourceFileDepGraphNode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment! Sorry it cannot be enforced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be enforced with a 'friend' declaration -- but then I'd have to expose an implementation detail of the reader code in a public header. Either way, one or the other has to be public! And just having a public setter here doesn't seem like the worst idea in the world, since other data types here have public setters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like there's no way to enforce it.

When I'm debugging an unexpected value in a data type, the smaller the scope of any setters, the easier it is to find the cause. Why would the existence of public setters for other types matter?

@@ -684,15 +689,15 @@ class SourceFileDepGraphNode : public DepGraphNode {
/// True iff a Decl exists for this node.
/// If a provides and a depends in the existing system both have the same key,
/// only one SourceFileDepGraphNode is emitted.
bool isProvides;
bool isProvides = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the code, I wonder why false is the default. Would a comment explaining that be out of place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this the no-argument constructor produces an object where 'isProvides' has an undefined value. I'm not sure that warrants a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I saw the "false" I figured that the isProvides field would only be set if it were true, and you were using the false because if never set, that meant it would be false. If such is the case, the code would be clearer to me with a comment.

If, on the other hand, it is the case that this field will be set to either truth value, and the initialization to false has no effect, then, the way I read code, I would find this initialization confusing without a comment. The initializer sets the sequence number to an illegal value so that initialized-but-never-set nodes can be detected.


friend ::llvm::yaml::MappingContextTraits<SourceFileDepGraphNode,
SourceFileDepGraph>;

public:
/// When the driver imports a node create an uninitialized instance for
/// deserializing.
SourceFileDepGraphNode() : DepGraphNode(), sequenceNumber(~0) {}
SourceFileDepGraphNode() : DepGraphNode() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why get rid of the illegal sequence number? It was an extra check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already initialized to ~0 above, so this initializer was redundant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good!

using llvm::BCBlob;
using llvm::BCRecordLayout;

const unsigned char FINE_GRAINED_DEPDENENCY_FORMAT_SIGNATURE[] = {'D', 'E', 'P', 'S'};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reviewer/maintainer, it would be nice to have a comment right here explaining the purpose of a signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment (it's the magic number that the swiftdeps file begins with).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks.


const unsigned char FINE_GRAINED_DEPDENENCY_FORMAT_SIGNATURE[] = {'D', 'E', 'P', 'S'};

const unsigned FINE_GRAINED_DEPENDENCY_FORMAT_VERSION_MAJOR = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for this to be version 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's the version number of the new binary format, not the version number of the overall dependency format. (In which case, it would be 3, with the fine-grained format v2 :-) )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I guess I was misled by the name "fine grained dependency format version", which implied to me that the number should be 2, since this is the second one such. But I can't think of anything that would break by leaving it as 1, since an old driver should reject this new format.

Comment on lines +38 to +41
bool readSignature();
bool enterTopLevelBlock();
bool readMetadata();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these return true on error or success?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment (true on failure)

Comment on lines 67 to 103
auto next = Cursor.advance();
if (!next) {
consumeError(next.takeError());
return true;
}

if (next->Kind != llvm::BitstreamEntry::SubBlock)
return true;

if (next->ID != llvm::bitc::BLOCKINFO_BLOCK_ID)
return true;

if (!Cursor.ReadBlockInfoBlock())
return true;
}

{
auto next = Cursor.advance();
if (!next) {
consumeError(next.takeError());
return true;
}

if (next->Kind != llvm::BitstreamEntry::SubBlock)
return true;

if (next->ID != RECORD_BLOCK_ID)
return true;

if (auto err = Cursor.EnterSubBlock(RECORD_BLOCK_ID)) {
consumeError(std::move(err));
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels a bit long to grasp without either breaking it up or comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment

Comment on lines +106 to +139
using namespace record_block;

auto entry = Cursor.advance();
if (!entry) {
consumeError(entry.takeError());
return true;
}

if (entry->Kind != llvm::BitstreamEntry::Record)
return true;

auto recordID = Cursor.readRecord(entry->ID, Scratch, &BlobData);
if (!recordID) {
consumeError(recordID.takeError());
return true;
}

if (*recordID != METADATA)
return true;

unsigned majorVersion, minorVersion;

MetadataLayout::readRecord(Scratch, majorVersion, minorVersion);
if (majorVersion != FINE_GRAINED_DEPENDENCY_FORMAT_VERSION_MAJOR ||
minorVersion != FINE_GRAINED_DEPENDENCY_FORMAT_VERSION_MINOR) {
return true;
}

return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some pattern going on in these? Can it be abstracted? So far both this function and the prior one start out the same way, then do similar-seeming validations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better, would be if invariants could be centralized between reader & writer, though that may be impractical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth trying to abstract this out. It's boilerplate for the LLVM bitstream record API, which is pretty general, and we're using it in one specific way.

Comment on lines +150 to +234
using namespace record_block;

if (readSignature())
return true;

if (enterTopLevelBlock())
return true;

if (readMetadata())
return true;

SourceFileDepGraphNode *node = nullptr;
size_t sequenceNumber = 0;

while (!Cursor.AtEndOfStream()) {
auto entry = cantFail(Cursor.advance(), "Advance bitstream cursor");

if (entry.Kind == llvm::BitstreamEntry::EndBlock) {
Cursor.ReadBlockEnd();
assert(Cursor.GetCurrentBitNo() % CHAR_BIT == 0);
break;
}

if (entry.Kind != llvm::BitstreamEntry::Record)
llvm::report_fatal_error("Bad bitstream entry kind");

Scratch.clear();
unsigned recordID = cantFail(
Cursor.readRecord(entry.ID, Scratch, &BlobData),
"Read bitstream record");

switch (recordID) {
case METADATA: {
// METADATA must appear at the beginning and is handled by readMetadata().
llvm::report_fatal_error("Unexpected METADATA record");
break;
}

case SOURCE_FILE_DEP_GRAPH_NODE: {
unsigned nodeKindID, declAspectID, contextID, nameID, isProvides;
SourceFileDepGraphNodeLayout::readRecord(Scratch, nodeKindID, declAspectID,
contextID, nameID, isProvides);
node = new SourceFileDepGraphNode();
node->setSequenceNumber(sequenceNumber++);
g.addNode(node);

auto nodeKind = getNodeKind(nodeKindID);
if (!nodeKind)
llvm::report_fatal_error("Bad node kind");
auto declAspect = getDeclAspect(declAspectID);
if (!declAspect)
llvm::report_fatal_error("Bad decl aspect");
auto context = getIdentifier(contextID);
if (!context)
llvm::report_fatal_error("Bad context");
auto name = getIdentifier(nameID);
if (!name)
llvm::report_fatal_error("Bad identifier");

node->setKey(DependencyKey(*nodeKind, *declAspect, *context, *name));
if (isProvides)
node->setIsProvides();
break;
}

case FINGERPRINT_NODE: {
// FINGERPRINT_NODE must follow a SOURCE_FILE_DEP_GRAPH_NODE.
if (node == nullptr)
llvm::report_fatal_error("Unexpected FINGERPRINT_NODE record");

node->setFingerprint(BlobData);
break;
}

case DEPENDS_ON_DEFINITION_NODE: {
// DEPENDS_ON_DEFINITION_NODE must follow a SOURCE_FILE_DEP_GRAPH_NODE.
if (node == nullptr)
llvm::report_fatal_error("Unexpected DEPENDS_ON_DEFINITION_NODE record");

unsigned dependsOnDefID;
DependsOnDefNodeLayout::readRecord(Scratch, dependsOnDefID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Meant to select this whole function, but too long)
Not your fault at all, but this bitstream stuff reads like COBOL--so verbose!
I'm skipping the rest of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it would be nice if, eg it could generate the reader and writer from a single declarative description. But that's not how it works today, I don't want to invent a way of doing that, and everyone else writes out a ton of boilerplate every time they use it too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh... Someday, when it's all in Swift...

Comment on lines -1 to -3
# Dependencies after compilation:
depends-top-level: [a]
provides-dynamic-lookup: [z]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must admit I'm chuckling at all the deletions! I hope that someday we do something to make sure we cover the same cases. EDIT: oh, sorry, I didn't see that you replaced these with the equivalent binaries! Thanks!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, look at the commit history. One of the commits deletes a pile of input files which are actually not used at all.

@slavapestov
Copy link
Contributor Author

@davidungar If you're already familiar with the BitstreamRecord API, then FineGrainedDependencyGraphFormat.h sufficiently documents the file format. I'm not sure there's any point in writing up a detailed document. We also use bitstreams for AST/SIL serialization, and the 'serialized diagnostics' stuff. It's completely standard LLVM-style code.

@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

1 similar comment
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

Copy link
Contributor

@davidungar davidungar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidungar If you're already familiar with the BitstreamRecord API, then FineGrainedDependencyGraphFormat.h sufficiently documents the file format. I'm not sure there's any point in writing up a detailed document. We also use bitstreams for AST/SIL serialization, and the 'serialized diagnostics' stuff. It's completely standard LLVM-style code.

@slavapestov I'm confused. Do you mean FineGrainedDependencyFormat.h? Where in that file is the documentation? Or do you mean to say that it's self-documenting code? If the latter, even a few sentences about which kind of blocks go where would help. There's also the issue that this code is serializing a graph, and there are considerations that are unique to this use of bitstreams. Apologies in advance if I'm overlooking some text somewhere.

Comment on lines 70 to 71
IdentifierIDField, // Dependency key mangled context type name
IdentifierIDField, // Dependency key basic name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my structure, the names were different (context vs name):

class DependencyKey {
// For import/export
friend ::llvm::yaml::MappingTraits;

NodeKind kind;
DeclAspect aspect;
/// The mangled context type name of the holder for \ref potentialMember, \ref
/// member, and \ref nominal kinds. Otherwise unused.
std::string context;
/// The basic name of the entity. Unused for \ref potentialMember and \ref
/// nominal kinds.
std::string name;

Thanks for adding the comment!

@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test Linux

@slavapestov slavapestov force-pushed the binary-swiftdeps-format branch 2 times, most recently from 8ee7cc6 to da3ed57 Compare June 11, 2020 01:20
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

@slavapestov
Copy link
Contributor Author

@davidungar I think FineGrainedDependencyFormat.h is pretty self-documenting, yeah. I added a couple more comments though. I don't think there is anything special here about the graph structure, because it's just a linear sequence of nodes. The only interesting part is that each node can have zero or more depends-on entries, but in your in-memory representation those are stored as an std::vector of integers, so I'm just serializing them as a sequence of integers too.

@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test Linux

@slavapestov
Copy link
Contributor Author

@swift-ci Please clean smoke test Linux

Copy link
Contributor

@davidungar davidungar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@slavapestov slavapestov force-pushed the binary-swiftdeps-format branch from da3ed57 to b873fe2 Compare June 11, 2020 03:44
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

1 similar comment
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test Linux

@slavapestov slavapestov merged commit 965c735 into swiftlang:master Jun 11, 2020
@devesh-shetty
Copy link

@DougGregor, saw your tweet https://twitter.com/dgregor79/status/1285047908077670401?s=21.
I'd be down to port this to Swift.

CodaFi added a commit to CodaFi/swift that referenced this pull request Sep 25, 2020
Take advantage of the binary swiftdeps serialization utliities built during swiftlang#32131. Add a new optional information block to swiftdeps files. For now, don't actually serialize swiftdeps information.

Frontends will use this information to determine whether to write incremental dependencies across modules into their swiftdeps files. We will then teach the driver to deserialize the data from this section and integrate it into its incremental decision making.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants