New binary swiftdeps format #32131

slavapestov · 2020-06-02T04:49:41Z

It should be faster to read and write than YAML.

slavapestov · 2020-06-10T20:23:33Z

@swift-ci Please smoke test

slavapestov · 2020-06-10T21:48:27Z

@swift-ci Please smoke test

slavapestov · 2020-06-10T21:49:58Z

@swift-ci Please smoke test

slavapestov · 2020-06-10T22:35:46Z

@swift-ci Please smoke test macOS

slavapestov · 2020-06-10T22:45:27Z

@swift-ci Please smoke test

slavapestov · 2020-06-10T22:45:34Z

@swift-ci Please smoke test

slavapestov · 2020-06-10T22:45:43Z

@swift-ci Please smoke test

slavapestov · 2020-06-10T22:45:52Z

@swift-ci Please smoke test macOS

davidungar

This is really good stuff! Reading it this way, imagining someone maintaining it in a few years, I think it could really use some more explanation.

I don't know if it's reasonable to try to centralize the invariants across writer and reader, but it would be more robust if so.

The bitstream code is surprising hard to read, as an ignorant reader. Not your fault at all, but anything you can think of that would help or factor it would benefit a maintainer.

I'm going to request changes: the one change that I feel is most important would be an explanation of the overall design, or a pointer to such. The rest, I'm happy to leave to your judgement.

davidungar · 2020-06-10T22:29:16Z

include/swift/AST/FineGrainedDependencies.h

@@ -632,6 +632,11 @@ class DepGraphNode {

  const DependencyKey &getKey() const { return key; }

+  /// Only used when the driver is reading a SourceFileDepGraphNode.


Good comment! Sorry it cannot be enforced.

It can be enforced with a 'friend' declaration -- but then I'd have to expose an implementation detail of the reader code in a public header. Either way, one or the other has to be public! And just having a public setter here doesn't seem like the worst idea in the world, since other data types here have public setters.

Sounds like there's no way to enforce it.

When I'm debugging an unexpected value in a data type, the smaller the scope of any setters, the easier it is to find the cause. Why would the existence of public setters for other types matter?

davidungar · 2020-06-10T22:30:35Z

include/swift/AST/FineGrainedDependencies.h

@@ -684,15 +689,15 @@ class SourceFileDepGraphNode : public DepGraphNode {
  /// True iff a Decl exists for this node.
  /// If a provides and a depends in the existing system both have the same key,
  /// only one SourceFileDepGraphNode is emitted.
-  bool isProvides;
+  bool isProvides = false;


Reading the code, I wonder why false is the default. Would a comment explaining that be out of place?

Without this the no-argument constructor produces an object where 'isProvides' has an undefined value. I'm not sure that warrants a comment.

When I saw the "false" I figured that the isProvides field would only be set if it were true, and you were using the false because if never set, that meant it would be false. If such is the case, the code would be clearer to me with a comment.

If, on the other hand, it is the case that this field will be set to either truth value, and the initialization to false has no effect, then, the way I read code, I would find this initialization confusing without a comment. The initializer sets the sequence number to an illegal value so that initialized-but-never-set nodes can be detected.

davidungar · 2020-06-10T22:31:36Z

include/swift/AST/FineGrainedDependencies.h


  friend ::llvm::yaml::MappingContextTraits<SourceFileDepGraphNode,
                                            SourceFileDepGraph>;

 public:
  /// When the driver imports a node create an uninitialized instance for
  /// deserializing.
-  SourceFileDepGraphNode() : DepGraphNode(), sequenceNumber(~0) {}
+  SourceFileDepGraphNode() : DepGraphNode() {}


Why get rid of the illegal sequence number? It was an extra check.

It's already initialized to ~0 above, so this initializer was redundant.

OK, sounds good!

davidungar · 2020-06-10T22:32:58Z

include/swift/AST/FineGrainedDependencyFormat.h

+using llvm::BCBlob;
+using llvm::BCRecordLayout;
+
+const unsigned char FINE_GRAINED_DEPDENENCY_FORMAT_SIGNATURE[] = {'D', 'E', 'P', 'S'};


As a reviewer/maintainer, it would be nice to have a comment right here explaining the purpose of a signature.

I added a comment (it's the magic number that the swiftdeps file begins with).

Great! Thanks.

davidungar · 2020-06-10T22:33:25Z

include/swift/AST/FineGrainedDependencyFormat.h

+
+const unsigned char FINE_GRAINED_DEPDENENCY_FORMAT_SIGNATURE[] = {'D', 'E', 'P', 'S'};
+
+const unsigned FINE_GRAINED_DEPENDENCY_FORMAT_VERSION_MAJOR = 1;


Would it make sense for this to be version 2?

No, it's the version number of the new binary format, not the version number of the overall dependency format. (In which case, it would be 3, with the fine-grained format v2 :-) )

OK, I guess I was misled by the name "fine grained dependency format version", which implied to me that the number should be 2, since this is the second one such. But I can't think of anything that would break by leaving it as 1, since an old driver should reject this new format.

davidungar · 2020-06-10T22:53:48Z

lib/AST/FineGrainedDependencyFormat.cpp

+  bool readSignature();
+  bool enterTopLevelBlock();
+  bool readMetadata();


Do these return true on error or success?

Added a comment (true on failure)

davidungar · 2020-06-10T22:55:14Z

lib/AST/FineGrainedDependencyFormat.cpp

+    auto next = Cursor.advance();
+    if (!next) {
+      consumeError(next.takeError());
+      return true;
+    }
+
+    if (next->Kind != llvm::BitstreamEntry::SubBlock)
+      return true;
+
+    if (next->ID != llvm::bitc::BLOCKINFO_BLOCK_ID)
+      return true;
+
+    if (!Cursor.ReadBlockInfoBlock())
+      return true;
+  }
+
+  {
+    auto next = Cursor.advance();
+    if (!next) {
+      consumeError(next.takeError());
+      return true;
+    }
+
+    if (next->Kind != llvm::BitstreamEntry::SubBlock)
+      return true;
+
+    if (next->ID != RECORD_BLOCK_ID)
+      return true;
+
+    if (auto err = Cursor.EnterSubBlock(RECORD_BLOCK_ID)) {
+      consumeError(std::move(err));
+      return true;
+    }


Feels a bit long to grasp without either breaking it up or comments.

Added a comment

davidungar · 2020-06-10T22:57:02Z

lib/AST/FineGrainedDependencyFormat.cpp

+  using namespace record_block;
+
+  auto entry = Cursor.advance();
+  if (!entry) {
+    consumeError(entry.takeError());
+    return true;
+  }
+
+  if (entry->Kind != llvm::BitstreamEntry::Record)
+    return true;
+
+  auto recordID = Cursor.readRecord(entry->ID, Scratch, &BlobData);
+  if (!recordID) {
+    consumeError(recordID.takeError());
+    return true;
+  }
+
+  if (*recordID != METADATA)
+    return true;
+
+  unsigned majorVersion, minorVersion;
+
+  MetadataLayout::readRecord(Scratch, majorVersion, minorVersion);
+  if (majorVersion != FINE_GRAINED_DEPENDENCY_FORMAT_VERSION_MAJOR ||
+      minorVersion != FINE_GRAINED_DEPENDENCY_FORMAT_VERSION_MINOR) {
+    return true;
+  }
+
+  return false;
+}


Is there some pattern going on in these? Can it be abstracted? So far both this function and the prior one start out the same way, then do similar-seeming validations.

Even better, would be if invariants could be centralized between reader & writer, though that may be impractical.

I don't think it's worth trying to abstract this out. It's boilerplate for the LLVM bitstream record API, which is pretty general, and we're using it in one specific way.

davidungar · 2020-06-10T23:00:55Z

lib/AST/FineGrainedDependencyFormat.cpp

+  using namespace record_block;
+
+  if (readSignature())
+    return true;
+
+  if (enterTopLevelBlock())
+    return true;
+
+  if (readMetadata())
+    return true;
+
+  SourceFileDepGraphNode *node = nullptr;
+  size_t sequenceNumber = 0;
+
+  while (!Cursor.AtEndOfStream()) {
+    auto entry = cantFail(Cursor.advance(), "Advance bitstream cursor");
+
+    if (entry.Kind == llvm::BitstreamEntry::EndBlock) {
+      Cursor.ReadBlockEnd();
+      assert(Cursor.GetCurrentBitNo() % CHAR_BIT == 0);
+      break;
+    }
+
+    if (entry.Kind != llvm::BitstreamEntry::Record)
+      llvm::report_fatal_error("Bad bitstream entry kind");
+
+    Scratch.clear();
+    unsigned recordID = cantFail(
+      Cursor.readRecord(entry.ID, Scratch, &BlobData),
+      "Read bitstream record");
+
+    switch (recordID) {
+    case METADATA: {
+      // METADATA must appear at the beginning and is handled by readMetadata().
+      llvm::report_fatal_error("Unexpected METADATA record");
+      break;
+    }
+
+    case SOURCE_FILE_DEP_GRAPH_NODE: {
+      unsigned nodeKindID, declAspectID, contextID, nameID, isProvides;
+      SourceFileDepGraphNodeLayout::readRecord(Scratch, nodeKindID, declAspectID,
+                                               contextID, nameID, isProvides);
+      node = new SourceFileDepGraphNode();
+      node->setSequenceNumber(sequenceNumber++);
+      g.addNode(node);
+
+      auto nodeKind = getNodeKind(nodeKindID);
+      if (!nodeKind)
+        llvm::report_fatal_error("Bad node kind");
+      auto declAspect = getDeclAspect(declAspectID);
+      if (!declAspect)
+        llvm::report_fatal_error("Bad decl aspect");
+      auto context = getIdentifier(contextID);
+      if (!context)
+        llvm::report_fatal_error("Bad context");
+      auto name = getIdentifier(nameID);
+      if (!name)
+        llvm::report_fatal_error("Bad identifier");
+
+      node->setKey(DependencyKey(*nodeKind, *declAspect, *context, *name));
+      if (isProvides)
+        node->setIsProvides();
+      break;
+    }
+
+    case FINGERPRINT_NODE: {
+      // FINGERPRINT_NODE must follow a SOURCE_FILE_DEP_GRAPH_NODE.
+      if (node == nullptr)
+        llvm::report_fatal_error("Unexpected FINGERPRINT_NODE record");
+
+      node->setFingerprint(BlobData);
+      break;
+    }
+
+    case DEPENDS_ON_DEFINITION_NODE: {
+      // DEPENDS_ON_DEFINITION_NODE must follow a SOURCE_FILE_DEP_GRAPH_NODE.
+      if (node == nullptr)
+        llvm::report_fatal_error("Unexpected DEPENDS_ON_DEFINITION_NODE record");
+
+      unsigned dependsOnDefID;
+      DependsOnDefNodeLayout::readRecord(Scratch, dependsOnDefID);


(Meant to select this whole function, but too long)
Not your fault at all, but this bitstream stuff reads like COBOL--so verbose!
I'm skipping the rest of the file.

Sure, it would be nice if, eg it could generate the reader and writer from a single declarative description. But that's not how it works today, I don't want to invent a way of doing that, and everyone else writes out a ton of boilerplate every time they use it too.

Sigh... Someday, when it's all in Swift...

davidungar · 2020-06-10T23:03:44Z

test/Driver/Dependencies/Inputs/chained-additional-kinds/main.swift

-# Dependencies after compilation:
-depends-top-level: [a]
-provides-dynamic-lookup: [z]


Must admit I'm chuckling at all the deletions! I hope that someday we do something to make sure we cover the same cases. EDIT: oh, sorry, I didn't see that you replaced these with the equivalent binaries! Thanks!!

No, look at the commit history. One of the commits deletes a pile of input files which are actually not used at all.

slavapestov · 2020-06-10T23:55:31Z

@davidungar If you're already familiar with the BitstreamRecord API, then FineGrainedDependencyGraphFormat.h sufficiently documents the file format. I'm not sure there's any point in writing up a detailed document. We also use bitstreams for AST/SIL serialization, and the 'serialized diagnostics' stuff. It's completely standard LLVM-style code.

slavapestov · 2020-06-10T23:56:43Z

@swift-ci Please smoke test

slavapestov · 2020-06-10T23:57:38Z

@swift-ci Please smoke test

davidungar

@davidungar If you're already familiar with the BitstreamRecord API, then FineGrainedDependencyGraphFormat.h sufficiently documents the file format. I'm not sure there's any point in writing up a detailed document. We also use bitstreams for AST/SIL serialization, and the 'serialized diagnostics' stuff. It's completely standard LLVM-style code.

@slavapestov I'm confused. Do you mean FineGrainedDependencyFormat.h? Where in that file is the documentation? Or do you mean to say that it's self-documenting code? If the latter, even a few sentences about which kind of blocks go where would help. There's also the issue that this code is serializing a graph, and there are considerations that are unique to this use of bitstreams. Apologies in advance if I'm overlooking some text somewhere.

davidungar · 2020-06-11T00:30:01Z

include/swift/AST/FineGrainedDependencyFormat.h

+    IdentifierIDField, // Dependency key mangled context type name
+    IdentifierIDField, // Dependency key basic name


In my structure, the names were different (context vs name):

class DependencyKey {
// For import/export
friend ::llvm::yaml::MappingTraits;

NodeKind kind;
DeclAspect aspect;
/// The mangled context type name of the holder for \ref potentialMember, \ref
/// member, and \ref nominal kinds. Otherwise unused.
std::string context;
/// The basic name of the entity. Unused for \ref potentialMember and \ref
/// nominal kinds.
std::string name;

Thanks for adding the comment!

slavapestov · 2020-06-11T01:14:11Z

@swift-ci Please smoke test Linux

slavapestov · 2020-06-11T01:20:42Z

@swift-ci Please smoke test

slavapestov · 2020-06-11T01:22:04Z

@davidungar I think FineGrainedDependencyFormat.h is pretty self-documenting, yeah. I added a couple more comments though. I don't think there is anything special here about the graph structure, because it's just a linear sequence of nodes. The only interesting part is that each node can have zero or more depends-on entries, but in your in-memory representation those are stored as an std::vector of integers, so I'm just serializing them as a sequence of integers too.

slavapestov · 2020-06-11T01:22:12Z

@swift-ci Please smoke test Linux

slavapestov · 2020-06-11T02:33:12Z

@swift-ci Please clean smoke test Linux

davidungar

Great!

… YAML formats

slavapestov · 2020-06-11T03:44:04Z

@swift-ci Please smoke test

slavapestov · 2020-06-11T03:44:44Z

@swift-ci Please smoke test

slavapestov · 2020-06-11T05:46:06Z

@swift-ci Please smoke test Linux

devesh-shetty · 2020-07-20T03:27:39Z

@DougGregor, saw your tweet https://twitter.com/dgregor79/status/1285047908077670401?s=21.
I'd be down to port this to Swift.

Take advantage of the binary swiftdeps serialization utliities built during swiftlang#32131. Add a new optional information block to swiftdeps files. For now, don't actually serialize swiftdeps information. Frontends will use this information to determine whether to write incremental dependencies across modules into their swiftdeps files. We will then teach the driver to deserialize the data from this section and integrate it into its incremental decision making.

slavapestov force-pushed the binary-swiftdeps-format branch 6 times, most recently from 9fe1c9f to d75b4df Compare June 10, 2020 19:36

slavapestov marked this pull request as ready for review June 10, 2020 19:37

slavapestov added 4 commits June 10, 2020 15:40

Dependencies: Small cleanup in FineGrainedDependencies.{h,cpp}

6730270

Dependencies: New binary format for fine-grained dependency graph

d59a76c

Dependencies: Fix uninitialized memory in SourceFileDepGraphNode

82c0897

Dependencies: Implement SourceFileDepGraphNode::dump()

4ff62ec

slavapestov force-pushed the binary-swiftdeps-format branch from d75b4df to b07a489 Compare June 10, 2020 20:20

slavapestov force-pushed the binary-swiftdeps-format branch from b07a489 to e25c9e3 Compare June 10, 2020 21:47

slavapestov force-pushed the binary-swiftdeps-format branch from e25c9e3 to 52b0439 Compare June 10, 2020 22:45

davidungar suggested changes Jun 10, 2020

View reviewed changes

davidungar suggested changes Jun 11, 2020

View reviewed changes

slavapestov force-pushed the binary-swiftdeps-format branch 2 times, most recently from 8ee7cc6 to da3ed57 Compare June 11, 2020 01:20

davidungar approved these changes Jun 11, 2020

View reviewed changes

slavapestov added 4 commits June 10, 2020 23:43

Dependencies: New swift-dependency-tool to convert between binary and…

4ff5dd4

… YAML formats

Dependencies: Update tests for new binary format

fe3f6b3

Dependencies: Remove apparently-unused test inputs

53931c0

Code review feedback from Dave Ungar

b873fe2

slavapestov force-pushed the binary-swiftdeps-format branch from da3ed57 to b873fe2 Compare June 11, 2020 03:44

slavapestov merged commit 965c735 into swiftlang:master Jun 11, 2020

		@@ -632,6 +632,11 @@ class DepGraphNode {

		const DependencyKey &getKey() const { return key; }

		/// Only used when the driver is reading a SourceFileDepGraphNode.


		const unsigned char FINE_GRAINED_DEPDENENCY_FORMAT_SIGNATURE[] = {'D', 'E', 'P', 'S'};

		const unsigned FINE_GRAINED_DEPENDENCY_FORMAT_VERSION_MAJOR = 1;

		IdentifierIDField, // Dependency key mangled context type name
		IdentifierIDField, // Dependency key basic name

New binary swiftdeps format #32131

New binary swiftdeps format #32131

Uh oh!

Conversation

slavapestov commented Jun 2, 2020

Uh oh!

slavapestov commented Jun 10, 2020

Uh oh!

slavapestov commented Jun 10, 2020

Uh oh!

slavapestov commented Jun 10, 2020

Uh oh!

slavapestov commented Jun 10, 2020

Uh oh!

slavapestov commented Jun 10, 2020

Uh oh!

slavapestov commented Jun 10, 2020

Uh oh!

slavapestov commented Jun 10, 2020

Uh oh!

slavapestov commented Jun 10, 2020

Uh oh!

davidungar left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

slavapestov commented Jun 10, 2020