From b94159a5570a22739ccae6cacecebbfdbbb4cb6a Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Fri, 29 Jan 2021 10:36:27 -0800 Subject: [PATCH 1/3] Add chapter on libs and metadata. --- src/SUMMARY.md | 1 + src/backend/libs-and-metadata.md | 192 +++++++++++++++++++++++++++++++ 2 files changed, 193 insertions(+) create mode 100644 src/backend/libs-and-metadata.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 392b3814e..02e8e3843 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -145,6 +145,7 @@ - [Debugging LLVM](./backend/debugging.md) - [Backend Agnostic Codegen](./backend/backend-agnostic.md) - [Implicit Caller Location](./backend/implicit-caller-location.md) +- [Libraries and Metadata](./backend/libs-and-metadata.md) - [Profile-guided Optimization](./profile-guided-optimization.md) - [LLVM Source-Based Code Coverage](./llvm-coverage-instrumentation.md) - [Sanitizers Support](./sanitizers.md) diff --git a/src/backend/libs-and-metadata.md b/src/backend/libs-and-metadata.md new file mode 100644 index 000000000..3a9b76898 --- /dev/null +++ b/src/backend/libs-and-metadata.md @@ -0,0 +1,192 @@ +# Libraries and Metadata + +When the compiler sees a reference to an extern crate, it needs to load some +information about that crate. This chapter gives an overview of that process, +and the supported file formats for crate libraries. + +## Libraries + +A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A +key point of these file formats is that they contain `rustc`-specific +[*metadata*](#metadata). This metadata allows the compiler to discover enough +information about the extern crate to understand the items it contains, which +macros it exports, and *much* more. + +### rlib + +An `rlib` is an [archive file], which is similar to a tar file. This file +format is specific to `rustc`, and may change over time. This file contains: + +* Object code, which is the result of code generation. This is used during + regular linking. There is a separate `.o` file for each [codegen unit]. The + codegen step can be skipped with the [`-C + linker-plugin-lto`][linker-plugin-lto] CLI option, which means each `.o` + file will only contain LLVM bitcode, which can be used when using LTO. +* [LLVM bitcode], which is a binary representation of LLVM's intermediate + representation, which is embedded as a section in the `.o` files. This can + be used for [Link Time Optimization] (LTO). This can be removed with the + [`-C embed-bitcode=no`][embed-bitcode] CLI option to improve compile times + and reduce disk space if LTO is not needed. +* `rustc` [metadata], in a file named `lib.rmeta`. +* A symbol table, which is generally a list of symbols with offsets to the + object file that contain that symbol. This is pretty standard for archive + files. + +[archive file]: https://en.wikipedia.org/wiki/Ar_(Unix) +[LLVM bitcode]: https://llvm.org/docs/BitCodeFormat.html +[Link Time Optimization]: https://llvm.org/docs/LinkTimeOptimization.html +[codegen unit]: ../backend/codegen.md +[embed-bitcode]: https://doc.rust-lang.org/rustc/codegen-options/index.html#embed-bitcode +[linker-plugin-lto]: https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-plugin-lto + +### dylib + +A `dylib` is a platform-specific shared library. It includes the `rustc` +[metadata] in a special section called `.rustc` in a compressed format. + +### rmeta + +An `rmeta` file is custom binary format that contains the [metadata] for the +crate. This file can be used for fast "checks" of a project by skipping all +code generation (as is done with `cargo check`), collecting enough information +for documentation (as is done with `cargo doc`), or for +[pipelining](#pipelining). This file is created if the +[`--emit=metadata`][emit] CLI option is used. + +`rmeta` files do not support linking, since they do not contain compiled +object files. + +[emit]: https://doc.rust-lang.org/rustc/command-line-arguments.html#option-emit + +## Metadata + +The metadata contains a wide swath of different elements. This guide will not +go into detail of every field it contains. You are encouraged to browse the +[`CrateRoot`] definition to get a sense of the different elements it contains. +Everything about metadata encoding and decoding is in the [`rustc_metadata`] +package. + +Here are a few highlights of things it contains: + +* The version of the `rustc` compiler. The compiler will refuse to load files + from any other version. +* The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the + correct dependency is loaded. +* The [Crate Disambiguator](#crate-disambiguator). This is a hash used for + various things to disambiguate between different crates of the same name. +* Information about all the source files in the library. This can be used for + a variety of things, such as diagnostics pointing to sources in a + dependency. +* Information about exported macros, traits, types, and items. Generally, + anything that's needed to be known when a path references something inside a + crate dependency. +* Encoded [MIR]. This is optional, and only encoded if needed for code + generation. `cargo check` skips this for performance reasons. + +[`CrateRoot`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.CrateRoot.html +[`rustc_metadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html +[MIR]: ../mir/index.md + +### Strict Version Hash + +The Strict Version Hash ([SVH], also known as the "crate hash") is a 64-bit +hash that is used to ensure that the correct crate dependencies are loaded. It +is possible for a directory to contain multiple copies of the same dependency +built with different settings, or built from different sources. The crate +loader will skip any crates that have the wrong SVH. + +The SVH is also used for the [incremental compilation] session filename, +though that usage is mostly historic. + +The hash includes a variety of elements: + +* Hashes of the HIR nodes. +* All of the upstream crate hashes. +* All of the source filenames. +* Hashes of certain command-line flags (like `-C metadata` via the [Crate + Disambiguator](#crate-disambiguator), and all CLI options marked with + `[TRACKED]`). + +See [`finalize_and_compute_crate_hash`] for where the hash is actually +computed. + +[SVH]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/svh/struct.Svh.html +[incremental compilation]: ../queries/incremental-compilation.md +[`finalize_and_compute_crate_hash`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/collector/struct.NodeCollector.html#method.finalize_and_compute_crate_hash + +### Crate Disambiguator + +The [`CrateDisambiguator`] is a 128-bit hash used to distinguish between +different crates of the same name. It is a hash of all the [`-C metadata`] CLI +options computed in [`compute_crate_disambiguator`]. It is used in a variety +of places, such as symbol name mangling, crate loading, and much more. + +By default, all Rust symbols are mangled and incorporate the disambiguator +hash. This allows multiple versions of the same crate to be included together. +Cargo automatically generates `-C metadata` hashes based on a variety of +factors, like the package version, source, and the target kind (a lib and bin +can have the same crate name, so they need to be disambiguated). + +[`CrateDisambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/crate_disambiguator/struct.CrateDisambiguator.html +[`compute_crate_disambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/util/fn.compute_crate_disambiguator.html +[`-C metadata`]: https://doc.rust-lang.org/rustc/codegen-options/index.html#metadata + +## Crate loading + +Crate loading can have quite a few subtle complexities. During [name +resolution], when an extern crate is referenced (via an `extern crate` or +path), the resolver uses the [`CrateLoader`] which is responsible for finding +the crate libraries and loading the [metadata] for them. After the dependency +is loaded, the `CrateLoader` will provide the information the resolver needs +to perform its job (such as expanding macros, resolving paths, etc.). + +To load each extern crate, the `CrateLoader` uses a [`CrateLocator`] to +actually find the correct files for one specific crate. There is some great +documentation in the [`locator`] module that goes into detail on how loading +works, and I strongly suggest reading it to get the full picture. + +The location of a dependency can come from several different places. Direct +dependencies are usually passed with `--extern` flags, and the loader can look +at those directly. Direct dependencies often have references to their own +dependencies, which need to be loaded, too. These are usually found by +scanning the directories passed with the `-L` flag for any file whose metadata +contains a matching crate name and [SVH](#strict-version-hash). The loader +will also look at the [sysroot] to find dependencies. + +As crates are loaded, they are kept in the [`CStore`] with the crate metadata +wrapped in the [`CrateMetadata`] struct. After resolution and expansion, the +`CStore` will make its way into the [`GlobalCtxt`] for the rest of +compilation. + +[name resolution]: ../name-resolution.md +[`CrateLoader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CrateLoader.html +[`CrateLocator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/struct.CrateLocator.html +[`locator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/index.html +[`CStore`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CStore.html +[`CrateMetadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/decoder/struct.CrateMetadata.html +[`GlobalCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.GlobalCtxt.html +[sysroot]: ../building/bootstrapping.md#what-is-a-sysroot + +## Pipelining + +One trick to improve compile times is to start building a crate as soon as the +metadata for its dependencies is available. For a library, there is no need to +wait for the code generation of dependencies to finish. Cargo implements this +technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each +dependency as well as an [`rlib`](#rlib). As early as it can, `rustc` will +save the `rmeta` file to disk before it continues to the code generation +phase. The compiler sends a JSON message to let the build tool know that it +can start building the next crate if possible. + +The [crate loading](#crate-loading) system is smart enough to know when it +sees an `rmeta` file to use that if the `rlib` is not there (or has only been +partially written). + +This pipelining isn't possible for binaries, because the linking phase will +require the code generation of all its dependencies. In the future, it may be +possible to further improve this scenario by splitting linking into a separate +command (see [#64191]). + +[#64191]: https://github.com/rust-lang/rust/issues/64191 + +[metadata]: #metadata From 22d52475f628688ad1a97164a1e8b224ce63c682 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Fri, 29 Jan 2021 13:01:46 -0800 Subject: [PATCH 2/3] extern -> external --- src/backend/libs-and-metadata.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/backend/libs-and-metadata.md b/src/backend/libs-and-metadata.md index 3a9b76898..7a6a90d31 100644 --- a/src/backend/libs-and-metadata.md +++ b/src/backend/libs-and-metadata.md @@ -1,6 +1,6 @@ # Libraries and Metadata -When the compiler sees a reference to an extern crate, it needs to load some +When the compiler sees a reference to an external crate, it needs to load some information about that crate. This chapter gives an overview of that process, and the supported file formats for crate libraries. @@ -9,8 +9,8 @@ and the supported file formats for crate libraries. A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A key point of these file formats is that they contain `rustc`-specific [*metadata*](#metadata). This metadata allows the compiler to discover enough -information about the extern crate to understand the items it contains, which -macros it exports, and *much* more. +information about the external crate to understand the items it contains, +which macros it exports, and *much* more. ### rlib @@ -134,13 +134,13 @@ can have the same crate name, so they need to be disambiguated). ## Crate loading Crate loading can have quite a few subtle complexities. During [name -resolution], when an extern crate is referenced (via an `extern crate` or +resolution], when an external crate is referenced (via an `extern crate` or path), the resolver uses the [`CrateLoader`] which is responsible for finding the crate libraries and loading the [metadata] for them. After the dependency is loaded, the `CrateLoader` will provide the information the resolver needs to perform its job (such as expanding macros, resolving paths, etc.). -To load each extern crate, the `CrateLoader` uses a [`CrateLocator`] to +To load each external crate, the `CrateLoader` uses a [`CrateLocator`] to actually find the correct files for one specific crate. There is some great documentation in the [`locator`] module that goes into detail on how loading works, and I strongly suggest reading it to get the full picture. From a2ee37ba80d8e538111f4b31fd849e093f20a9ff Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Thu, 4 Feb 2021 08:13:28 -0800 Subject: [PATCH 3/3] Updates from review. --- src/backend/libs-and-metadata.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/backend/libs-and-metadata.md b/src/backend/libs-and-metadata.md index 7a6a90d31..91d8cb3aa 100644 --- a/src/backend/libs-and-metadata.md +++ b/src/backend/libs-and-metadata.md @@ -21,7 +21,7 @@ format is specific to `rustc`, and may change over time. This file contains: regular linking. There is a separate `.o` file for each [codegen unit]. The codegen step can be skipped with the [`-C linker-plugin-lto`][linker-plugin-lto] CLI option, which means each `.o` - file will only contain LLVM bitcode, which can be used when using LTO. + file will only contain LLVM bitcode. * [LLVM bitcode], which is a binary representation of LLVM's intermediate representation, which is embedded as a section in the `.o` files. This can be used for [Link Time Optimization] (LTO). This can be removed with the @@ -42,7 +42,7 @@ format is specific to `rustc`, and may change over time. This file contains: ### dylib A `dylib` is a platform-specific shared library. It includes the `rustc` -[metadata] in a special section called `.rustc` in a compressed format. +[metadata] in a special link section called `.rustc` in a compressed format. ### rmeta @@ -72,8 +72,8 @@ Here are a few highlights of things it contains: from any other version. * The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the correct dependency is loaded. -* The [Crate Disambiguator](#crate-disambiguator). This is a hash used for - various things to disambiguate between different crates of the same name. +* The [Crate Disambiguator](#crate-disambiguator). This is a hash used + to disambiguate between different crates of the same name. * Information about all the source files in the library. This can be used for a variety of things, such as diagnostics pointing to sources in a dependency.