
Commit 95c3930

Add chapter on libs and metadata. (#1044)
1 parent be872c1 commit 95c3930

2 files changed (+193, -0 lines)


src/SUMMARY.md (+1)

@@ -145,6 +145,7 @@
 - [Debugging LLVM](./backend/debugging.md)
 - [Backend Agnostic Codegen](./backend/backend-agnostic.md)
 - [Implicit Caller Location](./backend/implicit-caller-location.md)
+- [Libraries and Metadata](./backend/libs-and-metadata.md)
 - [Profile-guided Optimization](./profile-guided-optimization.md)
 - [LLVM Source-Based Code Coverage](./llvm-coverage-instrumentation.md)
 - [Sanitizers Support](./sanitizers.md)

src/backend/libs-and-metadata.md (new file, +192)

# Libraries and Metadata

When the compiler sees a reference to an external crate, it needs to load some
information about that crate. This chapter gives an overview of that process
and of the supported file formats for crate libraries.

## Libraries

A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A
key point of these file formats is that they contain `rustc`-specific
[*metadata*](#metadata). This metadata allows the compiler to discover enough
information about the external crate to understand the items it contains,
which macros it exports, and *much* more.

### rlib

An `rlib` is an [archive file], which is similar to a tar file. The contents
and layout of this file are specific to `rustc` and may change over time. This
file contains:

* Object code, which is the result of code generation. This is used during
  regular linking. There is a separate `.o` file for each [codegen unit]. The
  codegen step can be skipped with the
  [`-C linker-plugin-lto`][linker-plugin-lto] CLI option, which means each
  `.o` file will only contain LLVM bitcode.
* [LLVM bitcode], which is a binary representation of LLVM's intermediate
  representation, embedded as a section in the `.o` files. This can be used
  for [Link Time Optimization] (LTO). It can be omitted with the
  [`-C embed-bitcode=no`][embed-bitcode] CLI option to improve compile times
  and reduce disk space if LTO is not needed.
* `rustc` [metadata], in a file named `lib.rmeta`.
* A symbol table, which is generally a list of symbols with offsets to the
  object file that contains each symbol. This is pretty standard for archive
  files.

[archive file]: https://en.wikipedia.org/wiki/Ar_(Unix)
[LLVM bitcode]: https://llvm.org/docs/BitCodeFormat.html
[Link Time Optimization]: https://llvm.org/docs/LinkTimeOptimization.html
[codegen unit]: ../backend/codegen.md
[embed-bitcode]: https://doc.rust-lang.org/rustc/codegen-options/index.html#embed-bitcode
[linker-plugin-lto]: https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-plugin-lto
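
Because an `rlib` is an ordinary `ar` archive on the outside, its members can be walked with a few lines of code. The sketch below illustrates the container format under simplifying assumptions and is not `rustc`'s own reader. It builds a tiny in-memory archive in the System V/GNU layout, which consists of an `!<arch>\n` magic string followed by a 60-byte text header before each member, with members padded to even offsets. It then lists the members. The member names and contents are made up. The sketch also ignores the long-name table and the symbol-table members that real archives may contain.

```rust
// Sketch: write and list members of a System V/GNU-style `ar` archive.
// Only short member names are supported (no "//" long-name table and no
// "/" symbol-table member), which real rlibs may contain.

const AR_MAGIC: &[u8] = b"!<arch>\n";
const HEADER_LEN: usize = 60;

/// Appends one member: a 60-byte text header followed by the data,
/// padded with '\n' so the next header starts on an even offset.
fn push_member(archive: &mut Vec<u8>, name: &str, data: &[u8]) {
    let header = format!(
        "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n",
        format!("{}/", name), // name, terminated by '/'
        0,                    // mtime
        0,                    // uid
        0,                    // gid
        644,                  // mode, written as octal digits
        data.len()            // size in bytes, as decimal text
    );
    assert_eq!(header.len(), HEADER_LEN, "name too long for this sketch");
    archive.extend_from_slice(header.as_bytes());
    archive.extend_from_slice(data);
    if data.len() % 2 == 1 {
        archive.push(b'\n');
    }
}

/// Returns `(name, data)` for every member, or a description of the problem.
fn list_members(archive: &[u8]) -> Result<Vec<(String, &[u8])>, String> {
    let mut rest = archive.strip_prefix(AR_MAGIC).ok_or("missing !<arch> magic")?;
    let mut members = Vec::new();
    while !rest.is_empty() {
        if rest.len() < HEADER_LEN {
            return Err("truncated member header".to_string());
        }
        let (header, body) = rest.split_at(HEADER_LEN);
        if &header[58..60] != b"`\n" {
            return Err("bad header terminator".to_string());
        }
        // Header fields are space-padded ASCII text.
        let text = |range: std::ops::Range<usize>| {
            String::from_utf8_lossy(&header[range]).trim().to_string()
        };
        let name = text(0..16).trim_end_matches('/').to_string();
        let size = text(48..58).parse::<usize>().map_err(|_| "bad size field")?;
        if body.len() < size {
            return Err("truncated member data".to_string());
        }
        members.push((name, &body[..size]));
        // Skip the data plus the padding byte after odd-sized members.
        rest = &body[(size + size % 2).min(body.len())..];
    }
    Ok(members)
}

fn main() {
    let mut archive = AR_MAGIC.to_vec();
    push_member(&mut archive, "foo.cgu-0.o", b"object"); // made-up member names
    push_member(&mut archive, "lib.rmeta", b"metadata!");
    for (name, data) in list_members(&archive).unwrap() {
        println!("{}: {} bytes", name, data.len());
    }
}
```
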

### dylib

A `dylib` is a platform-specific shared library. It includes the `rustc`
[metadata] in a special link section called `.rustc` in a compressed format.

### rmeta

An `rmeta` file is a custom binary format that contains the [metadata] for the
crate. This file can be used for fast "checks" of a project by skipping all
code generation (as is done with `cargo check`), for collecting enough
information for documentation (as is done with `cargo doc`), or for
[pipelining](#pipelining). This file is created if the
[`--emit=metadata`][emit] CLI option is used.

`rmeta` files do not support linking, since they do not contain compiled
object files.

[emit]: https://doc.rust-lang.org/rustc/command-line-arguments.html#option-emit

## Metadata

The metadata contains a wide swath of different elements. This guide will not
go into detail about every field it contains. You are encouraged to browse the
[`CrateRoot`] definition to get a sense of the different elements it contains.
Everything about metadata encoding and decoding is in the [`rustc_metadata`]
crate.

Here are a few highlights of things it contains:

* The version of the `rustc` compiler. The compiler will refuse to load files
  from any other version.
* The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the
  correct dependency is loaded.
* The [Crate Disambiguator](#crate-disambiguator). This is a hash used
  to disambiguate between different crates of the same name.
* Information about all the source files in the library. This can be used for
  a variety of things, such as diagnostics pointing to sources in a
  dependency.
* Information about exported macros, traits, types, and items. Generally,
  anything that needs to be known when a path references something inside a
  crate dependency.
* Encoded [MIR]. This is optional, and only encoded if needed for code
  generation. `cargo check` skips this for performance reasons.

[`CrateRoot`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.CrateRoot.html
[`rustc_metadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
[MIR]: ../mir/index.md
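
The version check in the first bullet can be illustrated with a toy header. The layout below, consisting of a hypothetical `MAGIC` constant, a one-byte length, and then the version string, is invented for this example and is not the real `rmeta` encoding. The sketch only shows the idea that a loader compares the recorded compiler version with its own and rejects any mismatch.

```rust
// Hypothetical header layout, for illustration only (not rustc's format):
//   MAGIC | 1-byte length | rustc version string | ...rest of metadata
const MAGIC: &[u8] = b"RMETA\0";

fn encode_header(rustc_version: &str) -> Vec<u8> {
    assert!(rustc_version.len() <= 255, "version too long for a 1-byte length");
    let mut out = MAGIC.to_vec();
    out.push(rustc_version.len() as u8);
    out.extend_from_slice(rustc_version.as_bytes());
    out
}

#[derive(Debug, PartialEq)]
enum LoadError {
    NotMetadata,
    Truncated,
    IncompatibleVersion { found: String },
}

/// Accepts a blob only if it was written by exactly `our_version`.
fn check_header(blob: &[u8], our_version: &str) -> Result<(), LoadError> {
    let rest = blob.strip_prefix(MAGIC).ok_or(LoadError::NotMetadata)?;
    let (&len, rest) = rest.split_first().ok_or(LoadError::Truncated)?;
    let found = rest.get(..len as usize).ok_or(LoadError::Truncated)?;
    let found = String::from_utf8_lossy(found).into_owned();
    if found == our_version {
        Ok(())
    } else {
        Err(LoadError::IncompatibleVersion { found })
    }
}

fn main() {
    let ours = "rustc 1.50.0"; // made-up version strings
    for theirs in &["rustc 1.50.0", "rustc 1.49.0"] {
        let blob = encode_header(theirs);
        println!("{} -> {:?}", theirs, check_header(&blob, ours));
    }
}
```
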

### Strict Version Hash

The Strict Version Hash ([SVH], also known as the "crate hash") is a 64-bit
hash that is used to ensure that the correct crate dependencies are loaded. It
is possible for a directory to contain multiple copies of the same dependency
built with different settings, or built from different sources. The crate
loader will skip any crates that have the wrong SVH.

The SVH is also used for the [incremental compilation] session filename,
though that usage is mostly historic.

The hash includes a variety of elements:

* Hashes of the HIR nodes.
* All of the upstream crate hashes.
* All of the source filenames.
* Hashes of certain command-line flags (like `-C metadata` via the [Crate
  Disambiguator](#crate-disambiguator), and all CLI options marked with
  `[TRACKED]`).

See [`finalize_and_compute_crate_hash`] for where the hash is actually
computed.

[SVH]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/svh/struct.Svh.html
[incremental compilation]: ../queries/incremental-compilation.md
[`finalize_and_compute_crate_hash`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/collector/struct.NodeCollector.html#method.finalize_and_compute_crate_hash
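
The following minimal sketch folds the listed elements into a single 64-bit value, which shows why the SVH changes whenever any of its inputs change. It uses a hypothetical `CrateInputs` struct as a stand-in for HIR hashes, upstream crate hashes, source filenames, tracked options, and the crate disambiguator. It also uses the standard library's `DefaultHasher` for convenience, whereas `rustc` uses its own stable hashing infrastructure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified inputs to a crate hash (not rustc's real types).
#[derive(Clone, Copy)]
struct CrateInputs<'a> {
    hir_node_hashes: &'a [u64],
    upstream_svhs: &'a [u64],
    source_files: &'a [&'a str],
    tracked_options: &'a [(&'a str, &'a str)],
    crate_disambiguator: u64,
}

/// Folds every input into one 64-bit value. Slices are hashed together with
/// their length, so moving an element between lists also changes the result.
fn crate_hash(inputs: &CrateInputs) -> u64 {
    let mut h = DefaultHasher::new();
    inputs.hir_node_hashes.hash(&mut h);
    inputs.upstream_svhs.hash(&mut h);
    inputs.source_files.hash(&mut h);
    inputs.tracked_options.hash(&mut h);
    inputs.crate_disambiguator.hash(&mut h);
    h.finish()
}

fn main() {
    let base = CrateInputs {
        hir_node_hashes: &[11, 22, 33],
        upstream_svhs: &[0xabcd],
        source_files: &["src/lib.rs"],
        tracked_options: &[("opt-level", "3")],
        crate_disambiguator: 42,
    };
    // Adding a source file yields a different hash, so a loader comparing
    // SVHs would reject a copy built from the old sources.
    let edited = CrateInputs { source_files: &["src/lib.rs", "src/new.rs"], ..base };
    println!("{:016x}\n{:016x}", crate_hash(&base), crate_hash(&edited));
}
```
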

### Crate Disambiguator

The [`CrateDisambiguator`] is a 128-bit hash used to distinguish between
different crates of the same name. It is a hash of all the [`-C metadata`] CLI
options computed in [`compute_crate_disambiguator`]. It is used in a variety
of places, such as symbol name mangling, crate loading, and much more.
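
The sketch below shows a single hash computed over every `-C metadata` value. The function is hypothetical and is not `compute_crate_disambiguator` itself. It produces only 64 bits with `DefaultHasher`, whereas the real disambiguator is 128-bit. It also makes two assumptions of its own. First, the order and repetition of the flags should not matter, so the values are sorted and deduplicated. Second, each value is length-prefixed so that `-C metadata=ab -C metadata=c` and `-C metadata=a -C metadata=bc` hash differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical sketch: hash every `-C metadata` value into one number.
/// Assumption: flag order and duplicates should not matter, so values are
/// sorted and deduplicated before hashing.
fn disambiguator(metadata_flags: &[&str]) -> u64 {
    let mut values: Vec<&str> = metadata_flags.to_vec();
    values.sort();
    values.dedup();
    let mut h = DefaultHasher::new();
    for v in &values {
        // The length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
        h.write_usize(v.len());
        h.write(v.as_bytes());
    }
    h.finish()
}

fn main() {
    let a = disambiguator(&["1a2b", "feature-x"]);
    let b = disambiguator(&["feature-x", "1a2b", "1a2b"]);
    let c = disambiguator(&["3c4d"]);
    println!("{:016x} {:016x} {:016x}", a, b, c);
}
```

In `main`, `a` and `b` are equal because they use the same set of values, while `c` differs.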

By default, all Rust symbols are mangled and incorporate the disambiguator
hash. This allows multiple versions of the same crate to be included together.
Cargo automatically generates `-C metadata` hashes based on a variety of
factors, like the package version, source, and the target kind (a lib and bin
can have the same crate name, so they need to be disambiguated).

[`CrateDisambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/crate_disambiguator/struct.CrateDisambiguator.html
[`compute_crate_disambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/util/fn.compute_crate_disambiguator.html
[`-C metadata`]: https://doc.rust-lang.org/rustc/codegen-options/index.html#metadata
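
The sketch below shows why mixing the disambiguator into symbol names lets two builds of the same crate coexist. It produces names shaped like Rust's legacy mangling scheme: `_ZN`, then length-prefixed path segments, then a `17h` segment holding 16 hex digits of hash, then `E`. The hash input is deliberately simplified to the item path plus a made-up disambiguator, whereas the real symbol hash incorporates more information.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Builds a name shaped like Rust's legacy mangling:
/// `_ZN` + length-prefixed path segments + `17h<16 hex digits>` + `E`.
/// Simplification: the hash covers only the path and the disambiguator.
fn mangle(path: &[&str], crate_disambiguator: u64) -> String {
    let mut h = DefaultHasher::new();
    path.hash(&mut h);
    crate_disambiguator.hash(&mut h);
    let hash_segment = format!("h{:016x}", h.finish());

    let mut out = String::from("_ZN");
    for segment in path.iter().copied().chain(std::iter::once(hash_segment.as_str())) {
        out.push_str(&segment.len().to_string());
        out.push_str(segment);
    }
    out.push('E');
    out
}

fn main() {
    // Two builds of a crate named `log`, compiled with different `-C metadata`
    // values and therefore different (made-up) disambiguators.
    let v1 = mangle(&["log", "info"], 0x1111);
    let v2 = mangle(&["log", "info"], 0x2222);
    println!("{}\n{}", v1, v2);
    assert_ne!(v1, v2); // no symbol clash when both are linked together
}
```
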

## Crate loading

Crate loading can have quite a few subtle complexities. During [name
resolution], when an external crate is referenced (via an `extern crate` item
or a path), the resolver uses the [`CrateLoader`], which is responsible for
finding the crate libraries and loading the [metadata] for them. After the
dependency is loaded, the `CrateLoader` will provide the information the
resolver needs to perform its job (such as expanding macros, resolving paths,
etc.).

To load each external crate, the `CrateLoader` uses a [`CrateLocator`] to
actually find the correct files for one specific crate. The documentation in
the [`locator`] module goes into great detail on how loading works and is well
worth reading to get the full picture.

The location of a dependency can come from several different places. Direct
dependencies are usually passed with `--extern` flags, and the loader can look
at those directly. Direct dependencies often have references to their own
dependencies, which need to be loaded, too. These are usually found by
scanning the directories passed with the `-L` flag for any file whose metadata
contains a matching crate name and [SVH](#strict-version-hash). The loader
will also look at the [sysroot] to find dependencies.
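
The matching rule above can be sketched as a small selection function. Everything in it is hypothetical. `Candidate`, `Kind`, and `locate` are invented names, and the candidates stand in for files whose metadata has already been read from the `-L` directories. The sketch also folds in two points made elsewhere in this chapter: an `rmeta` can stand in for a missing `rlib`, but only when nothing needs to be linked. The real `CrateLocator` handles many more cases.

```rust
/// Hypothetical file kinds; dylibs are left out to keep the sketch short.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Rlib,
    Rmeta,
}

/// A candidate file whose metadata has already been read (hypothetical type).
struct Candidate<'a> {
    path: &'a str,
    crate_name: &'a str,
    svh: u64,
    kind: Kind,
}

/// Picks the file for crate `name` with hash `svh`. A complete rlib is
/// preferred; an rmeta alone is accepted only when nothing will be linked,
/// because rmeta files contain no object code.
fn locate<'a>(
    candidates: &[Candidate<'a>],
    name: &str,
    svh: u64,
    need_linking: bool,
) -> Option<&'a str> {
    let pick = |kind: Kind| {
        candidates
            .iter()
            .find(|c| c.crate_name == name && c.svh == svh && c.kind == kind)
            .map(|c| c.path)
    };
    match pick(Kind::Rlib) {
        Some(path) => Some(path),
        None if need_linking => None,
        None => pick(Kind::Rmeta),
    }
}

fn main() {
    // Made-up files found by scanning `-L` directories.
    let candidates = [
        Candidate { path: "deps/libfoo-aaaa.rlib", crate_name: "foo", svh: 1, kind: Kind::Rlib },
        Candidate { path: "deps/libfoo-bbbb.rmeta", crate_name: "foo", svh: 2, kind: Kind::Rmeta },
    ];
    println!("{:?}", locate(&candidates, "foo", 1, true));
    println!("{:?}", locate(&candidates, "foo", 2, false));
    println!("{:?}", locate(&candidates, "foo", 2, true));
}
```

Here, the first call finds the `rlib`. The second falls back to the `rmeta` for a check-style build. The third returns `None` because linking needs object code.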

As crates are loaded, they are kept in the [`CStore`] with the crate metadata
wrapped in the [`CrateMetadata`] struct. After resolution and expansion, the
`CStore` will make its way into the [`GlobalCtxt`] for the rest of
compilation.
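
The following is a minimal sketch of that bookkeeping. It assumes that each loaded crate is identified by a small integer index, called `CrateNum` here to echo the compiler's terminology. It also assumes that requesting a crate with the same name and SVH a second time reuses the existing entry instead of reading the file again. `Store` and `CrateMeta` are hypothetical simplifications of `CStore` and `CrateMetadata`.

```rust
/// Index of a loaded crate (hypothetical stand-in for the compiler's `CrateNum`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CrateNum(usize);

/// Heavily simplified decoded metadata (hypothetical stand-in for `CrateMetadata`).
#[derive(Debug)]
struct CrateMeta {
    name: String,
    svh: u64,
    exported_items: Vec<String>,
}

/// Hypothetical stand-in for `CStore`: one entry per loaded crate.
#[derive(Default)]
struct Store {
    metas: Vec<CrateMeta>,
}

impl Store {
    /// Reuses an entry with the same name and SVH; otherwise runs `load`
    /// once and records the result under a fresh `CrateNum`.
    fn get_or_load(&mut self, name: &str, svh: u64, load: impl FnOnce() -> CrateMeta) -> CrateNum {
        if let Some(i) = self.metas.iter().position(|m| m.name == name && m.svh == svh) {
            return CrateNum(i);
        }
        self.metas.push(load());
        CrateNum(self.metas.len() - 1)
    }

    fn get(&self, cnum: CrateNum) -> &CrateMeta {
        &self.metas[cnum.0]
    }
}

/// Simulates reading metadata from disk and counts how often it happens.
fn read_metadata(name: &str, svh: u64, reads: &mut u32) -> CrateMeta {
    *reads += 1;
    CrateMeta { name: name.to_string(), svh, exported_items: vec![format!("{}::Item", name)] }
}

fn main() {
    let mut store = Store::default();
    let mut reads = 0;
    let a = store.get_or_load("serde", 7, || read_metadata("serde", 7, &mut reads));
    let b = store.get_or_load("serde", 7, || read_metadata("serde", 7, &mut reads));
    println!("{:?} {:?} reads={} items={:?}", a, b, reads, store.get(a).exported_items);
}
```

In `main`, the second request for `serde` returns the same `CrateNum`, and `reads` stays at 1.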

[name resolution]: ../name-resolution.md
[`CrateLoader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CrateLoader.html
[`CrateLocator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/struct.CrateLocator.html
[`locator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/index.html
[`CStore`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CStore.html
[`CrateMetadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/decoder/struct.CrateMetadata.html
[`GlobalCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.GlobalCtxt.html
[sysroot]: ../building/bootstrapping.md#what-is-a-sysroot

## Pipelining

One trick to improve compile times is to start building a crate as soon as the
metadata for its dependencies is available. For a library, there is no need to
wait for the code generation of dependencies to finish. Cargo implements this
technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
dependency as well as an [`rlib`](#rlib). As early as it can, `rustc` will
save the `rmeta` file to disk before it continues to the code generation
phase. The compiler sends a JSON message to let the build tool know that it
can start building the next crate if possible.
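
A small scheduling calculation makes the benefit concrete. The sketch assumes unlimited parallelism and splits each crate's build into a metadata phase and a code generation phase with made-up durations. Without pipelining, every crate waits for its dependencies to finish completely. With pipelining, a library may start as soon as each dependency's metadata is ready, while a binary still waits for full builds because it has to link.

```rust
/// One crate in a hypothetical build graph; `deps` must point to earlier entries.
struct Unit {
    name: &'static str,
    is_bin: bool,
    deps: Vec<usize>,
    metadata_time: u32, // time until the `.rmeta` is written
    codegen_time: u32,  // additional time until the build finishes
}

/// Returns when each unit finishes, assuming unlimited parallelism.
fn finish_times(units: &[Unit], pipelined: bool) -> Vec<u32> {
    let mut meta_done = vec![0u32; units.len()];
    let mut all_done = vec![0u32; units.len()];
    for (i, u) in units.iter().enumerate() {
        // A library may start once its dependencies' metadata exists;
        // a binary (or any crate without pipelining) waits for full builds.
        let start = u
            .deps
            .iter()
            .map(|&d| if pipelined && !u.is_bin { meta_done[d] } else { all_done[d] })
            .max()
            .unwrap_or(0);
        meta_done[i] = start + u.metadata_time;
        all_done[i] = meta_done[i] + u.codegen_time;
    }
    all_done
}

fn main() {
    // Made-up durations: lib `a` <- lib `b` <- binary `app`.
    let units = vec![
        Unit { name: "a", is_bin: false, deps: vec![], metadata_time: 2, codegen_time: 3 },
        Unit { name: "b", is_bin: false, deps: vec![0], metadata_time: 2, codegen_time: 3 },
        Unit { name: "app", is_bin: true, deps: vec![1], metadata_time: 1, codegen_time: 2 },
    ];
    for &pipelined in &[false, true] {
        let done = finish_times(&units, pipelined);
        let report: Vec<String> =
            units.iter().zip(&done).map(|(u, t)| format!("{}@{}", u.name, t)).collect();
        println!("pipelined={}: {}", pipelined, report.join(", "));
    }
}
```

With these numbers, `app` finishes at time 13 without pipelining and at time 10 with it. The saving comes from `b` overlapping with the code generation of `a`.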

The [crate loading](#crate-loading) system is smart enough to use an `rmeta`
file when the corresponding `rlib` is not there (or has only been partially
written).

This pipelining isn't possible for binaries, because the linking phase
requires the generated code of all their dependencies. In the future, it may
be possible to further improve this scenario by splitting linking into a
separate command (see [#64191]).

[#64191]: https://github.com/rust-lang/rust/issues/64191

[metadata]: #metadata
