Commit bbbf843

Update "Producing Consistent Snapshots"
Following discussions with @dstufft and @trishankatdatadog regarding file uploads and simple index generation on PyPI (see secure-systems-lab#70), this commit once more refines the "Producing Consistent Snapshots" section. It includes the following changes:

- Remove the notion of *transaction processes* and instead talk about *uploads*. Background: transaction processes are only relevant if multiple files of a project release need to be handled in a single transaction. This is not the case on PyPI, where each upload of a distribution file is self-contained. With this change, upload processes just place files into a queue without updating *bin-n* metadata (as transaction processes would have done in parallel), and all of the metadata update and creation work is done by the snapshot process in a strictly sequential manner.
- Add a paragraph about simple index pages: how their hashes should be included in *bin-n* metadata, and how they need to remain stable if re-generated dynamically.
1 parent b5d8c83 commit bbbf843
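The division of labour described in the commit message can be illustrated with a minimal Python sketch, under stated assumptions: upload processes are modelled as threads that only enqueue file paths into a `queue.Queue` (the standard library's thread-safe FIFO), and a single snapshot thread consumes them one at a time, so snapshot versions are assigned strictly sequentially. The names `upload`, `snapshot_process` and `snapshots` are hypothetical, the metadata work itself is only a placeholder, and a real deployment would need a durable queue shared between processes rather than an in-memory one.

```python
import queue
import threading

# Sketch only: all names here are hypothetical, not PyPI or python-tuf APIs.
upload_queue = queue.Queue()  # concurrency-safe FIFO of file paths (None = stop)
snapshots = []  # (snapshot version, file path), appended in production order


def upload(path: str) -> None:
    """Upload process: enqueue the new file; do not touch any metadata."""
    upload_queue.put(path)


def snapshot_process() -> None:
    """The single consumer: one consistent snapshot per uploaded file, in order."""
    version = 0
    while True:
        path = upload_queue.get()
        try:
            if path is None:  # sentinel: no more uploads
                return
            version += 1
            # Placeholder for updating and signing bin-n, snapshot and timestamp.
            snapshots.append((version, path))
        finally:
            upload_queue.task_done()


worker = threading.Thread(target=snapshot_process)
worker.start()

# Upload processes may run in parallel; only the queue is shared.
uploaders = [
    threading.Thread(target=upload, args=(f"demo-{i}.0.tar.gz",))
    for i in range(4)
]
for t in uploaders:
    t.start()
for t in uploaders:
    t.join()

upload_queue.put(None)  # enqueued after all uploads, so it is consumed last
worker.join()
print(snapshots)
```

Because only the one consumer thread ever increments `version`, the metadata needs no locking; parallelism is confined to the uploads, which matches the commit's reasoning that each distribution file is self-contained. The order in which the four files are processed depends on thread scheduling, but the snapshot versions are always 1, 2, 3, 4.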

1 file changed, +37 -41 lines changed

pep-0458.txt

Lines changed: 37 additions & 41 deletions
@@ -907,55 +907,51 @@ efficiently transfer consistent snapshots from PyPI.
 Producing Consistent Snapshots
 ------------------------------

-When a new project release is uploaded to PyPI, PyPI MUST update the *bin-n*
-metadata responsible for the target files of the project release. Remember that
-target files are sorted into bins by their filename hashes. Consequentially,
-PyPI MUST update *snapshot* to account for the updated *bin-n* metadata, and
-*timestamp* to account for the updated *snapshot* metadata. These updates
-SHOULD be handled by automated processes, e.g. one or more *transaction
-processes* and one *snapshot process*.
-
-Each transaction process keeps track of a project upload, adds all new target
-files to the most recent, relevant *bin-n* metadata and informs the
-snapshot process to produce a consistent snapshot. Each project release SHOULD
-be handled in an atomic transaction, so that a given consistent snapshot
-contains all target files of a project release. However, transaction processes
-MAY be parallelized under the following constraints:
-
-- Pairs of transaction processes MUST NOT concurrently work on the same project.
-- Pairs of transaction processes MUST NOT concurrently work on projects that
-  belong to the same *bin-n* role.
-
-When a transaction process is finished updating the relevant *bin-n* metadata
-it informs the snapshot process to generate a new consistent snapshot. The
-snapshot process does so by taking the updated *bin-n* metadata, incrementing
-their respective version numbers, signing them with the *bin-n* role key(s),
-and writing them to *VERSION_NUMBER.bin-N.json*.
-
-Similarly, the snapshot process then takes the most recent *snapshot* metadata,
-updates its *bin-n* metadata version numbers, increments its own version
-number, signs it with the *snapshot* role key, and writes it to
-*VERSION_NUMBER.snapshot.json*.
+When a new distribution file is uploaded to PyPI, PyPI MUST update the
+responsible *bin-n* metadata. Remember that all target files are sorted into
+bins by their filename hashes. PyPI MUST also update *snapshot* to account for
+the updated *bin-n* metadata, and *timestamp* to account for the updated
+*snapshot* metadata. These updates SHOULD be handled by an automated *snapshot
+process*.
+
+File uploads MAY be handled in parallel, however, consistent snapshots MUST be
+produced in a strictly sequential manner. Furthermore, as long as distribution
+files are self-contained, a consistent snapshot MAY be produced for each
+uploaded file. To do so upload processes place new distribution files into a
+concurrency-safe FIFO queue and the snapshot process reads from that queue one
+file at a time and performs the following tasks:
+
+First, it adds the new file path to the relevant *bin-n* metadata, increments
+its version number, signs it with the *bin-n* role key, and writes it to
+*VERSION_NUMBER.bin-N.json*.
+
+Then, it takes the most recent *snapshot* metadata, updates its *bin-n*
+metadata version numbers, increments its own version number, signs it with the
+*snapshot* role key, and writes it to *VERSION_NUMBER.snapshot.json*.

 And finally, the snapshot process takes the most recent *timestamp* metadata,
 updates its *snapshot* metadata hash and version number, increments its own
 version number, sets a new expiration time, signs it with the *timestamp* role
 key, and writes it to *timestamp.json*.

-The snapshot process MUST generate consistent snapshots sequentially, reading
-the notifications received from the transaction process(es) from a
-concurrency-safe FIFO queue. Fortunately, the operation of signing is fast
-enough that this may be done a thousand or more times per second.
+When updating *bin-n* metadata for a consistent snapshot, the snapshot process
+SHOULD also include any new or updated hashes of simple index pages in the
+relevant *bin-n* metadata. Note that, simple index pages may be generated
+dynamically on API calls, so it is important that their output remains stable
+throughout the validity of a consistent snapshot.

-If there are multiple files in a release, a project MAY release these files in
-separate transactions. For example, a project MAY release files for Windows in
-one transaction, and the files for Linux in another transaction. However, a project
-SHOULD release files that must belong together in order for everything to work
-in the same transaction.
+Since the snapshot process MUST generate consistent snapshots in a strictly
+sequential manner it constitutes a bottleneck. Fortunately, the operation of
+signing is fast enough that this may be done a thousand or more times per
+second.

-At any rate, PyPI SHOULD use a `transaction log`__ to record project
-transaction processes and the snapshot queue for auditing and to recover from
-errors after a server failure.
+Moreover, PyPI MAY serve distribution files to clients before the corresponding
+consistent snapshot metadata is generated. In that case the client software
+SHOULD inform the user that full TUF protection is not yet available but will
+be shortly.
+
+PyPI SHOULD use a `transaction log`__ to record upload processes and the
+snapshot queue for auditing and to recover from errors after a server failure.

 __ https://en.wikipedia.org/wiki/Transaction_log

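The three steps the new text prescribes (*bin-n*, then *snapshot*, then *timestamp*), together with the simple-index hashes from the added paragraph, can be sketched as follows. This is an illustrative sketch, not TUF's or PyPI's implementation: it assumes 16 bins chosen by the first hex digit of the SHA-256 of the target path, replaces real signing with a hypothetical `sign()` stub, and keeps "written" metadata files in a dictionary. One call to the hypothetical `publish()` may touch more than one bin (a distribution file and its simple index page can hash to different bins), but it still produces exactly one new *snapshot* and one new *timestamp*.

```python
import hashlib
import json

NUM_BINS = 16  # hypothetical: one hex digit of hash prefix per bin


def bin_for(path: str) -> int:
    """Pick the bin-n role for a target by the leading hex digit of SHA-256(path)."""
    return int(hashlib.sha256(path.encode("utf-8")).hexdigest()[0], 16)


def canonical(obj: dict) -> bytes:
    """Deterministic JSON bytes, so re-serialising yields identical hashes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(role: str, signed: dict) -> dict:
    """Hypothetical stand-in for signing with the role's key(s)."""
    return {"signed": signed, "signatures": [{"keyid": role, "sig": "<stub>"}]}


class SnapshotProcess:
    def __init__(self) -> None:
        self.files: dict = {}  # published metadata: filename -> bytes
        self.bins = {n: {"version": 0, "targets": {}} for n in range(NUM_BINS)}
        self.snapshot = {"version": 0, "meta": {}}
        self.timestamp = {"version": 0, "meta": {}}

    def publish(self, targets: dict, expires: str) -> None:
        """Produce one consistent snapshot for `targets` (path -> content bytes)."""
        touched = set()
        for path, content in targets.items():
            n = bin_for(path)
            self.bins[n]["targets"][path] = {
                "length": len(content),
                "hashes": {"sha256": hashlib.sha256(content).hexdigest()},
            }
            touched.add(n)

        # First: bump, sign and write each affected VERSION_NUMBER.bin-N.json.
        for n in sorted(touched):
            b = self.bins[n]
            b["version"] += 1
            self.files[f"{b['version']}.bin-{n}.json"] = canonical(sign(f"bin-{n}", b))
            self.snapshot["meta"][f"bin-{n}.json"] = {"version": b["version"]}

        # Then: bump, sign and write VERSION_NUMBER.snapshot.json.
        self.snapshot["version"] += 1
        snapshot_bytes = canonical(sign("snapshot", self.snapshot))
        self.files[f"{self.snapshot['version']}.snapshot.json"] = snapshot_bytes

        # Finally: point timestamp.json at the new snapshot by version and hash.
        self.timestamp["meta"]["snapshot.json"] = {
            "version": self.snapshot["version"],
            "hashes": {"sha256": hashlib.sha256(snapshot_bytes).hexdigest()},
        }
        self.timestamp["version"] += 1
        self.timestamp["expires"] = expires
        self.files["timestamp.json"] = canonical(sign("timestamp", self.timestamp))


proc = SnapshotProcess()
proc.publish(
    {
        "demo-1.0.tar.gz": b"fake sdist bytes",
        # Must stay byte-for-byte stable for as long as this snapshot is valid.
        "simple/demo/index.html": b"<a href='demo-1.0.tar.gz'>demo-1.0.tar.gz</a>",
    },
    expires="2030-01-01T00:00:00Z",
)
print(sorted(proc.files))
```

The deterministic `canonical()` encoding matters for the same reason the new text asks simple index pages to remain stable: clients compare what they download against the hashes recorded in the metadata, so a byte-level change in a dynamically generated page during a snapshot's validity would make that hash check fail.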