Commit 5946028

PEP 770: Add the additional-files table and reserve dist-info dirs
1 parent 93e9cc3 commit 5946028

1 file changed

+158
-135
lines changed

peps/pep-0770.rst

Lines changed: 158 additions & 135 deletions
Original file line number | Diff line number | Diff line change
@@ -179,98 +179,58 @@ Specification
179179

180180
The changes necessary to implement this PEP include:
181181

182-
* Additions to `Core Metadata <770-spec-core-metadata_>`_, as defined in the
183-
`Core Metadata specification <coremetadataspec_>`__.
184-
* Additions to the author-provided
185-
`project source metadata <770-spec-project-source-metadata_>`_, as defined in the
186-
`pyproject.toml specification <pyprojecttoml_>`__.
187-
* `Additions <770-spec-project-formats_>`_ to the source distribution (sdist),
188-
built distribution (wheel), and installed project specifications.
182+
* A new reserved registry of subdirectory names in the ``.dist-info`` directory.
183+
* A new reserved optional ``[additional-files]`` table with an optional
184+
``sboms`` key added to
185+
`project source metadata <770-spec-project-source-metadata_>`_,
186+
as defined in the `pyproject.toml specification <pyprojecttoml_>`__.
187+
* `Additions <770-spec-project-formats_>`_ to the built distribution (wheel)
188+
and installed project specifications.
189189

190190
In addition to the above, an informational PEP will be created for tools
191191
consuming included SBOM documents and other Python package metadata to
192192
generate complete SBOM documents for Python packages.
193193

194-
Terminology
195-
-----------
194+
.. _770-spec-dist-info-subdirs:
195+
196+
Reserved ``.dist-info`` subdirectories registry
197+
-----------------------------------------------
198+
199+
This PEP introduces a new registry of reserved subdirectory names allowed in
200+
the ``.dist-info`` directory for the :term:`distribution archive`
201+
and :term:`installed project` project types. Future additions to this registry
202+
will be made through the PEP process. The initial values in this registry are:
203+
204+
.. table::
196205

197-
This section describes terminology used later in the document:
198-
199-
.. glossary::
200-
201-
root SBOM directory
202-
The directory under which SBOM files are stored in a
203-
:term:`project source tree`, :term:`distribution archive`
204-
or :term:`installed project`.
205-
Also, the root directory that their paths
206-
recorded in the :ref:`Sbom-File <770-spec-sbom-file-field>`
207-
:term:`Core Metadata field` are relative to.
208-
Defined to be the :term:`project root directory`
209-
for a :term:`project source tree` or
210-
:term:`source distribution <Source Distribution (or "sdist")>`;
211-
and a subdirectory named ``sboms`` of
212-
the directory containing the :term:`built metadata`—
213-
i.e., the ``.dist-info/sboms`` directory—
214-
for a :term:`Built Distribution` or :term:`installed project`.
215-
216-
.. _770-spec-core-metadata:
217-
218-
Core Metadata
219-
-------------
220-
221-
.. _770-spec-sbom-file-field:
222-
223-
Add ``Sbom-File`` field
224-
~~~~~~~~~~~~~~~~~~~~~~~
225-
226-
The ``Sbom-File`` is a new optional Core Metadata field. Each instance contains a
227-
string representation of the path to an SBOM document. The path is specified
228-
relative to the :term:`root SBOM directory` for all project types. It is a
229-
multi-use field that MAY appear zero or more times and each instance lists the
230-
path to one such file. Files specified under this field are SBOM documents
231-
that are distributed with the package.
232-
233-
As `specified by this PEP <#770-spec-project-formats>`__, its value is also
234-
that file's path relative to the :term:`root SBOM directory` in both installed
235-
projects and the standardized Distribution Package types.
236-
237-
If an ``Sbom-File`` is listed in a
238-
:term:`Source Distribution <Source Distribution (or "sdist")>` or
239-
:term:`Built Distribution`'s Core Metadata:
240-
241-
* That file MUST be included in the :term:`distribution archive` at the
242-
specified path relative to the :term:`root SBOM directory`.
243-
* Installers MUST install the file with the :term:`project` at that same
244-
relative path.
245-
* Inside the :term:`root SBOM directory`, packaging tools MUST reproduce the
246-
directory structure under which the source files are located relative to the
247-
project root.
248-
* Path delimiters MUST be the forward slash character (``/``), and parent
249-
directory indicators (``..``) MUST NOT be used.
250-
251-
For all newly-uploaded distribution archives that include one or more
252-
``Sbom-File`` fields in their Core Metadata and declare a ``Metadata-Version``
253-
of ``2.5`` or higher, PyPI and other indices SHOULD validate that all files
254-
specified with ``Sbom-File`` are present in the distribution archives.
206+
================= ==========
207+
Directory name    PEP
208+
================= ==========
209+
``licenses``      :pep:`639`
210+
``license_files`` N/A (See :ref:`770-backwards-compat`)
211+
``sboms``         :pep:`770`
212+
================= ==========
213+
214+
Build backends MUST NOT create subdirectories in the ``.dist-info`` directory
215+
beyond the names in the registry to avoid collisions with future reserved names.
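
As a rough illustration of the rule above, a checker could scan the file paths
in a wheel and report ``.dist-info`` subdirectories that fall outside the
registry. This is only a sketch: the function name ``unregistered_subdirs`` and
the flat list of archive paths are assumptions made for illustration and are
not defined by this PEP.

```python
from pathlib import PurePosixPath

# Initial registry values listed in the table above.
RESERVED_SUBDIRS = {"licenses", "license_files", "sboms"}


def unregistered_subdirs(archive_paths):
    """Return .dist-info subdirectory names that are not in the registry.

    ``archive_paths`` is assumed to be an iterable of "/"-separated file
    paths, such as the names stored in a wheel archive.
    """
    found = set()
    for raw in archive_paths:
        parts = PurePosixPath(raw).parts
        # A file inside a subdirectory has at least three parts:
        # "<name>.dist-info", "<subdir>", "<file>".
        if len(parts) >= 3 and parts[0].endswith(".dist-info"):
            if parts[1] not in RESERVED_SUBDIRS:
                found.add(parts[1])
    return found
```

Like the PyPI query in the Backwards Compatibility section, this only sees
files, so empty directories go undetected.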
255216

256217
.. _770-spec-project-source-metadata:
257218

258219
Project source metadata
259220
-----------------------
260221

261-
This PEP specifies changes to the project's source metadata under a
262-
``[project]`` table in the ``pyproject.toml`` file.
222+
This PEP specifies changes to the project's source metadata
223+
in the ``pyproject.toml`` file:
263224

264-
Add ``sbom-files`` key
265-
~~~~~~~~~~~~~~~~~~~~~~
225+
Add new ``[additional-files]`` table
226+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
266227

267-
A new optional ``sbom-files`` key is added to the ``[project]`` table for
268-
specifying paths in the project source tree relative to ``pyproject.toml`` to
269-
file(s) containing SBOMs to be distributed with the package. This key
270-
corresponds to the ``Sbom-File`` fields in the Core Metadata.
228+
A new optional ``[additional-files]`` table is added for specifying paths
229+
in the project source tree relative to ``pyproject.toml`` to file(s) which
230+
should be included in the built project in a defined directory.
271231

272-
Its value is an array of strings which MUST contain valid glob patterns, as
273-
specified below:
232+
This new table has only one defined optional key: ``sboms``. The value of the
233+
``sboms`` key MUST be an array of valid glob patterns, as specified below:
274234

275235
* Alphanumeric characters, underscores (``_``), hyphens (``-``) and dots (``.``)
276236
MUST be matched verbatim.
@@ -292,50 +252,48 @@ they can also be defined.
292252

293253
Build tools:
294254

295-
* MUST treat each value as a glob pattern, and MUST raise an error if the
296-
pattern contains invalid glob syntax.
255+
* MUST treat each value in the array as a glob pattern, and MUST raise an error
256+
if the pattern contains invalid glob syntax.
297257
* MUST include all files matched by a listed pattern in all distribution
298-
archives.
299-
* MUST list each matched file path under an ``Sbom-File`` field in the
300-
Core Metadata.
258+
archives under the ``.dist-info/sboms`` directory.
301259
* MUST raise an error if any individual user-specified pattern does not match
302260
at least one file.
303261

304-
If the ``sbom-files`` key is present and is set to a value of an empty array,
262+
If the ``sboms`` key is present and is set to a value of an empty array,
305263
then tools MUST NOT include any SBOM files and MUST NOT raise an error.
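
The build tool requirements above can be sketched roughly as follows. This is
a hedged illustration, not a reference implementation: the function name
``collect_sboms`` is hypothetical, and the allowed character set is a
simplification, since the complete glob rules are not repeated here.

```python
import re
from pathlib import Path

# Assumption: this character set is a simplification for illustration; the
# complete glob rules are part of the specification text not repeated here.
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_\-./*?\[\]]+")


def collect_sboms(project_root: Path, patterns: list[str]) -> list[Path]:
    """Resolve ``sboms`` glob patterns to file paths relative to the root."""
    matched: list[Path] = []
    for pattern in patterns:
        # Reject characters outside the allowed set, such as "\\" or "{".
        if not _ALLOWED_CHARS.fullmatch(pattern):
            raise ValueError(f"invalid glob pattern: {pattern!r}")
        # Reject absolute patterns and parent directory indicators.
        if pattern.startswith("/") or ".." in pattern.split("/"):
            raise ValueError(f"invalid path in pattern: {pattern!r}")
        hits = sorted(p for p in project_root.glob(pattern) if p.is_file())
        # Every user-specified pattern must match at least one file.
        if not hits:
            raise ValueError(f"pattern matched no files: {pattern!r}")
        matched.extend(p.relative_to(project_root) for p in hits)
    return matched
```

An empty ``patterns`` list returns an empty result without raising, matching
the empty-array case. A build backend would then place the returned files
under ``.dist-info/sboms``.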
306264

307265
Examples of valid SBOM files declarations:
308266

309267
.. code-block:: toml
310268
311-
[project]
312-
sbom-files = ["bom.json"]
269+
[additional-files]
270+
sboms = ["bom.json"]
313271
314-
[project]
315-
sbom-files = ["sboms/openssl.cdx.json", "sboms/openssl.spdx.json"]
272+
[additional-files]
273+
sboms = ["sboms/openssl.cdx.json", "sboms/openssl.spdx.json"]
316274
317-
[project]
318-
sbom-files = ["sboms/*"]
275+
[additional-files]
276+
sboms = ["sboms/*"]
319277
320-
[project]
321-
sbom-files = []
278+
[additional-files]
279+
sboms = []
322280
323281
Examples of invalid SBOM files declarations:
324282

325283
.. code-block:: toml
326284
327-
[project]
328-
sbom-files = ["..\bom.json"]
285+
[additional-files]
286+
sboms = ["..\bom.json"]
329287
330288
Reason: ``..`` must not be used. ``\\`` is an invalid path delimiter, ``/``
331289
must be used.
332290

333291
.. code-block:: toml
334292
335-
[project]
336-
sbom-files = ["bom{.json*"]
293+
[additional-files]
294+
sboms = ["bom{.json*"]
337295
338-
Reason: ``bom{.json`` is not a valid glob.
296+
Reason: ``bom{.json*`` is not a valid glob.
339297

340298
.. _770-spec-project-formats:
341299

@@ -345,36 +303,22 @@ SBOM files in project formats
345303
A few additions will be made to the existing specifications.
346304

347305
:term:`Project source trees <Project source tree>`
348-
Per :ref:`639-spec-source-metadata` section, the
306+
Per the :ref:`770-spec-project-source-metadata` section, the
349307
`Declaring Project Metadata specification <pyprojecttoml_>`__
350-
will be updated to reflect that SBOM file paths MUST be relative to the
351-
project root directory; i.e. the directory containing the ``pyproject.toml``
352-
(or equivalently, other legacy project configuration,
353-
e.g. ``setup.py``, ``setup.cfg``, etc).
354-
355-
:term:`Source distributions (sdists) <Source Distribution (or "sdist")>`
356-
The sdist specification will be updated to reflect that if the
357-
``Metadata-Version`` is ``2.5`` or greater, the sdist MUST contain any SBOM
358-
files specified by the ``Sbom-File`` field in the ``PKG-INFO`` at their
359-
respective paths relative to the sdist (containing the ``pyproject.toml`` and
360-
the ``PKG-INFO`` Core Metadata).
308+
will be updated to add the ``[additional-files]`` table
309+
and optional ``sboms`` key.
361310

362311
:term:`Built distributions <Built distribution>` (:term:`wheels <wheel>`)
363-
The wheel specification will be updated to reflect that if the
364-
``Metadata-Version`` is ``2.5`` or greater and one or more ``Sbom-File``
365-
fields are specified, the ``.dist-info`` directory MUST contain an ``sboms``
366-
subdirectory, which MUST contain the files listed in the ``Sbom-File`` fields
367-
in the ``METADATA`` file at their respective paths relative to the ``sboms``
368-
directory.
312+
313+
The wheel specification will be updated to add the new registry of reserved
314+
directory names and to reflect that, if the ``.dist-info/sboms`` subdirectory
315+
is specified, the directory contains SBOM files.
369316

370317
:term:`Installed projects <Installed project>`
371-
The Recording Installed Projects specification will be updated to reflect that
372-
if the ``Metadata-Version`` is ``2.5`` or greater and one or more
373-
``Sbom-File`` fields is specified, the ``.dist-info`` directory MUST contain
374-
an ``sboms`` subdirectory which MUST contain the files listed in the
375-
``Sbom-File`` fields in the ``METADATA`` file at their respective paths
376-
relative to the ``sboms`` directory, and that any files in this directory MUST
377-
be copied from wheels by install tools.
318+
The Recording Installed Projects specification will be updated to reflect
319+
that, if the ``.dist-info/sboms`` subdirectory is specified, the directory
320+
contains SBOM files, and that any files in this directory MUST be copied from
321+
wheels by install tools.
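
Because SBOM files are identified by their location, a consuming tool could
find them by filtering an installed project's recorded files. The following is
a minimal sketch that assumes the standard library's
``importlib.metadata.files()``; the helper names ``is_sbom_path`` and
``installed_sboms`` are hypothetical and not part of this PEP.

```python
from importlib.metadata import PackageNotFoundError, files
from pathlib import PurePosixPath


def is_sbom_path(path: str) -> bool:
    """Return True if ``path`` is a file under ``<name>.dist-info/sboms/``."""
    parts = PurePosixPath(path).parts
    return (
        len(parts) >= 3
        and parts[0].endswith(".dist-info")
        and parts[1] == "sboms"
    )


def installed_sboms(dist_name: str) -> list[str]:
    """List recorded SBOM file paths for an installed distribution."""
    try:
        # files() returns None when no file record is available.
        recorded = files(dist_name) or []
    except PackageNotFoundError:
        return []
    return [p.as_posix() for p in recorded if is_sbom_path(p.as_posix())]
```

The returned paths are relative to the installation directory, as recorded
for the installed project.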
378322

379323
SBOM data interoperability
380324
--------------------------
@@ -406,18 +350,69 @@ PyPI and other indices MAY validate the contents of SBOM documents specified by
406350
this PEP, but MUST NOT validate or reject data for unknown
407351
SBOM standards, versions, or fields.
408352

353+
.. _770-backwards-compat:
354+
409355
Backwards Compatibility
410356
=======================
411357

412-
There are no backwards compatibility concerns for this PEP.
413-
414-
The changes to Python package Core Metadata and ``pyproject.toml`` are
415-
only additive, this PEP doesn't change the behavior of any existing fields.
416-
417-
Tools which are processing Python packages can use the ``Sbom-File`` core
418-
metadata field to clearly delineate between packages which include SBOM
419-
documents that implement this PEP (and thus have more requirements) and
420-
packages which include SBOM documents before this PEP was authored.
358+
Reserved ``.dist-info`` subdirectories registry
359+
-----------------------------------------------
360+
361+
The new registry of reserved ``.dist-info`` subdirectories represents
362+
a new reservation that wasn't previously documented and thus has the potential to
363+
break assumptions being made by already existing tools.
364+
365+
To check which ``.dist-info`` subdirectory names are in use today,
366+
a query across
367+
`all files in package archives on PyPI <https://sethmlarson.dev/security-developer-in-residence-weekly-report-18>`__
368+
was executed:
369+
370+
.. code-block:: sql
371+
372+
SELECT
373+
regexp_extract(archive_path, '.*\.dist-info/([^/]+)/', 1) AS dirname,
374+
COUNT(DISTINCT project_name) AS projects
375+
376+
FROM '*.parquet'
377+
WHERE archive_path LIKE '%.dist-info/%/%'
378+
GROUP BY dirname ORDER BY projects DESC;
379+
380+
Note that this only includes records for
381+
*files* and thus won't return results for empty directories. Empty directories
382+
being pervasively used and somehow load-bearing is unlikely, so this is an accepted
383+
risk of using this method. This query yielded the following results:
384+
385+
.. table::
386+
387+
====================== ===============
388+
Subdirectory           Unique Projects
389+
====================== ===============
390+
``licenses``           22,026
391+
``license_files``      1,828
392+
``LICENSES``           170
393+
``.ipynb_checkpoints`` 85
394+
``license``            18
395+
``.wex``               9
396+
``dist``               8
397+
``include``            6
398+
``build``              5
399+
``tmp``                4
400+
``src``                3
401+
``calmjs_artifacts``   3
402+
``.idea``              2
403+
====================== ===============
404+
405+
Not shown above are roughly 50 other subdirectory names that are each used in a
406+
single project. From these results we can see:
407+
408+
* Most subdirectories under ``.dist-info`` are to do with licensing,
409+
one of which (``licenses``) is specified by :pep:`639` and another
410+
(``license_files``) that is used by the Hatch build backend.
411+
* The ``sboms`` subdirectory doesn't collide with existing use.
412+
* Other subdirectory names under ``.dist-info`` appear to be either not
413+
widespread or accidental.
414+
415+
As a result of this query
421416

422417
Security Implications
423418
=====================
@@ -463,12 +458,11 @@ For packages which cannot be automatically annotated and if the package author
463458
wishes to provide an SBOM the approach will be to generate or author SBOM files
464459
and then include those files using ``pyproject.toml``:
465460

466-
.. code-block:: toml
461+
.. code-block:: toml
467462
468-
[project]
469-
...
470-
sbom-files = [
471-
"sboms/bom.cdx.json"
463+
[additional-files]
464+
sboms = [
465+
"sboms/bom.cdx.json"
472466
]
473467
474468
For projects manually specifying an SBOM document the challenge will be
@@ -558,6 +552,31 @@ a single SBOM standard. Tools that use SBOM data today already need to support
558552
multiple formats to handle this situation, so a future standard that updates to
559553
require only one standard would have no effect on downstream SBOM tools.
560554

555+
Using metadata fields to specify SBOM files in archives
556+
-------------------------------------------------------
557+
558+
A previous iteration of this specification used an ``Sbom-File`` metadata
559+
field to specify an SBOM file within a source or binary distribution archive.
560+
This would make the implementation similar to :pep:`639` which uses the
561+
``License-File`` field to enumerate license files in archives.
562+
563+
The primary issue with this approach is that SBOM files can originate from both
564+
static and dynamic sources, such as versioned source code, the build backend,
565+
or tools adding SBOM files after the build has completed (like auditwheel).
566+
567+
Metadata fields must either be static or dynamic, not both. This is
568+
in direct conflict with the best-case scenario for SBOM data: that SBOM files
569+
are added automatically by tools during the build of a Python package without
570+
user involvement or knowledge. Compare this situation to license files which
571+
are almost always static.
572+
573+
The 639-style approach was ultimately dropped in favor of defining SBOMs simply
574+
by their presence in the ``.dist-info/sboms`` directory and using a new table in
575+
``pyproject.toml`` called ``[additional-files]`` to define SBOMs in source
576+
distributions. This approach allows users to specify static SBOM files while
577+
still empowering build backends and tools to add their own SBOM data without the
578+
static/dynamic conflict.
579+
561580
Open Issues
562581
===========
563582

@@ -579,6 +598,10 @@ References
579598
wheel and then use an SBOM generation tool Syft to detect the SBOM in the
580599
installed package.
581600

601+
* `Querying every file in every release on PyPI <https://sethmlarson.dev/security-developer-in-residence-weekly-report-18>`_.
602+
The dataset available on `py-code.org <https://py-code.org>`__ from Tom Forbes was
603+
used to check subdirectory usage in ``.dist-info`` files.
604+
582605
.. _phantom dependency: https://www.endorlabs.com/learn/dependency-resolution-in-python-beware-the-phantom-dependency
583606
.. _coremetadataspec: https://packaging.python.org/specifications/core-metadata
584607
.. _pyprojecttoml: https://packaging.python.org/en/latest/specifications/pyproject-toml/
