Commit 7629eaa

patelneel55 and di committed
Update Warehouse documentation to inform about BigQuery datasets (pypi#8240)
* Update documentation to inform about BigQuery datasets
* Fix linter errors
* Rename rst file

Co-authored-by: Dustin Ingram <[email protected]>
1 parent 3524dbe commit 7629eaa

2 files changed: +29 -2 lines changed
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+BigQuery Datasets
+=================
+
+We use BigQuery to serve our public datasets. PyPI offers two tables whose
+data is sourced from projects on PyPI. The tables and their data are licensed
+under the `Creative Commons Attribution 4.0 International License <https://creativecommons.org/licenses/by/4.0/>`_.
+
+Download Statistics Table
+-------------------------
+
+The download statistics table allows you to learn more about the download patterns of
+packages hosted on PyPI. This table is populated through the `Linehaul
+project <https://github.com/pypa/linehaul>`_ by streaming download logs from PyPI
+to BigQuery. For more information on analyzing PyPI package downloads, see the `Python
+Packaging User Guide <https://packaging.python.org/guides/analyzing-pypi-package-downloads/>`_.
+
+Project Metadata Table
+----------------------
+
+We also have a table that provides access to distribution metadata
+as outlined by the `core metadata specifications <https://packaging.python.org/specifications/core-metadata/>`_.
+The table is meant to be a data dump of metadata from every
+release on PyPI, which means that the rows in this BigQuery table
+are immutable and are not removed even if a release or project is deleted.
+This data is accessible under the ``the-psf.pypi.distribution_metadata``
+public table on BigQuery.
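
As a rough illustration of how the project metadata table might be queried, the sketch below builds a BigQuery Standard SQL string against ``the-psf.pypi.distribution_metadata``. Only the table name comes from the documentation above. The ``name`` and ``version`` columns and the ``build_metadata_query`` helper are assumptions made for illustration and do not appear in this commit, so check the table schema before relying on them.

```python
# Minimal sketch: build a BigQuery Standard SQL query for the metadata table.
# Only the table name comes from the documentation; the column names
# (name, version) and this helper are assumptions for illustration.

TABLE = "the-psf.pypi.distribution_metadata"


def build_metadata_query(limit: int = 10) -> str:
    """Return SQL that lists name/version rows for a single project.

    The project name is left as the named query parameter @project so the
    value is bound by the client rather than interpolated into the SQL.
    """
    if not isinstance(limit, int) or isinstance(limit, bool) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT name, version\n"
        f"FROM `{TABLE}`\n"
        "WHERE name = @project\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_metadata_query(limit=5))
```

The resulting string could be run with a client library that supports named query parameters, such as google-cloud-bigquery, binding ``@project`` to a project name. Because rows are never removed, results may include releases that have since been deleted from PyPI.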

docs/api-reference/index.rst

Lines changed: 3 additions & 2 deletions
@@ -56,8 +56,8 @@ use our RSS feeds.
 No new integrations should use the XML-RPC APIs as they are planned for
 deprecation. Existing consumers should migrate to JSON/RSS/Legacy APIs.
 
-Available APIs
---------------
+Available APIs & Datasets
+-------------------------
 
 .. toctree::
     :maxdepth: 2
@@ -68,3 +68,4 @@ Available APIs
     stats
     xml-rpc
     integration-guide
+    bigquery-datasets