Update Warehouse documentation to inform about BigQuery datasets #8240

Merged 5 commits on Aug 6, 2020
26 changes: 26 additions & 0 deletions docs/api-reference/bigquery-datasets.rst
@@ -0,0 +1,26 @@
BigQuery Datasets
=================

We use BigQuery to serve our public datasets. PyPI offers two tables whose
data is sourced from projects on PyPI. The tables and their data are licensed
under the `Creative Commons Attribution 4.0 International License <https://creativecommons.org/licenses/by/4.0/>`_.

Download Statistics Table
-------------------------

The download statistics table lets you learn more about the download patterns of
packages hosted on PyPI. This table is populated through the `Linehaul
project <https://github.com/pypa/linehaul>`_ by streaming download logs from PyPI
to BigQuery. For more information on analyzing PyPI package downloads, see the `Python
Packaging User Guide <https://packaging.python.org/guides/analyzing-pypi-package-downloads/>`_.
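
As a rough illustration of the kind of analysis this table supports, the sketch
below builds a query that counts one project's downloads over a recent window.
This section does not name the downloads table, so the caller must supply it.
The column names ``file.project`` and ``timestamp`` are assumptions and should be
checked against the actual schema. The ``@project`` placeholder is a BigQuery
named query parameter that must be bound when the query is run.

```python
def build_download_count_query(table: str, days: int = 30) -> str:
    """Return SQL that counts one project's downloads over the last `days` days.

    `table` is the fully qualified downloads table (not named in this section).
    `file.project` and `timestamp` are ASSUMED column names; verify the schema.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        "SELECT COUNT(*) AS num_downloads\n"
        f"FROM `{table}`\n"
        "WHERE file.project = @project\n"
        "  AND DATE(timestamp) >= "
        f"DATE_SUB(CURRENT_DATE(), INTERVAL {int(days)} DAY)"
    )


if __name__ == "__main__":
    # Hypothetical table name, used only to show the generated SQL.
    print(build_download_count_query("my-project.my_dataset.downloads", days=7))
```

Restricting the date range keeps the scan small, which matters because BigQuery
bills by the amount of data a query reads.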

Project Metadata Table
----------------------

We also have a table that provides access to distribution metadata
as outlined by the `core metadata specifications <https://packaging.python.org/specifications/core-metadata/>`_.
The table is meant to be a data dump of metadata from every
release on PyPI, which means that the rows in this BigQuery table
are immutable and are not removed even if a release or project is deleted.
This data is available in the ``the-psf.pypi.distribution_metadata``
public dataset on BigQuery.
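
A minimal sketch of querying this table from Python follows. It assumes the
``google-cloud-bigquery`` client library is installed and that credentials are
configured. The ``name`` and ``version`` columns are assumed to mirror the core
metadata fields of the same names, and the helper functions are hypothetical
names chosen for illustration.

```python
METADATA_TABLE = "the-psf.pypi.distribution_metadata"


def build_release_query(limit: int = 10) -> str:
    """Return SQL listing distinct (name, version) pairs for one project.

    `name` and `version` are ASSUMED column names based on the core metadata
    fields; check the table schema before relying on them.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT DISTINCT name, version\n"
        f"FROM `{METADATA_TABLE}`\n"
        "WHERE name = @project\n"
        f"LIMIT {int(limit)}"
    )


def fetch_releases(project: str, limit: int = 10):
    """Run the query; needs google-cloud-bigquery and valid credentials."""
    from google.cloud import bigquery  # imported lazily so the builder works without it

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            # Binds the @project placeholder used in the SQL text.
            bigquery.ScalarQueryParameter("project", "STRING", project)
        ]
    )
    rows = client.query(build_release_query(limit), job_config=job_config).result()
    return [(row["name"], row["version"]) for row in rows]
```

Because rows are never removed, results may include releases or projects that
have since been deleted from PyPI.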
5 changes: 3 additions & 2 deletions docs/api-reference/index.rst
@@ -56,8 +56,8 @@ use our RSS feeds.
No new integrations should use the XML-RPC APIs as they are planned for
deprecation. Existing consumers should migrate to JSON/RSS/Legacy APIs.

Available APIs
--------------
Available APIs & Datasets
-------------------------

.. toctree::
:maxdepth: 2
@@ -68,3 +68,4 @@ Available APIs
stats
xml-rpc
integration-guide
bigquery-datasets