From 6f8d2a5bfc1519c44c668f6737bb42044c9d47b9 Mon Sep 17 00:00:00 2001 From: Tim Swast Date: Mon, 25 Nov 2019 14:25:41 -0800 Subject: [PATCH 1/3] docs: add markdown version of changelog TODO: make changelog page a symlink and render markdown in sphinx --- CHANGELOG.md | 480 ++++++++++++++++++++++++++++++++++++++ convert_changelog.py | 9 + docs/source/changelog.rst | 9 + 3 files changed, 498 insertions(+) create mode 100644 CHANGELOG.md create mode 100644 convert_changelog.py diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 00000000..dd258c57 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,480 @@ +# Changelog + +## 0.15.0 / 2021-03-30 + +### Features + +- Load DataFrame with `to_gbq` to a table in a project different from + the API client project. Specify the target table ID as + `project.dataset.table` to use this feature. + ([#321](https://github.com/googleapis/python-bigquery-pandas/issues/321), + [#347](https://github.com/googleapis/python-bigquery-pandas/issues/347)) +- Allow billing project to be separate from destination table project + in `to_gbq`. + ([#321](https://github.com/googleapis/python-bigquery-pandas/issues/321)) + +### Bug fixes + +- Avoid 403 error from `to_gbq` when table has `policyTags`. + ([#354](https://github.com/googleapis/python-bigquery-pandas/issues/354)) +- Avoid `client.dataset` deprecation warnings. + ([#312](https://github.com/googleapis/python-bigquery-pandas/issues/312)) + +### Dependencies + +- Drop support for Python 3.5 and 3.6. + ([#337](https://github.com/googleapis/python-bigquery-pandas/issues/337)) +- Drop support for google-cloud-bigquery==2.4.\* due to query + hanging bug. + ([#343](https://github.com/googleapis/python-bigquery-pandas/issues/343)) + +## 0.14.1 / 2020-11-10 + +### Bug fixes + +- Use `object` dtype for `TIME` columns. + ([#328](https://github.com/googleapis/python-bigquery-pandas/issues/328)) +- Encode floating point values with greater precision. 
+ ([#326](https://github.com/googleapis/python-bigquery-pandas/issues/326)) +- Support `INT64` and other standard SQL aliases in + `~pandas_gbq.to_gbq` `table_schema` argument. + ([#322](https://github.com/googleapis/python-bigquery-pandas/issues/322)) + +## 0.14.0 / 2020-10-05 + +- Add `dtypes` argument to `read_gbq`. Use this argument to override + the default `dtype` for a particular column in the query results. + For example, this can be used to select nullable integer columns as + the `Int64` nullable integer pandas extension type. + ([#242](https://github.com/googleapis/python-bigquery-pandas/issues/242), + [#332](https://github.com/googleapis/python-bigquery-pandas/issues/332)) + +``` python +df = gbq.read_gbq( + "SELECT CAST(NULL AS INT64) AS null_integer", + dtypes={"null_integer": "Int64"}, +) +``` + +### Dependency updates + +- Support `google-cloud-bigquery-storage` 2.0 and higher. + ([#329](https://github.com/googleapis/python-bigquery-pandas/issues/329)) +- Update the minimum version of `pandas` to 0.20.1. + ([#331](https://github.com/googleapis/python-bigquery-pandas/issues/331)) + +### Internal changes + +- Update tests to run against Python 3.8. + ([#331](https://github.com/googleapis/python-bigquery-pandas/issues/331)) + +## 0.13.3 / 2020-09-30 + +- Include needed "extras" from `google-cloud-bigquery` package as + dependencies. Exclude incompatible 2.0 version. + ([#324](https://github.com/googleapis/python-bigquery-pandas/issues/324), + [#329](https://github.com/googleapis/python-bigquery-pandas/issues/329)) + +## 0.13.2 / 2020-05-14 + +- Fix `Provided Schema does not match Table` error when the existing + table contains required fields. + ([#315](https://github.com/googleapis/python-bigquery-pandas/issues/315)) + +## 0.13.1 / 2020-02-13 + +- Fix `AttributeError` with BQ Storage API to download empty results. 
+ ([#299](https://github.com/googleapis/python-bigquery-pandas/issues/299)) + +## 0.13.0 / 2019-12-12 + +- Raise `NotImplementedError` when the deprecated `private_key` + argument is used. + ([#301](https://github.com/googleapis/python-bigquery-pandas/issues/301)) + +## 0.12.0 / 2019-11-25 + +### New features + +- Add `max_results` argument to `~pandas_gbq.read_gbq()`. Use this + argument to limit the number of rows in the results DataFrame. Set + `max_results` to 0 to ignore query outputs, such as for DML or DDL + queries. + ([#102](https://github.com/googleapis/python-bigquery-pandas/issues/102)) +- Add `progress_bar_type` argument to `~pandas_gbq.read_gbq()`. Use + this argument to display a progress bar when downloading data. + ([#182](https://github.com/googleapis/python-bigquery-pandas/issues/182)) + +### Bug fixes + +- Fix resource leak with `use_bqstorage_api` by closing BigQuery + Storage API client after use. + ([#294](https://github.com/googleapis/python-bigquery-pandas/issues/294)) + +### Dependency updates + +- Update the minimum version of `google-cloud-bigquery` to 1.11.1. + ([#296](https://github.com/googleapis/python-bigquery-pandas/issues/296)) + +### Documentation + +- Add code samples to introduction and refactor howto guides. + ([#239](https://github.com/googleapis/python-bigquery-pandas/issues/239)) + +## 0.11.0 / 2019-07-29 + +- **Breaking Change:** Python 2 support has been dropped. This is to + align with the pandas package which dropped Python 2 support at the + end of 2019. + ([#268](https://github.com/googleapis/python-bigquery-pandas/issues/268)) + +### Enhancements + +- Ensure `table_schema` argument is not modified inplace. + ([#278](https://github.com/googleapis/python-bigquery-pandas/issues/278)) + +### Implementation changes + +- Use object dtype for `STRING`, `ARRAY`, and `STRUCT` columns when + there are zero rows. 
+ ([#285](https://github.com/googleapis/python-bigquery-pandas/issues/285)) + +### Internal changes + +- Populate `user-agent` with `pandas` version information. + ([#281](https://github.com/googleapis/python-bigquery-pandas/issues/281)) +- Fix `pytest.raises` usage for latest pytest. Fix warnings in tests. + ([#282](https://github.com/googleapis/python-bigquery-pandas/issues/282)) +- Update CI to install nightly packages in the conda tests. + ([#254](https://github.com/googleapis/python-bigquery-pandas/issues/254)) + +## 0.10.0 / 2019-04-05 + +- **Breaking Change:** Default SQL dialect is now `standard`. Use + `pandas_gbq.context.dialect` to override the default value. + ([#195](https://github.com/googleapis/python-bigquery-pandas/issues/195), + [#245](https://github.com/googleapis/python-bigquery-pandas/issues/245)) + +### Documentation + +- Document `BigQuery data type to pandas dtype conversion + ` for `read_gbq`. + ([#269](https://github.com/googleapis/python-bigquery-pandas/issues/269)) + +### Dependency updates + +- Update the minimum version of `google-cloud-bigquery` to 1.9.0. + ([#247](https://github.com/googleapis/python-bigquery-pandas/issues/247)) +- Update the minimum version of `pandas` to 0.19.0. + ([#262](https://github.com/googleapis/python-bigquery-pandas/issues/262)) + +### Internal changes + +- Update the authentication credentials. **Note:** You may need to set + `reauth=True` in order to update your credentials to the most recent + version. This is required to use new functionality such as the + BigQuery Storage API. + ([#267](https://github.com/googleapis/python-bigquery-pandas/issues/267)) +- Use `to_dataframe()` from `google-cloud-bigquery` in the + `read_gbq()` function. + ([#247](https://github.com/googleapis/python-bigquery-pandas/issues/247)) + +### Enhancements + +- Fix a bug where pandas-gbq could not upload an empty DataFrame. 
+ ([#237](https://github.com/googleapis/python-bigquery-pandas/issues/237)) +- Allow `table_schema` in `to_gbq` to contain only a subset of + columns, with the rest being populated using the DataFrame dtypes + ([#218](https://github.com/googleapis/python-bigquery-pandas/issues/218)) + (contributed by @johnpaton) +- Read `project_id` in `to_gbq` from provided `credentials` if + available (contributed by @daureg) +- `read_gbq` uses the timezone-aware + `DatetimeTZDtype(unit='ns', tz='UTC')` dtype for BigQuery + `TIMESTAMP` columns. + ([#269](https://github.com/googleapis/python-bigquery-pandas/issues/269)) +- Add `use_bqstorage_api` to `read_gbq`. The BigQuery Storage API can + be used to download large query results (>125 MB) more quickly. If + the BQ Storage API can't be used, the BigQuery API is used instead. + ([#133](https://github.com/googleapis/python-bigquery-pandas/issues/133), + [#270](https://github.com/googleapis/python-bigquery-pandas/issues/270)) + +## 0.9.0 / 2019-01-11 + +- Warn when deprecated `private_key` parameter is used + ([#240](https://github.com/googleapis/python-bigquery-pandas/issues/240)) +- **New dependency** Use the `pydata-google-auth` package for + authentication. + ([#241](https://github.com/googleapis/python-bigquery-pandas/issues/241)) + +## 0.8.0 / 2018-11-12 + +### Breaking changes + +- **Deprecate** `private_key` parameter to `pandas_gbq.read_gbq` and + `pandas_gbq.to_gbq` in favor of new `credentials` argument. Instead, + create a credentials object using + `google.oauth2.service_account.Credentials.from_service_account_info` + or + `google.oauth2.service_account.Credentials.from_service_account_file`. + See the `authentication how-to guide ` for + examples. + ([#161](https://github.com/googleapis/python-bigquery-pandas/issues/161), + [#231](https://github.com/googleapis/python-bigquery-pandas/issues/231)) + +### Enhancements + +- Allow newlines in data passed to `to_gbq`. 
+ ([#180](https://github.com/googleapis/python-bigquery-pandas/issues/180)) +- Add `pandas_gbq.context.dialect` to allow overriding the default SQL + syntax dialect. + ([#195](https://github.com/googleapis/python-bigquery-pandas/issues/195), + [#235](https://github.com/googleapis/python-bigquery-pandas/issues/235)) +- Support Python 3.7. + ([#197](https://github.com/googleapis/python-bigquery-pandas/issues/197), + [#232](https://github.com/googleapis/python-bigquery-pandas/issues/232)) + +### Internal changes + +- Migrate tests to CircleCI. + ([#228](https://github.com/googleapis/python-bigquery-pandas/issues/228), + [#232](https://github.com/googleapis/python-bigquery-pandas/issues/232)) + +## 0.7.0 / 2018-10-19 + +- int columns which contain NULL are now cast to float, rather than object type. + ([#174](https://github.com/googleapis/python-bigquery-pandas/issues/174)) +- DATE, DATETIME and TIMESTAMP columns are now parsed as pandas' + timestamp objects + ([#224](https://github.com/googleapis/python-bigquery-pandas/issues/224)) +- Add `pandas_gbq.Context` to cache credentials in-memory, across + calls to `read_gbq` and `to_gbq`. + ([#198](https://github.com/googleapis/python-bigquery-pandas/issues/198), + [#208](https://github.com/googleapis/python-bigquery-pandas/issues/208)) +- Fast queries now do not log above `DEBUG` level. + ([#204](https://github.com/googleapis/python-bigquery-pandas/issues/204)) + With BigQuery's release of + [clustering](https://cloud.google.com/bigquery/docs/clustered-tables) + querying smaller samples of data is now faster and cheaper. +- Don't load credentials from disk if reauth is `True`. + ([#212](https://github.com/googleapis/python-bigquery-pandas/issues/212)) + This fixes a bug where pandas-gbq could not refresh credentials if + the cached credentials were invalid, revoked, or expired, even when + `reauth=True`. +- Catch RefreshError when trying credentials. 
+ ([#226](https://github.com/googleapis/python-bigquery-pandas/issues/226)) + +### Internal changes + +- Avoid listing datasets and tables in system tests. + ([#215](https://github.com/googleapis/python-bigquery-pandas/issues/215)) +- Improved performance from eliminating some duplicative parsing steps + ([#224](https://github.com/googleapis/python-bigquery-pandas/issues/224)) + +## 0.6.1 / 2018-09-11 + +- Improved `read_gbq` performance and memory consumption by delegating + `DataFrame` construction to the Pandas library, radically reducing + the number of loops that execute in python + ([#128](https://github.com/googleapis/python-bigquery-pandas/issues/128)) +- Reduced verbosity of logging from `read_gbq`, particularly for short + queries. + ([#201](https://github.com/googleapis/python-bigquery-pandas/issues/201)) +- Avoid `SELECT 1` query when running `to_gbq`. + ([#202](https://github.com/googleapis/python-bigquery-pandas/issues/202)) + +## 0.6.0 / 2018-08-15 + +- Warn when `dialect` is not passed in to `read_gbq`. The default + dialect will be changing from 'legacy' to 'standard' in a future + version. + ([#195](https://github.com/googleapis/python-bigquery-pandas/issues/195)) +- Use general float with 15 decimal digit precision when writing to + local CSV buffer in `to_gbq`. This prevents numerical overflow in + certain edge cases. + ([#192](https://github.com/googleapis/python-bigquery-pandas/issues/192)) + +## 0.5.0 / 2018-06-15 + +- Project ID parameter is optional in `read_gbq` and `to_gbq` when it + can be inferred from the environment. Note: you must still pass in a + project ID when using user-based authentication. + ([#103](https://github.com/googleapis/python-bigquery-pandas/issues/103)) +- Progress bar added for `to_gbq`, through the optional `tqdm` dependency. 
+ ([#162](https://github.com/googleapis/python-bigquery-pandas/issues/162)) +- Add location parameter to `read_gbq` and `to_gbq` so that pandas-gbq + can work with datasets in the Tokyo region. + ([#177](https://github.com/googleapis/python-bigquery-pandas/issues/177)) + +### Documentation + +- Add `authentication how-to guide `. + ([#183](https://github.com/googleapis/python-bigquery-pandas/issues/183)) +- Update `contributing` guide with new paths to tests. + ([#154](https://github.com/googleapis/python-bigquery-pandas/issues/154), + [#164](https://github.com/googleapis/python-bigquery-pandas/issues/164)) + +### Internal changes + +- Tests now use nox to run in multiple + Python environments. + ([#52](https://github.com/googleapis/python-bigquery-pandas/issues/52)) +- Renamed internal modules. + ([#154](https://github.com/googleapis/python-bigquery-pandas/issues/154)) +- Refactored auth to an internal auth module. + ([#176](https://github.com/googleapis/python-bigquery-pandas/issues/176)) +- Add unit tests for `get_credentials()`. + ([#184](https://github.com/googleapis/python-bigquery-pandas/issues/184)) + +## 0.4.1 / 2018-04-05 + +- Only show `verbose` deprecation warning if Pandas version does not + populate it. + ([#157](https://github.com/googleapis/python-bigquery-pandas/issues/157)) + +## 0.4.0 / 2018-04-03 + +- Fix bug in read_gbq when building a + dataframe with integer columns on Windows. Explicitly use 64bit + integers when converting from BQ types. + ([#119](https://github.com/googleapis/python-bigquery-pandas/issues/119)) +- Fix bug in read_gbq when querying for + an array of floats + ([#123](https://github.com/googleapis/python-bigquery-pandas/issues/123)) +- Fix bug in read_gbq with + configuration argument. Updates read_gbq to account for breaking change in + the way `google-cloud-python` version 0.32.0+ handles query + configuration API representation. 
+ ([#152](https://github.com/googleapis/python-bigquery-pandas/issues/152)) +- Fix bug in `to_gbq` where seconds were + discarded in timestamp columns. + ([#148](https://github.com/googleapis/python-bigquery-pandas/issues/148)) +- Fix bug in `to_gbq` when supplying a + user-defined schema + ([#150](https://github.com/googleapis/python-bigquery-pandas/issues/150)) +- **Deprecate** the `verbose` parameter in `read_gbq` and `to_gbq`. Messages use the logging module + instead of printing progress directly to standard output. + ([#12](https://github.com/googleapis/python-bigquery-pandas/issues/12)) + +## 0.3.1 / 2018-02-13 + +- Fix an issue where Unicode couldn't be uploaded in Python 2 + ([#106](https://github.com/googleapis/python-bigquery-pandas/issues/106)) +- Add support for a passed schema in `to_gbq` instead of inferring + the schema from the passed DataFrame with + `DataFrame.dtypes`. + ([#46](https://github.com/googleapis/python-bigquery-pandas/issues/46)) +- Fix an issue where a dataframe containing both integer and floating + point columns could not be uploaded with `to_gbq` + ([#116](https://github.com/googleapis/python-bigquery-pandas/issues/116)) +- `to_gbq` now uses `to_csv` to avoid manually looping over rows in a + dataframe (should result in faster table uploads) + ([#96](https://github.com/googleapis/python-bigquery-pandas/issues/96)) + +## 0.3.0 / 2018-01-03 + +- Use the + [google-cloud-bigquery](https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html) + library for API calls. The `google-cloud-bigquery` package is a new + dependency, and dependencies on `google-api-python-client` and + `httplib2` are removed. See the [installation + guide](https://pandas-gbq.readthedocs.io/en/latest/install.html#dependencies) + for more details. 
+ ([#93](https://github.com/googleapis/python-bigquery-pandas/issues/93)) +- Structs and arrays are now named properly + ([#23](https://github.com/googleapis/python-bigquery-pandas/issues/23)) + and BigQuery functions like `array_agg` no longer run into errors + during type conversion + ([#22](https://github.com/googleapis/python-bigquery-pandas/issues/22)). +- `to_gbq` now uses a load job instead of the streaming API. Remove + `StreamingInsertError` class, as it is no longer used by `to_gbq`. + ([#7](https://github.com/googleapis/python-bigquery-pandas/issues/7), + [#75](https://github.com/googleapis/python-bigquery-pandas/issues/75)) + +## 0.2.1 / 2017-11-27 + +- `read_gbq` now raises `QueryTimeout` if the request exceeds the + `query.timeoutMs` value specified in the BigQuery configuration. + ([#76](https://github.com/googleapis/python-bigquery-pandas/issues/76)) +- Environment variable `PANDAS_GBQ_CREDENTIALS_FILE` can now be used + to override the default location where the BigQuery user account + credentials are stored. + ([#86](https://github.com/googleapis/python-bigquery-pandas/issues/86)) +- BigQuery user account credentials are now stored in an + application-specific hidden user folder on the operating system. + ([#41](https://github.com/googleapis/python-bigquery-pandas/issues/41)) + +## 0.2.0 / 2017-07-24 + +- Drop support for Python 3.4 + ([#40](https://github.com/googleapis/python-bigquery-pandas/issues/40)) +- The dataframe passed to + `.to_gbq(...., if_exists='append')` + only needs to contain a subset of the fields in the BigQuery + schema. + ([#24](https://github.com/googleapis/python-bigquery-pandas/issues/24)) +- Use the [google-auth](https://google-auth.readthedocs.io/en/latest/) + library for authentication because `oauth2client` is deprecated. + ([#39](https://github.com/googleapis/python-bigquery-pandas/issues/39)) +- `read_gbq` now has an `auth_local_webserver` boolean argument for + controlling whether to use web server or console flow when getting + user credentials. 
Replaces `--noauth_local_webserver` command line + argument. + ([#35](https://github.com/googleapis/python-bigquery-pandas/issues/35)) +- `read_gbq` now displays the BigQuery Job ID and standard price in + verbose output. + ([#70](https://github.com/googleapis/python-bigquery-pandas/issues/70) + and + [#71](https://github.com/googleapis/python-bigquery-pandas/issues/71)) + +## 0.1.6 / 2017-05-03 + +- All gbq errors will simply be subclasses of `ValueError` and no + longer inherit from the deprecated `PandasError`. + +## 0.1.4 / 2017-03-17 + +- `InvalidIndexColumn` will be raised instead of `InvalidColumnOrder` + in `read_gbq` when the index column specified does not exist in the + BigQuery schema. + ([#6](https://github.com/googleapis/python-bigquery-pandas/issues/6)) + +## 0.1.3 / 2017-03-04 + +- Fix bug with appending to a BigQuery table where fields have modes + (NULLABLE, REQUIRED, REPEATED) specified. These modes were compared + against the remote schema, and writing a table via `to_gbq` would + previously raise. + ([#13](https://github.com/googleapis/python-bigquery-pandas/issues/13)) + +## 0.1.2 / 2017-02-23 + +Initial release of transferred code from +[pandas](https://github.com/pandas-dev/pandas) + +Includes patches since the 0.19.2 release on pandas with the following: + +- `read_gbq` now allows query configuration preferences + [pandas-GH#14742](https://github.com/pandas-dev/pandas/pull/14742) +- `read_gbq` now stores `INTEGER` columns as `dtype=object` if they + contain `NULL` values. Otherwise they are stored as `int64`. This + prevents precision loss for integers greater than 2\*\*53. 
+ Furthermore, `FLOAT` columns with values above 10\*\*4 are no + longer cast to `int64`, which also caused precision loss + [pandas-GH#14064](https://github.com/pandas-dev/pandas/pull/14064), + and + [pandas-GH#14305](https://github.com/pandas-dev/pandas/pull/14305) diff --git a/convert_changelog.py b/convert_changelog.py new file mode 100644 index 00000000..d1794e7e --- /dev/null +++ b/convert_changelog.py @@ -0,0 +1,9 @@ +import re + + +with open("docs/source/changelog.rst") as f: + text = f.read() + +# :issue:`312` -> [#312](https://github.com/googleapis/python-bigquery-pandas/issues/312) +c = re.compile(r":issue:`([0-9]+)`", flags=re.MULTILINE) +print(re.sub(c, r"[#\1](https://github.com/googleapis/python-bigquery-pandas/issues/\1)", text)) diff --git a/docs/source/changelog.rst b/docs/source/changelog.rst index 6af3af75..2abe7235 100644 --- a/docs/source/changelog.rst +++ b/docs/source/changelog.rst @@ -112,6 +112,9 @@ Internal changes 0.12.0 / 2019-11-25 ------------------- +New features +~~~~~~~~~~~~ + - Add ``max_results`` argument to :func:`~pandas_gbq.read_gbq()`. Use this argument to limit the number of rows in the results DataFrame. Set ``max_results`` to 0 to ignore query outputs, such as for DML or DDL @@ -120,6 +123,12 @@ Internal changes this argument to display a progress bar when downloading data. (:issue:`182`) +Bug fixes +~~~~~~~~~ + +- Fix resource leak with ``use_bqstorage_api`` by closing BigQuery Storage + API client after use. 
(:issue:`294`) + Dependency updates ~~~~~~~~~~~~~~~~~~ From 6b4b5d6efe93e46dfeac918440486bf45067f115 Mon Sep 17 00:00:00 2001 From: Tim Swast Date: Wed, 18 Aug 2021 14:53:16 -0500 Subject: [PATCH 2/3] Use CHANGELOG.md from docs --- .github/PULL_REQUEST_TEMPLATE.md | 1 - docs/requirements-docs.txt | 1 + docs/source/changelog.md | 1 + docs/source/changelog.rst | 436 ------------------------------- docs/source/conf.py | 3 +- docs/source/contributing.rst | 6 +- docs/source/index.rst | 2 +- release-procedure.md | 3 - 8 files changed, 7 insertions(+), 446 deletions(-) create mode 120000 docs/source/changelog.md delete mode 100644 docs/source/changelog.rst diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index e434e5ea..872eb0ff 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,4 +1,3 @@ - [ ] closes #xxxx - [ ] tests added / passed - [ ] passes `nox -s blacken lint` -- [ ] `docs/source/changelog.rst` entry \ No newline at end of file diff --git a/docs/requirements-docs.txt b/docs/requirements-docs.txt index afd31d06..af49a246 100644 --- a/docs/requirements-docs.txt +++ b/docs/requirements-docs.txt @@ -1,6 +1,7 @@ ipython matplotlib numpydoc +recommonmark sphinx sphinx_rtd_theme pandas diff --git a/docs/source/changelog.md b/docs/source/changelog.md new file mode 120000 index 00000000..699cc9e7 --- /dev/null +++ b/docs/source/changelog.md @@ -0,0 +1 @@ +../../CHANGELOG.md \ No newline at end of file diff --git a/docs/source/changelog.rst b/docs/source/changelog.rst deleted file mode 100644 index 2abe7235..00000000 --- a/docs/source/changelog.rst +++ /dev/null @@ -1,436 +0,0 @@ -Changelog -========= - -.. _changelog-0.15.0: - -0.15.0 / 2021-03-30 -------------------- - -Features -~~~~~~~~ - -- Load DataFrame with ``to_gbq`` to a table in a project different from the API - client project. Specify the target table ID as ``project.dataset.table`` to - use this feature. 
(:issue:`321`, :issue:`347`) -- Allow billing project to be separate from destination table project in - ``to_gbq``. (:issue:`321`) - -Bug fixes -~~~~~~~~~ - -- Avoid 403 error from ``to_gbq`` when table has ``policyTags``. (:issue:`354`) -- Avoid ``client.dataset`` deprecation warnings. (:issue:`312`) - -Dependencies -~~~~~~~~~~~~ - -- Drop support for Python 3.5 and 3.6. (:issue:`337`) -- Drop support for `google-cloud-bigquery==2.4.*` due to query hanging bug. - (:issue:`343`) - - -.. _changelog-0.14.1: - -0.14.1 / 2020-11-10 -------------------- - -Bug fixes -~~~~~~~~~ - -- Use ``object`` dtype for ``TIME`` columns. (:issue:`328`) -- Encode floating point values with greater precision. (:issue:`326`) -- Support ``INT64`` and other standard SQL aliases in - :func:`~pandas_gbq.to_gbq` ``table_schema`` argument. (:issue:`322`) - - -.. _changelog-0.14.0: - -0.14.0 / 2020-10-05 -------------------- - -- Add ``dtypes`` argument to ``read_gbq``. Use this argument to override the - default ``dtype`` for a particular column in the query results. For - example, this can be used to select nullable integer columns as the - ``Int64`` nullable integer pandas extension type. (:issue:`242`, - :issue:`332`) - -.. code-block:: python - - df = gbq.read_gbq( - "SELECT CAST(NULL AS INT64) AS null_integer", - dtypes={"null_integer": "Int64"}, - ) - -Dependency updates -~~~~~~~~~~~~~~~~~~ - -- Support ``google-cloud-bigquery-storage`` 2.0 and higher. (:issue:`329`) -- Update the minimum version of ``pandas`` to 0.20.1. - (:issue:`331`) - -Internal changes -~~~~~~~~~~~~~~~~ - -- Update tests to run against Python 3.8. (:issue:`331`) - - -.. _changelog-0.13.3: - -0.13.3 / 2020-09-30 -------------------- - -- Include needed "extras" from ``google-cloud-bigquery`` package as - dependencies. Exclude incompatible 2.0 version. (:issue:`324`, :issue:`329`) - -.. 
_changelog-0.13.2: - -0.13.2 / 2020-05-14 -------------------- - -- Fix ``Provided Schema does not match Table`` error when the existing table - contains required fields. (:issue:`315`) - -.. _changelog-0.13.1: - -0.13.1 / 2020-02-13 -------------------- - -- Fix ``AttributeError`` with BQ Storage API to download empty results. - (:issue:`299`) - -.. _changelog-0.13.0: - -0.13.0 / 2019-12-12 -------------------- - -- Raise ``NotImplementedError`` when the deprecated ``private_key`` argument - is used. (:issue:`301`) - - -.. _changelog-0.12.0: - -0.12.0 / 2019-11-25 -------------------- - -New features -~~~~~~~~~~~~ - -- Add ``max_results`` argument to :func:`~pandas_gbq.read_gbq()`. Use this - argument to limit the number of rows in the results DataFrame. Set - ``max_results`` to 0 to ignore query outputs, such as for DML or DDL - queries. (:issue:`102`) -- Add ``progress_bar_type`` argument to :func:`~pandas_gbq.read_gbq()`. Use - this argument to display a progress bar when downloading data. - (:issue:`182`) - -Bug fixes -~~~~~~~~~ - -- Fix resource leak with ``use_bqstorage_api`` by closing BigQuery Storage - API client after use. (:issue:`294`) - -Dependency updates -~~~~~~~~~~~~~~~~~~ - -- Update the minimum version of ``google-cloud-bigquery`` to 1.11.1. - (:issue:`296`) - -Documentation -~~~~~~~~~~~~~ - -- Add code samples to introduction and refactor howto guides. (:issue:`239`) - - -.. _changelog-0.11.0: - -0.11.0 / 2019-07-29 -------------------- - -- **Breaking Change:** Python 2 support has been dropped. This is to align - with the pandas package which dropped Python 2 support at the end of 2019. - (:issue:`268`) - -Enhancements -~~~~~~~~~~~~ - -- Ensure ``table_schema`` argument is not modified inplace. (:issue:`278`) - -Implementation changes -~~~~~~~~~~~~~~~~~~~~~~ - -- Use object dtype for ``STRING``, ``ARRAY``, and ``STRUCT`` columns when - there are zero rows. 
(:issue:`285`) - -Internal changes -~~~~~~~~~~~~~~~~ - -- Populate ``user-agent`` with ``pandas`` version information. (:issue:`281`) -- Fix ``pytest.raises`` usage for latest pytest. Fix warnings in tests. - (:issue:`282`) -- Update CI to install nightly packages in the conda tests. (:issue:`254`) - -.. _changelog-0.10.0: - -0.10.0 / 2019-04-05 -------------------- - -- **Breaking Change:** Default SQL dialect is now ``standard``. Use - :attr:`pandas_gbq.context.dialect` to override the default value. - (:issue:`195`, :issue:`245`) - -Documentation -~~~~~~~~~~~~~ - -- Document :ref:`BigQuery data type to pandas dtype conversion - ` for ``read_gbq``. (:issue:`269`) - -Dependency updates -~~~~~~~~~~~~~~~~~~ - -- Update the minimum version of ``google-cloud-bigquery`` to 1.9.0. - (:issue:`247`) -- Update the minimum version of ``pandas`` to 0.19.0. (:issue:`262`) - -Internal changes -~~~~~~~~~~~~~~~~ - -- Update the authentication credentials. **Note:** You may need to set - ``reauth=True`` in order to update your credentials to the most recent - version. This is required to use new functionality such as the BigQuery - Storage API. (:issue:`267`) -- Use ``to_dataframe()`` from ``google-cloud-bigquery`` in the ``read_gbq()`` - function. (:issue:`247`) - -Enhancements -~~~~~~~~~~~~ - -- Fix a bug where pandas-gbq could not upload an empty DataFrame. (:issue:`237`) -- Allow ``table_schema`` in :func:`to_gbq` to contain only a subset of columns, - with the rest being populated using the DataFrame dtypes (:issue:`218`) - (contributed by @johnpaton) -- Read ``project_id`` in :func:`to_gbq` from provided ``credentials`` if - available (contributed by @daureg) -- ``read_gbq`` uses the timezone-aware ``DatetimeTZDtype(unit='ns', - tz='UTC')`` dtype for BigQuery ``TIMESTAMP`` columns. (:issue:`269`) -- Add ``use_bqstorage_api`` to :func:`read_gbq`. The BigQuery Storage API can - be used to download large query results (>125 MB) more quickly. 
If the BQ - Storage API can't be used, the BigQuery API is used instead. (:issue:`133`, - :issue:`270`) - -.. _changelog-0.9.0: - -0.9.0 / 2019-01-11 ------------------- - -- Warn when deprecated ``private_key`` parameter is used (:issue:`240`) -- **New dependency** Use the ``pydata-google-auth`` package for - authentication. (:issue:`241`) - -.. _changelog-0.8.0: - -0.8.0 / 2018-11-12 ------------------- - -Breaking changes -~~~~~~~~~~~~~~~~ - -- **Deprecate** ``private_key`` parameter to :func:`pandas_gbq.read_gbq` and - :func:`pandas_gbq.to_gbq` in favor of new ``credentials`` argument. Instead, - create a credentials object using - :func:`google.oauth2.service_account.Credentials.from_service_account_info` - or - :func:`google.oauth2.service_account.Credentials.from_service_account_file`. - See the :doc:`authentication how-to guide ` for - examples. (:issue:`161`, :issue:`231`) - -Enhancements -~~~~~~~~~~~~ - -- Allow newlines in data passed to ``to_gbq``. (:issue:`180`) -- Add :attr:`pandas_gbq.context.dialect` to allow overriding the default SQL - syntax dialect. (:issue:`195`, :issue:`235`) -- Support Python 3.7. (:issue:`197`, :issue:`232`) - -Internal changes -~~~~~~~~~~~~~~~~ - -- Migrate tests to CircleCI. (:issue:`228`, :issue:`232`) - -.. _changelog-0.7.0: - -0.7.0 / 2018-10-19 --------------------- - -- `int` columns which contain `NULL` are now cast to `float`, rather than - `object` type. (:issue:`174`) -- `DATE`, `DATETIME` and `TIMESTAMP` columns are now parsed as pandas' `timestamp` - objects (:issue:`224`) -- Add :class:`pandas_gbq.Context` to cache credentials in-memory, across - calls to ``read_gbq`` and ``to_gbq``. (:issue:`198`, :issue:`208`) -- Fast queries now do not log above ``DEBUG`` level. (:issue:`204`) - With BigQuery's release of `clustering `__ - querying smaller samples of data is now faster and cheaper. -- Don't load credentials from disk if reauth is ``True``. 
(:issue:`212`) - This fixes a bug where pandas-gbq could not refresh credentials if the - cached credentials were invalid, revoked, or expired, even when - ``reauth=True``. -- Catch RefreshError when trying credentials. (:issue:`226`) - -Internal changes -~~~~~~~~~~~~~~~~ - -- Avoid listing datasets and tables in system tests. (:issue:`215`) -- Improved performance from eliminating some duplicative parsing steps - (:issue:`224`) - -.. _changelog-0.6.1: - -0.6.1 / 2018-09-11 --------------------- - -- Improved ``read_gbq`` performance and memory consumption by delegating - ``DataFrame`` construction to the Pandas library, radically reducing - the number of loops that execute in python - (:issue:`128`) -- Reduced verbosity of logging from ``read_gbq``, particularly for short - queries. (:issue:`201`) -- Avoid ``SELECT 1`` query when running ``to_gbq``. (:issue:`202`) - -.. _changelog-0.6.0: - -0.6.0 / 2018-08-15 --------------------- - -- Warn when ``dialect`` is not passed in to ``read_gbq``. The default dialect - will be changing from 'legacy' to 'standard' in a future version. - (:issue:`195`) -- Use general float with 15 decimal digit precision when writing to local - CSV buffer in ``to_gbq``. This prevents numerical overflow in certain - edge cases. (:issue:`192`) - -.. _changelog-0.5.0: - -0.5.0 / 2018-06-15 ------------------- - -- Project ID parameter is optional in ``read_gbq`` and ``to_gbq`` when it can - inferred from the environment. Note: you must still pass in a project ID when - using user-based authentication. (:issue:`103`) -- Progress bar added for ``to_gbq``, through an optional library `tqdm` as - dependency. (:issue:`162`) -- Add location parameter to ``read_gbq`` and ``to_gbq`` so that pandas-gbq - can work with datasets in the Tokyo region. (:issue:`177`) - -Documentation -~~~~~~~~~~~~~ - -- Add :doc:`authentication how-to guide `. (:issue:`183`) -- Update :doc:`contributing` guide with new paths to tests. 
(:issue:`154`, - :issue:`164`) - -Internal changes -~~~~~~~~~~~~~~~~ - -- Tests now use `nox` to run in multiple Python environments. (:issue:`52`) -- Renamed internal modules. (:issue:`154`) -- Refactored auth to an internal auth module. (:issue:`176`) -- Add unit tests for ``get_credentials()``. (:issue:`184`) - -.. _changelog-0.4.1: - -0.4.1 / 2018-04-05 ------------------- - -- Only show ``verbose`` deprecation warning if Pandas version does not - populate it. (:issue:`157`) - -.. _changelog-0.4.0: - -0.4.0 / 2018-04-03 ------------------- - -- Fix bug in `read_gbq` when building a dataframe with integer columns - on Windows. Explicitly use 64bit integers when converting from BQ types. - (:issue:`119`) -- Fix bug in `read_gbq` when querying for an array of floats (:issue:`123`) -- Fix bug in `read_gbq` with configuration argument. Updates `read_gbq` to - account for breaking change in the way ``google-cloud-python`` version - 0.32.0+ handles query configuration API representation. (:issue:`152`) -- Fix bug in `to_gbq` where seconds were discarded in timestamp columns. - (:issue:`148`) -- Fix bug in `to_gbq` when supplying a user-defined schema (:issue:`150`) -- **Deprecate** the ``verbose`` parameter in `read_gbq` and `to_gbq`. - Messages use the logging module instead of printing progress directly to - standard output. (:issue:`12`) - -.. _changelog-0.3.1: - -0.3.1 / 2018-02-13 ------------------- - -- Fix an issue where Unicode couldn't be uploaded in Python 2 (:issue:`106`) -- Add support for a passed schema in :func:``to_gbq`` instead inferring the schema from the passed ``DataFrame`` with ``DataFrame.dtypes`` (:issue:`46`) -- Fix an issue where a dataframe containing both integer and floating point columns could not be uploaded with ``to_gbq`` (:issue:`116`) -- ``to_gbq`` now uses ``to_csv`` to avoid manually looping over rows in a dataframe (should result in faster table uploads) (:issue:`96`) - -.. 
_changelog-0.3.0: - -0.3.0 / 2018-01-03 ------------------- - -- Use the `google-cloud-bigquery `__ library for API calls. The ``google-cloud-bigquery`` package is a new dependency, and dependencies on ``google-api-python-client`` and ``httplib2`` are removed. See the `installation guide `__ for more details. (:issue:`93`) -- Structs and arrays are now named properly (:issue:`23`) and BigQuery functions like ``array_agg`` no longer run into errors during type conversion (:issue:`22`). -- :func:`to_gbq` now uses a load job instead of the streaming API. Remove ``StreamingInsertError`` class, as it is no longer used by :func:`to_gbq`. (:issue:`7`, :issue:`75`) - -.. _changelog-0.2.1: - -0.2.1 / 2017-11-27 ------------------- - -- :func:`read_gbq` now raises ``QueryTimeout`` if the request exceeds the ``query.timeoutMs`` value specified in the BigQuery configuration. (:issue:`76`) -- Environment variable ``PANDAS_GBQ_CREDENTIALS_FILE`` can now be used to override the default location where the BigQuery user account credentials are stored. (:issue:`86`) -- BigQuery user account credentials are now stored in an application-specific hidden user folder on the operating system. (:issue:`41`) - -.. _changelog-0.2.0: - -0.2.0 / 2017-07-24 ------------------- - -- Drop support for Python 3.4 (:issue:`40`) -- The dataframe passed to ```.to_gbq(...., if_exists='append')``` needs to contain only a subset of the fields in the BigQuery schema. (:issue:`24`) -- Use the `google-auth `__ library for authentication because ``oauth2client`` is deprecated. (:issue:`39`) -- :func:`read_gbq` now has a ``auth_local_webserver`` boolean argument for controlling whether to use web server or console flow when getting user credentials. Replaces `--noauth_local_webserver` command line argument. (:issue:`35`) -- :func:`read_gbq` now displays the BigQuery Job ID and standard price in verbose output. (:issue:`70` and :issue:`71`) - -.. 
_changelog-0.1.6: - -0.1.6 / 2017-05-03 ------------------- - -- All gbq errors will simply be subclasses of ``ValueError`` and no longer inherit from the deprecated ``PandasError``. - -.. _changelog-0.1.4: - -0.1.4 / 2017-03-17 ------------------- - -- ``InvalidIndexColumn`` will be raised instead of ``InvalidColumnOrder`` in :func:`read_gbq` when the index column specified does not exist in the BigQuery schema. (:issue:`6`) - -.. _changelog-0.1.3: - -0.1.3 / 2017-03-04 ------------------- - -- Bug with appending to a BigQuery table where fields have modes (NULLABLE,REQUIRED,REPEATED) specified. These modes were compared versus the remote schema and writing a table via :func:`to_gbq` would previously raise. (:issue:`13`) - -.. _changelog-0.1.2: - -0.1.2 / 2017-02-23 ------------------- - -Initial release of transfered code from `pandas `__ - -Includes patches since the 0.19.2 release on pandas with the following: - -- :func:`read_gbq` now allows query configuration preferences `pandas-GH#14742 `__ -- :func:`read_gbq` now stores ``INTEGER`` columns as ``dtype=object`` if they contain ``NULL`` values. Otherwise they are stored as ``int64``. This prevents precision lost for integers greather than 2**53. Furthermore ``FLOAT`` columns with values above 10**4 are no longer casted to ``int64`` which also caused precision loss `pandas-GH#14064 `__, and `pandas-GH#14305 `__ diff --git a/docs/source/conf.py b/docs/source/conf.py index afad588d..bfcc94ef 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -49,6 +49,7 @@ "sphinx.ext.intersphinx", "sphinx.ext.coverage", "sphinx.ext.ifconfig", + "recommonmark", ] # Add any paths that contain templates here, relative to this directory. @@ -58,7 +59,7 @@ # You can specify multiple suffix as a list of string: # # source_suffix = ['.rst', '.md'] -source_suffix = ".rst" +source_suffix = [".rst", ".md"] # The encoding of source files. 
# diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst index cacbf1c4..3bd86849 100644 --- a/docs/source/contributing.rst +++ b/docs/source/contributing.rst @@ -334,10 +334,8 @@ run gbq integration tests on a forked repository: Documenting your code --------------------- -Changes should be reflected in the release notes located in ``doc/source/changelog.rst``. -This file contains an ongoing change log. Add an entry to this file to document your fix, -enhancement or (unavoidable) breaking change. Make sure to include the GitHub issue number -when adding your entry (using `` :issue:`1234` `` where `1234` is the issue/pull request number). +Changes should follow conventional commits. The release-please bot uses the +commit message to create an ongoing change log. If your code is an enhancement, it is most likely necessary to add usage examples to the existing documentation. Further, to let users know when diff --git a/docs/source/index.rst b/docs/source/index.rst index bfb51d9e..e104127d 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -39,7 +39,7 @@ Contents: writing.rst api.rst contributing.rst - changelog.rst + changelog.md privacy.rst diff --git a/release-procedure.md b/release-procedure.md index b682db2f..3b33021d 100644 --- a/release-procedure.md +++ b/release-procedure.md @@ -1,8 +1,6 @@ * Send PR to prepare release on scheduled date. - * Add current date and any missing changes to [`docs/source/changelog.rst`](https://github.com/pydata/pandas-gbq/blob/master/docs/source/changelog.rst). - * Verify your local repository is on the latest changes. `rebase -i` should be noop. git fetch pandas-gbq master @@ -37,7 +35,6 @@ * Create the [release on GitHub](https://github.com/pydata/pandas-gbq/releases/new) using the tag created earlier. - * Copy release notes from [changelog.rst](https://github.com/pydata/pandas-gbq/blob/master/docs/source/changelog.rst). * Upload wheel and source zip from `dist/` directory.
* Do a pull-request to the feedstock on `pandas-gbq-feedstock `__ From b6295ada491ec2862f63a7493b8489e511a70636 Mon Sep 17 00:00:00 2001 From: Tim Swast Date: Wed, 18 Aug 2021 16:03:43 -0500 Subject: [PATCH 3/3] remove one-time script --- convert_changelog.py | 9 --------- 1 file changed, 9 deletions(-) delete mode 100644 convert_changelog.py diff --git a/convert_changelog.py b/convert_changelog.py deleted file mode 100644 index d1794e7e..00000000 --- a/convert_changelog.py +++ /dev/null @@ -1,9 +0,0 @@ -import re - - -with open("docs/source/changelog.rst") as f: - text = f.read() - -# :issue:`312` -> https://github.com/googleapis/python-bigquery-pandas/issues/312 -c = re.compile(r":issue:`([0-9]+)`", flags=re.MULTILINE) -print(re.sub(c, r"[#\1](https://github.com/googleapis/python-bigquery-pandas/issues/\1)", text))
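For reference, the link rewriting done by the removed one-time script can be sketched as a small standalone function. This is a minimal illustration, not the script itself: the function name `issue_roles_to_markdown` and the sample line are hypothetical, and the sketch only rewrites `:issue:` roles, leaving headings and other reStructuredText markup untouched. The `re.MULTILINE` flag from the original is omitted here because the pattern contains no `^` or `$` anchors, so the flag has no effect on the result.

```python
import re

# Matches the Sphinx role :issue:`NNN` and captures the issue number.
ISSUE_ROLE = re.compile(r":issue:`([0-9]+)`")

# Base URL as it appears in the comment of the removed script.
ISSUE_URL = "https://github.com/googleapis/python-bigquery-pandas/issues/"


def issue_roles_to_markdown(text):
    """Replace each :issue:`N` role with a Markdown link to issue N.

    Hypothetical helper for illustration; only issue roles are rewritten.
    """
    return ISSUE_ROLE.sub(
        lambda m: f"[#{m.group(1)}]({ISSUE_URL}{m.group(1)})", text
    )


# Sample changelog line in the old reStructuredText style.
sample = "- Support Python 3.7. (:issue:`197`, :issue:`232`)"
converted = issue_roles_to_markdown(sample)
print(converted)
```

Wrapping the substitution in a function rather than reading `docs/source/changelog.rst` directly keeps the sketch runnable without the repository checked out.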