From 73d373cd1c8aaed5c518b816d38aa7d7625e0330 Mon Sep 17 00:00:00 2001
From: Tim Swast
Date: Mon, 28 Mar 2022 17:05:42 -0500
Subject: [PATCH 1/2] doc: add ROADMAP document describing the purpose of the package

---
 ROADMAP.md | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 ROADMAP.md

diff --git a/ROADMAP.md b/ROADMAP.md
new file mode 100644
index 00000000..f8915dd2
--- /dev/null
+++ b/ROADMAP.md
@@ -0,0 +1,10 @@
+# pandas-gbq Roadmap
+
+The purpose of this package is to provide a small subset of BigQuery
+functionality that maps well to
+[pandas.read_gbq](https://pandas.pydata.org/docs/reference/api/pandas.read_gbq.html#pandas.read_gbq)
+and
+[pandas.DataFrame.to_gbq](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_gbq.html#pandas.DataFrame.to_gbq).
+
+A note on pandas.read_sql: we'd like to be compatible with this too, for folks
+that need better performance compared to the SQLAlchemy connector.

From db95a899dcf5fb23b5cbc47b1ba5048f45b39353 Mon Sep 17 00:00:00 2001
From: Tim Swast
Date: Wed, 30 Mar 2022 15:57:27 -0500
Subject: [PATCH 2/2] additional thoughts

---
 ROADMAP.md | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index f8915dd2..d4053cfa 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -5,6 +5,53 @@ functionality that maps well to
 [pandas.read_gbq](https://pandas.pydata.org/docs/reference/api/pandas.read_gbq.html#pandas.read_gbq)
 and
 [pandas.DataFrame.to_gbq](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_gbq.html#pandas.DataFrame.to_gbq).
+Those methods in the pandas library are thin wrappers around the equivalent
+methods in this package.
 
-A note on pandas.read_sql: we'd like to be compatible with this too, for folks
+## Adding features to pandas-gbq
+
+Considerations when adding new features to pandas-gbq:
+
+* New method? Consider an alternative, as the core focus of this library is
+  `read_gbq` and `to_gbq`.
+* Breaking change to an existing parameter? Consider an alternative, as folks
+  could be using an older version of `pandas` that doesn't account for the
+  change when a newer version of `pandas-gbq` is installed. If you must, please
+  follow a 1+ year deprecation timeline.
+* New parameter? Go for it! Be sure to also send a PR to `pandas` after the
+  feature is released so that folks using the `pandas` wrapper can take
+  advantage of it.
+* New data type? OK. If there's not a good mapping to an existing `pandas`
+  dtype, consider adding one to the `db-dtypes` package.
+
+## Vision
+
+The `pandas-gbq` package should do the "right thing" by default. This means it
+should carefully choose dtypes for maximum compatibility with BigQuery and
+avoid data loss. As new data types are added to BigQuery that don't have good
+equivalents yet in the `pandas` ecosystem, equivalent dtypes should be added to
+the `db-dtypes` package.
+
+As new features are added that might improve performance, `pandas-gbq` should
+offer easy ways to use them without sacrificing usability. For example, one
+might consider using the `api_method` parameter of `to_gbq` to support the
+BigQuery Storage Write API.
+
+A note on `pandas.read_sql`: we'd like to be compatible with this too, for folks
 that need better performance compared to the SQLAlchemy connector.
+
+## Usability
+
+Unlike in the more object-oriented client libraries, methods with many
+parameters are natural in the Python data science ecosystem. That said, the
+`configuration` argument is provided, which takes the REST representation of
+the job configuration so that power users can use new features without the
+need for an explicit parameter to be added.
+
+## Conclusion
+
+Keep it simple.
+
+Don't break existing users.
+
+Do the right thing by default.
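
For context on the two parameters the roadmap calls out, a minimal usage sketch follows. It assumes only what the patch above names (the `api_method` argument of `to_gbq` and the `configuration` argument taking the REST representation of the job configuration); the project ID, table name, query, and specific option values shown are placeholders for illustration, not part of the patch.

```python
# Minimal sketch of the two parameters named in the roadmap above.
# The `api_method` and `configuration` arguments come from the patch text;
# the project ID, dataset/table names, query, and specific option values
# here are placeholders chosen for illustration.
import pandas as pd
import pandas_gbq

df = pd.DataFrame({"name": ["alpha", "beta"], "value": [1, 2]})

# Upload a DataFrame, selecting the ingestion mechanism via `api_method`.
pandas_gbq.to_gbq(
    df,
    "my_dataset.my_table",    # placeholder destination table
    project_id="my-project",  # placeholder project
    if_exists="replace",
    api_method="load_csv",    # assumed value; the default lets the library choose
)

# Read results back, passing job options in their REST representation via
# `configuration` rather than waiting for a dedicated keyword argument.
result = pandas_gbq.read_gbq(
    "SELECT name, value FROM my_dataset.my_table",
    project_id="my-project",
    configuration={"query": {"useQueryCache": False}},
)
print(result)
```

Because `configuration` mirrors the BigQuery job's REST representation, power users can reach new job options before an explicit keyword argument is added, which is the trade-off the Usability section describes.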