Do not manually loop over all rows when reading a dataframe #97

Closed
tswast opened this issue Dec 8, 2017 · 2 comments
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Collaborator

tswast commented Dec 8, 2017

See: #25 (comment)

Perhaps there is a faster way to construct a dataframe from the results returned by the client library than looping over rows individually?
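To illustrate the idea, here is a minimal sketch of one alternative to appending rows one at a time: transpose the row iterator into columns in a single pass, then build the dataframe from the columns. It uses only the standard library; `rows_to_columns` and `fake_row_iterator` are hypothetical names invented for this example and are not part of this project or the BigQuery client library.

```python
def rows_to_columns(rows, column_names):
    """Transpose an iterable of row tuples into a dict of column lists."""
    # zip(*rows) consumes the iterator once and groups values by position.
    columns = list(zip(*rows))
    if not columns:
        # No rows were returned: keep the column names with empty lists.
        return {name: [] for name in column_names}
    return {name: list(values) for name, values in zip(column_names, columns)}


# Hypothetical stand-in for the row iterator returned by a client library.
def fake_row_iterator():
    yield ("alice", 1)
    yield ("bob", 2)
    yield ("carol", 3)


columns = rows_to_columns(fake_row_iterator(), ["name", "score"])
```

The resulting dict of lists could then be passed to `pandas.DataFrame(columns)` in one call, instead of growing a dataframe row by row. Whether this is actually faster for real query results would need to be measured.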

Note: the client library ends up effectively looping over all rows as well by returning an iterator that does the type conversions / parsing over the actual API results. I imagine some profiling might reveal places where the performance there can also be improved.
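As a rough sketch of the kind of profiling meant here, the standard-library `cProfile` module can show how much time is spent in per-row conversion. The `parse_rows` generator below is a hypothetical stand-in for the client library's type conversion, not its actual code, and the input data is made up.

```python
import cProfile
import io
import pstats


def parse_rows(raw_rows):
    """Hypothetical stand-in for per-row type conversion / parsing."""
    for raw in raw_rows:
        yield (raw["name"], int(raw["score"]))


# Made-up API-style results: every value arrives as a string.
raw_rows = [{"name": f"user{i}", "score": str(i)} for i in range(10_000)]

profiler = cProfile.Profile()
profiler.enable()
parsed = list(parse_rows(raw_rows))  # the code path being measured
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
```

Printing `profile_text` lists the most expensive calls, which is one way to check whether the parsing step or the dataframe construction dominates.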

P.S. version 0.29.0 of the BigQuery client library (not yet released, as of 2017-12-08) will expose a to_dataframe() method. The actual implementation of this issue may be to just use that method here.

https://github.com/GoogleCloudPlatform/google-cloud-python/blob/061011d0213f82ca5ccaa9dec0a12713faaa2899/bigquery/google/cloud/bigquery/table.py#L1103-L1123

@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Dec 8, 2017
@tswast
Collaborator Author

tswast commented Jan 26, 2018

Note: I tried out using to_dataframe() in #112, but there are some issues with indexes, which aren't handled in the google-cloud-bigquery library. More investigation is needed.

@tswast
Collaborator Author

tswast commented Feb 12, 2018

Closing this issue as a duplicate of #66, which is to improve the performance of reads in general.

@tswast tswast closed this as completed Feb 12, 2018