REG: Fix read_parquet from file-like objects #34500
```diff
@@ -1,6 +1,7 @@
 """ test parquet compat """
 import datetime
 from distutils.version import LooseVersion
+from io import BytesIO
 import os
 from warnings import catch_warnings
 
```
```diff
@@ -567,6 +568,24 @@ def test_s3_roundtrip_for_dir(self, df_compat, s3_resource, pa, partition_col):
             repeat=1,
         )
 
+    @tm.network
+    @td.skip_if_no("pyarrow")
+    def test_parquet_read_from_url(self, df_compat):
+        # TODO:alimcmaster1 update with master URL
+        url = (
+            "https://raw.githubusercontent.com/alimcmaster1/pandas/"
+            "mcmali-parq-fix/pandas/tests/io/data/parquet/simple.parquet"
+        )
+        df = pd.read_parquet(url)
+        tm.assert_frame_equal(df, df_compat)
```

This might fail due to rate limits from GitHub?

Yep, fair point. We already do this in test_network.py, and I think the `tm.network` decorator helps handle any failures. We could use https://pypi.org/project/pytest-localserver/ instead? Also, I couldn't find any docs on what the rate limits are for raw.githubusercontent endpoints.
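The rate-limit concern raised on `test_parquet_read_from_url` could be sidestepped by serving the fixture from the test machine itself. pytest-localserver is one way to do that; the sketch below deliberately does not use that plugin's API and relies only on the standard library (`http.server`, `urllib.request`). It starts a server on an ephemeral 127.0.0.1 port, serves some bytes, and fetches them back. `PAYLOAD`, `_FixtureHandler` and `serve_and_fetch` are hypothetical names; in a real test the payload would be the contents of `simple.parquet` and the URL would be handed to `pd.read_parquet`.

```python
import http.server
import threading
import urllib.request

# Hypothetical placeholder bytes; a real test would serve the contents of
# pandas/tests/io/data/parquet/simple.parquet instead.
PAYLOAD = b"PAR1 placeholder fixture bytes PAR1"


class _FixtureHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET request with PAYLOAD."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, fmt, *args):
        # Suppress the default per-request logging to stderr.
        pass


def serve_and_fetch(path="/simple.parquet"):
    """Serve PAYLOAD on an ephemeral local port, fetch it once, then shut down."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), _FixtureHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        url = f"http://{host}:{port}{path}"
        # Bypass any HTTP(S)_PROXY settings so the request stays on localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        # In the pandas test, this URL would be passed to pd.read_parquet(url).
        with opener.open(url, timeout=5) as resp:
            return url, resp.read()
    finally:
        server.shutdown()
        server.server_close()
```

Binding to port 0 lets the OS pick a free port, so parallel test runs do not collide, and no request ever leaves the machine.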
```diff
+
+    @td.skip_if_no("pyarrow")
+    def test_read_file_like_obj_support(self, df_compat):
+        buffer = BytesIO()
+        df_compat.to_parquet(buffer)
+        df_from_buf = pd.read_parquet(buffer)
+        tm.assert_frame_equal(df_compat, df_from_buf)
+
     def test_partition_cols_supported(self, pa, df_full):
         # GH #23283
         partition_cols = ["bool", "int"]
```
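`test_read_file_like_obj_support` passes the buffer straight back to `pd.read_parquet` without calling `buffer.seek(0)`. One plausible reason this works is that Parquet readers locate the file metadata from the end of the file: in the Parquet format, the last 8 bytes are a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. The sketch below is a hypothetical `footer_length` helper, not pandas or pyarrow code; it reads that trailer from any seekable file-like object regardless of the current cursor position. The bytes in the usage example are synthetic and not a real Parquet file.

```python
import io
import struct

MAGIC = b"PAR1"


def footer_length(f):
    """Return the metadata length recorded in a Parquet file's trailer.

    Only absolute seeks are used, so the caller's cursor position is irrelevant.
    """
    size = f.seek(0, io.SEEK_END)  # seek() returns the new absolute position
    if size < 12:  # 4-byte leading magic + 4-byte length + 4-byte trailing magic
        raise ValueError("too small to be a Parquet file")
    f.seek(size - 8)
    trailer = f.read(8)
    if trailer[4:] != MAGIC:
        raise ValueError("not a Parquet file: missing trailing magic bytes")
    (length,) = struct.unpack("<I", trailer[:4])
    return length


# Synthetic stand-in for a Parquet file (not real Thrift metadata).
metadata = b"fake-metadata"
buffer = io.BytesIO()
buffer.write(MAGIC + b"row-group-bytes" + metadata + struct.pack("<I", len(metadata)) + MAGIC)
# The cursor now sits at the end of the buffer, as it typically does after
# writing to it, yet the trailer is still found.
assert footer_length(buffer) == len(metadata)
```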
We have similar logic on the fastparquet side; we should consolidate the two in the future: https://github.com/pandas-dev/pandas/blob/master/pandas/io/parquet.py#L188
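One shape such a consolidation could take is a single helper that both engine code paths call to decide whether `path` is a buffer, a URL, or a local path. The sketch below is hypothetical: `classify_source` is not a pandas function and does not reproduce the code at the linked line. Its buffer check reimplements the duck typing used by `pandas.api.types.is_file_like` (a `read` or `write` method plus `__iter__`) so that it runs without pandas installed.

```python
import io
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source):
    """Hypothetical helper: return "file-like", "url", or "local-path"."""
    # Duck-typed buffer check, mirroring pandas.api.types.is_file_like.
    if (hasattr(source, "read") or hasattr(source, "write")) and hasattr(source, "__iter__"):
        return "file-like"
    text = str(source)  # also accepts pathlib.Path objects
    scheme = urlparse(text).scheme
    # A one-letter "scheme" is a Windows drive letter such as "C:", not a URL.
    if len(scheme) > 1:
        return "url"
    return "local-path"


for source in [io.BytesIO(b""), "s3://bucket/simple.parquet", Path("data/simple.parquet")]:
    print(type(source).__name__, classify_source(source))
```

Each engine would then branch on the returned label once, instead of repeating its own URL and buffer checks.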