Description
Code Sample, a copy-pastable example
from io import BytesIO
import numpy as np
import pandas as pd
import pytest
def test_read_parquet_on_stream():
    stream = BytesIO()
    df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
    df.to_parquet(stream)
    result = pd.read_parquet(stream)
    assert np.array_equal(result.values, df.values)
Problem description
The documentation of pandas.read_parquet() (see 1) says:
Parameters: path: str, path object or file-like object
It is quite handy to be able to pass a stream as this parameter. This worked with pandas version 1.0.3, but raises the error below with pandas version 1.0.4.
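Until the regression is resolved, one possible workaround is to copy the stream to a temporary file and hand the resulting path to pd.read_parquet() instead of the stream. The sketch below is stdlib-only and has not been verified against pandas 1.0.4. The helper name spool_to_temp_file is hypothetical (it is not part of pandas or pyarrow), and the byte payload is a stand-in for real Parquet data.

```python
import os
import tempfile
from io import BytesIO


def spool_to_temp_file(stream, suffix=".parquet"):
    """Copy a binary stream to a temporary file and return the file's path.

    Hypothetical helper, not part of pandas or pyarrow. The caller is
    responsible for deleting the file afterwards.
    """
    stream.seek(0)  # copy from the start of the stream
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(stream.read())
        return tmp.name


# Stand-in payload; in the scenario above this would be the stream
# that df.to_parquet(stream) wrote to.
stream = BytesIO(b"example bytes")
path = spool_to_temp_file(stream)
try:
    # With pandas installed, one would call pd.read_parquet(path) here.
    with open(path, "rb") as fh:
        round_trip = fh.read()
finally:
    os.remove(path)  # clean up the temporary file
```

This trades extra disk I/O for a path that the reader can handle, so it is only a stopgap; reading directly from the stream remains the behaviour the documentation describes.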
Expected Output
I expect the test to pass silently, which it does when using pandas == 1.0.3.
Output
But for pandas == 1.0.4, the test case fails due to stream not being a path-like object:
tests/util/test_parquet_util.py:388 (test_read_parquet_on_stream)
def test_read_parquet_on_stream():
    stream = BytesIO()
    df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
    df.to_parquet(stream)
    result = pd.read_parquet(stream)
tests/util/test_parquet_util.py:393:
/usr/local/lib/python3.7/dist-packages/pandas/io/parquet.py:315: in read_parquet
    return impl.read(path, columns=columns, **kwargs)
/usr/local/lib/python3.7/dist-packages/pandas/io/parquet.py:131: in read
    path, filesystem=get_fs_for_path(path), **kwargs
/usr/local/lib/python3.7/dist-packages/pyarrow/parquet.py:1019: in __init__
    self.paths = _parse_uri(path_or_paths)
/usr/local/lib/python3.7/dist-packages/pyarrow/parquet.py:49: in _parse_uri
    path = _stringify_path(path)
path = <_io.BytesIO object at 0x7f07d0e77530>
def _stringify_path(path):
    """
    Convert *path* to a string or unicode path if possible.
    """
    if isinstance(path, six.string_types):
        return path
    # checking whether path implements the filesystem protocol
    try:
        return path.__fspath__()  # new in python 3.6
    except AttributeError:
        # fallback pathlib ckeck for earlier python versions than 3.6
        if _has_pathlib and isinstance(path, pathlib.Path):
            return str(path)
    raise TypeError("not a path-like object")
E   TypeError: not a path-like object
/usr/local/lib/python3.7/dist-packages/pyarrow/util.py:84: TypeError
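The traceback ends in a check that only accepts strings and objects implementing __fspath__(), which a BytesIO does not. The following stdlib-only sketch illustrates that distinction. It uses os.fspath() as a stand-in for the check and is not pyarrow's own code; is_path_like is a hypothetical helper name.

```python
import os
import pathlib
from io import BytesIO


def is_path_like(obj):
    """Return True if os.fspath() accepts obj (str, bytes, or os.PathLike)."""
    try:
        os.fspath(obj)
    except TypeError:
        return False
    return True


print(is_path_like("data.parquet"))                # True
print(is_path_like(pathlib.Path("data.parquet")))  # True
print(is_path_like(BytesIO()))                     # False
```

A file-like object such as BytesIO offers read() and seek() but no filesystem path, so any code path that stringifies its input will reject it.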
Details
Besides pandas, my requirements.txt contains the following relevant dependencies:
pyarrow ~= 0.15.1
pytest ~= 5.3.5
pyyaml ~= 5.1.2
My Python versions are 3.6.10 and 3.7.7.