BUG: pandas.read_parquet no longer accepting a file-like object #34826

Closed
DavidFarago opened this issue Jun 16, 2020 · 1 comment
DavidFarago commented Jun 16, 2020

Code Sample, a copy-pastable example

from io import BytesIO
import numpy as np
import pandas as pd
import pytest

def test_read_parquet_on_stream():
    stream = BytesIO()
    df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
    df.to_parquet(stream)
    result = pd.read_parquet(stream)
    assert np.array_equal(result.values, df.values)

Problem description

The documentation of pandas.read_parquet() says:

Parameters: path: str, path object or file-like object

Being able to pass a stream as this parameter is quite handy. It worked with pandas 1.0.3, but raises the error below with pandas 1.0.4.

Expected Output

I expect the test to pass silently, which it does when using pandas == 1.0.3.

Output

But for pandas == 1.0.4, the test case fails due to stream not being a path-like object:

tests/util/test_parquet_util.py:388 (test_read_parquet_on_stream)
    def test_read_parquet_on_stream():
        stream = BytesIO()
        df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
        df.to_parquet(stream)
>       result = pd.read_parquet(stream)

tests/util/test_parquet_util.py:393:

/usr/local/lib/python3.7/dist-packages/pandas/io/parquet.py:315: in read_parquet
    return impl.read(path, columns=columns, **kwargs)
/usr/local/lib/python3.7/dist-packages/pandas/io/parquet.py:131: in read
    path, filesystem=get_fs_for_path(path), **kwargs
/usr/local/lib/python3.7/dist-packages/pyarrow/parquet.py:1019: in __init__
    self.paths = _parse_uri(path_or_paths)
/usr/local/lib/python3.7/dist-packages/pyarrow/parquet.py:49: in _parse_uri
    path = _stringify_path(path)

path = <_io.BytesIO object at 0x7f07d0e77530>

    def _stringify_path(path):
        """
        Convert *path* to a string or unicode path if possible.
        """
        if isinstance(path, six.string_types):
            return path

        # checking whether path implements the filesystem protocol
        try:
            return path.__fspath__()  # new in python 3.6
        except AttributeError:
            # fallback pathlib ckeck for earlier python versions than 3.6
            if _has_pathlib and isinstance(path, pathlib.Path):
                return str(path)

>       raise TypeError("not a path-like object")
E       TypeError: not a path-like object

/usr/local/lib/python3.7/dist-packages/pyarrow/util.py:84: TypeError
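
The traceback shows the stream being routed through a path-only code path. The root of the TypeError can be illustrated with the standard library alone: a BytesIO object does not implement the filesystem path protocol (`__fspath__`), so any function that insists on a path will reject it. The sketch below is only an illustration of that check; the helper name `is_path_like` is hypothetical and is not part of pandas or pyarrow.

```python
import os
from io import BytesIO
from pathlib import Path


def is_path_like(obj):
    """Hypothetical helper: True if obj is a str/bytes path or implements __fspath__."""
    if isinstance(obj, (str, bytes)):
        return True
    try:
        os.fspath(obj)  # raises TypeError for objects without __fspath__
    except TypeError:
        return False
    return True


stream = BytesIO(b"placeholder bytes, not a real Parquet file")

print(is_path_like("data.parquet"))        # plain string path
print(is_path_like(Path("data.parquet")))  # pathlib.Path implements __fspath__
print(is_path_like(stream))                # BytesIO does not
```

A reader that accepts file-like objects would need to branch on a check of this kind before attempting to stringify its argument.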

Details

Besides pandas, my requirements.txt contains the following relevant dependencies:

pyarrow ~= 0.15.1
pytest ~= 5.3.5
pyyaml ~= 5.1.2

My Python versions are 3.6.10 and 3.7.7.

DavidFarago added the Bug and Needs Triage labels Jun 16, 2020

TomAugspurger (Contributor) commented

Duplicate of #34467. Should have 1.0.5 out with a fix later today or tomorrow.

bashtage removed the Needs Triage label Aug 21, 2020