BUG: pandas.read_parquet no longer accepting a file-like object #34826

Closed
@DavidFarago

Code Sample, a copy-pastable example

from io import BytesIO
import numpy as np
import pandas as pd
import pytest

def test_read_parquet_on_stream():
    stream = BytesIO()
    df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
    df.to_parquet(stream)
    result = pd.read_parquet(stream)
    assert np.array_equal(result.values, df.values)

Problem description

The documentation of pandas.read_parquet() says

Parameters: path: str, path object or file-like object

It is quite handy to be able to pass a stream as this parameter. This worked with pandas version 1.0.3, but raises the error below with pandas version 1.0.4.

Expected Output

I expect the test to pass silently, which it does when using pandas == 1.0.3.

Output

With pandas == 1.0.4, however, the test case fails because stream is not a path-like object:

tests/util/test_parquet_util.py:388 (test_read_parquet_on_stream)
    def test_read_parquet_on_stream():
        stream = BytesIO()
        df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
        df.to_parquet(stream)
>       result = pd.read_parquet(stream)

tests/util/test_parquet_util.py:393:
/usr/local/lib/python3.7/dist-packages/pandas/io/parquet.py:315: in read_parquet
    return impl.read(path, columns=columns, **kwargs)
/usr/local/lib/python3.7/dist-packages/pandas/io/parquet.py:131: in read
    path, filesystem=get_fs_for_path(path), **kwargs
/usr/local/lib/python3.7/dist-packages/pyarrow/parquet.py:1019: in __init__
    self.paths = _parse_uri(path_or_paths)
/usr/local/lib/python3.7/dist-packages/pyarrow/parquet.py:49: in _parse_uri
    path = _stringify_path(path)

path = <_io.BytesIO object at 0x7f07d0e77530>

    def _stringify_path(path):
        """
        Convert *path* to a string or unicode path if possible.
        """
        if isinstance(path, six.string_types):
            return path

        # checking whether path implements the filesystem protocol
        try:
            return path.__fspath__()  # new in python 3.6
        except AttributeError:
            # fallback pathlib ckeck for earlier python versions than 3.6
            if _has_pathlib and isinstance(path, pathlib.Path):
                return str(path)

>       raise TypeError("not a path-like object")
E       TypeError: not a path-like object

/usr/local/lib/python3.7/dist-packages/pyarrow/util.py:84: TypeError
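
To illustrate why the traceback ends in "not a path-like object": a BytesIO is a file-like object but does not implement the filesystem path protocol (__fspath__) that _stringify_path checks for. The sketch below uses only the standard library; is_path_like is a hypothetical helper written for this illustration and is not part of pandas or pyarrow.

```python
import os
from io import BytesIO
from pathlib import Path


def is_path_like(obj):
    """Return True if obj is a str, bytes, or os.PathLike object.

    Hypothetical helper for illustration only; it relies on os.fspath,
    which raises TypeError for anything that is not path-like.
    """
    try:
        os.fspath(obj)
    except TypeError:
        return False
    return True


print(is_path_like("data.parquet"))        # True: plain string path
print(is_path_like(Path("data.parquet")))  # True: implements __fspath__
print(is_path_like(BytesIO()))             # False: file-like, not path-like
```

Any code path that forces the argument through a path conversion will therefore reject an in-memory stream, even though the stream is perfectly readable.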

Details

Besides pandas, my requirements.txt contains the following relevant dependencies:

pyarrow ~= 0.15.1
pytest ~= 5.3.5
pyyaml ~= 5.1.2

My Python versions are 3.6.10 and 3.7.7.
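
One possible interim workaround, sketched under the assumption that the reader still accepts a real file path, is to spill the stream to a temporary file and pass that path instead. read_via_temp_file and read_bytes are hypothetical names introduced here; the example uses a plain byte reader so that it runs with the standard library alone, and it has not been verified against pandas 1.0.4.

```python
import os
import tempfile
from io import BytesIO


def read_via_temp_file(stream, reader, suffix=".parquet"):
    """Copy an in-memory stream to a temporary file and call reader(path).

    Hypothetical helper: `reader` is any callable that takes a file path.
    The temporary file is removed once the reader returns or raises.
    """
    stream.seek(0)  # rewind, since a preceding write leaves the cursor at the end
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(stream.read())
        return reader(path)
    finally:
        os.remove(path)


def read_bytes(path):
    """Stand-in reader for this sketch: return the raw file contents."""
    with open(path, "rb") as f:
        return f.read()


payload = BytesIO(b"example payload")
print(read_via_temp_file(payload, read_bytes))
```

In the failing test, the corresponding call would presumably be read_via_temp_file(stream, pd.read_parquet); this costs an extra disk round trip, so it is only a stopgap while the file-like path is broken.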
