ERROR: file not found: pyarrow and to_parquet() not working #9326

EnvironmentalEngineer · 2021-01-26T08:22:18Z

Hi,

I am using pyarrow 1.0.0.

Writing a DataFrame with integer or float columns to Parquet fails, while object columns work. When I run the following Python script:
import pandas as pd
sample = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
sample.to_parquet('tmp.parquet')

I get:
ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column a with type int64')
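
One way to tell whether a failure like this comes from pandas or from pyarrow itself is to convert a NumPy int64 array with pyarrow directly, leaving pandas out of the picture. The sketch below is only a diagnostic under that assumption and is not part of the original report; `check_int64_conversion` is a hypothetical helper name. A pyarrow build that predates the installed NumPy is one plausible cause of such an error, but nothing in this thread confirms it.

```python
def check_int64_conversion():
    """Try a NumPy int64 -> Arrow conversion without going through pandas.

    Returns a short status string instead of raising, so the result can
    be pasted into a bug report.
    """
    try:
        import numpy as np
        import pyarrow as pa
    except ImportError as exc:
        return "import failed: {}".format(exc)

    try:
        # Same kind of column as 'a' in the script above: plain int64 values.
        pa.array(np.array([1, 2], dtype="int64"))
    except Exception as exc:
        return "conversion failed: {!r}".format(exc)

    return "ok (numpy {}, pyarrow {})".format(np.__version__, pa.__version__)


if __name__ == "__main__":
    print(check_int64_conversion())
```

If this reports a conversion failure as well, the problem lies in the pyarrow and NumPy pairing rather than in `to_parquet()`.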

When I run pytest pyarrow, I get a file-not-found error:
============================= test session starts ==============================
platform linux -- Python 3.7.3, pytest-4.3.1, py-1.8.0, pluggy-0.9.0
rootdir: /home/ubuntu, inifile:
plugins: remotedata-0.3.1, openfiles-0.3.2, doctestplus-0.3.0, arraydiff-0.3
collecting ...
========================= no tests ran in 0.00 seconds =========================
ERROR: file not found: pyarrow

The package is installed, though. When I run pip3.7 install --no-cache pyarrow, I get:
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pyarrow in ./anaconda3/lib/python3.7/site-packages (1.0.0)
Requirement already satisfied: numpy>=1.14 in ./anaconda3/lib/python3.7/site-packages (from pyarrow) (1.20.0rc2)
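
A note on the pytest output above: pytest treats a bare argument such as `pyarrow` as a file or directory path under the rootdir, so "file not found" only means that no path named pyarrow exists in /home/ubuntu, not that the package is missing. Passing `--pyargs` makes pytest resolve the argument as an importable package name instead. The sketch below builds such a command; `pytest_command_for` is a hypothetical helper name, and whether the installed wheel ships its test modules is not something the output above confirms.

```python
import importlib.util
import sys


def pytest_command_for(package_name):
    """Build a pytest command that runs the tests of an installed package.

    Returns None when this interpreter cannot find the package at all.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None
    # --pyargs: interpret the argument as a Python package name,
    # not as a path relative to the current directory.
    return [sys.executable, "-m", "pytest", "--pyargs", package_name]


if __name__ == "__main__":
    print(pytest_command_for("pyarrow"))
```

Running the command through `sys.executable -m pytest` also ensures that pytest uses the same interpreter that pip installed the package into.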

Here is the output of pd.show_versions():

INSTALLED VERSIONS

commit : 9d598a5e1eee26df95b3910e3f2934890d062caa
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-1035-aws
Version : #37~18.04.1-Ubuntu SMP Wed Jan 6 22:31:04 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.1
numpy : 1.20.0rc2
pytz : 2018.9
dateutil : 2.8.0
pip : 21.0
setuptools : 52.0.0
Cython : 0.29.6
pytest : 4.3.1
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.1.5
lxml.etree : 4.3.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.4.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.0.3
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.1
pandas_gbq : None
pyarrow : 1.0.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.1
tables : 3.5.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.43.1

Could anyone please help me with this? Thanks!

jorisvandenbossche · 2021-01-28T09:18:35Z

Already answered at pandas-dev/pandas#39411

jorisvandenbossche closed this as completed Jan 28, 2021
