
error when from datafusion import SessionContext #830

Closed
l1t1 opened this issue Aug 22, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@l1t1

l1t1 commented Aug 22, 2024

Describe the bug
Importing SessionContext fails with the following error:


>>> from datafusion import SessionContext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python38\lib\site-packages\datafusion\__init__.py", line 29, in <module>
    from .context import (
  File "D:\Python38\lib\site-packages\datafusion\context.py", line 22, in <module>
    from ._internal import SessionConfig as SessionConfigInternal
ImportError: DLL load failed while importing _internal: The specified procedure could not be found.

To Reproduce

pip install datafusion -U
python
from datafusion import SessionContext
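To capture the failure together with the environment details in one run, a small script along these lines could be used. It is only a sketch: first_failing_import is a hypothetical helper, not part of DataFusion. It imports pyarrow before datafusion so that a broken dependency is not mistaken for a DataFusion bug; importing the datafusion package is enough to reproduce the error, because its __init__.py imports the context module that loads _internal (as the traceback above shows).

```python
import importlib
import platform
import sys


def first_failing_import(module_names):
    """Import modules in order; return (name, error text) for the first failure, or None."""
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError and "DLL load failed"
            return name, f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    print("Python:", sys.version)
    print("OS:", platform.platform())
    # Check the dependency first, then the package that failed in this issue.
    failure = first_failing_import(["pyarrow", "datafusion"])
    print("all imports OK" if failure is None else "failed at %s -> %s" % failure)
```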

Expected behavior
It should work as older versions did, for example version 36.0:

>>> from datafusion import SessionContext
>>> ctx = SessionContext()
>>> ctx.register_parquet("taxi", "d:/yellow_tripdata_2022-01.parquet")
>>> x="select passenger_count, count(*) from taxi where passenger_count is not null group by passenger_count order by passenger_count"
>>> df = ctx.sql(x)
>>> df
DataFrame()
+-----------------+----------+
| passenger_count | COUNT(*) |
+-----------------+----------+
| 0.0             | 52061    |
| 1.0             | 1794055  |
| 2.0             | 343026   |
| 3.0             | 84570    |
| 4.0             | 35321    |
| 5.0             | 51338    |
| 6.0             | 32037    |
| 7.0             | 9        |
| 8.0             | 8        |
| 9.0             | 3        |
+-----------------+----------+

Additional context
My OS is Windows 7.
My Python version:

Python 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:18:16) [MSC v.1928 64 bit (AMD64)] on win32
@l1t1 l1t1 added the bug Something isn't working label Aug 22, 2024
@Michael-J-Ward
Contributor

Michael-J-Ward commented Aug 22, 2024

I don't have a Windows machine, but I did encounter a similar error on Linux when my environment couldn't load a shared C++ library that pyarrow needs (see the bottom of this comment).

  • Could you provide the output of checking pyarrow and of installing, as I show below?
  • Is this the same machine / environment that you previously used with version 36.0.0?
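The pyarrow check requested above can also be scripted. This sketch reads installed versions with importlib.metadata (standard library since Python 3.8) and compares pyarrow against the pyarrow>=11.0.0 requirement that appears in the pip output in this thread. The helper names are hypothetical, and the version parsing is a simplification that only looks at the leading numeric release; the packaging library would be the robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Extract the leading numeric release, e.g. '17.0.0' -> (17, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(p) for p in match.group().split(".")) if match else ()


def meets_minimum(installed, minimum):
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "11" and "11.0.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for dist in ("pyarrow", "datafusion"):
        print(dist, installed_version(dist))
    pa_version = installed_version("pyarrow")
    if pa_version is not None:
        print("pyarrow>=11.0.0 satisfied:", meets_minimum(pa_version, "11.0.0"))
```

pip's resolver already enforces this requirement at install time, so the check mainly helps confirm which versions the failing interpreter actually sees.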

Successful install and import

Checking pyarrow

❯ python
Python 3.12.4 (main, Jun  6 2024, 18:26:44) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> pyarrow.__version__
'17.0.0'

Installing

pip install datafusion -U
Collecting datafusion
  Using cached datafusion-40.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Requirement already satisfied: pyarrow>=11.0.0 in /nix/store/3zsajax8hkvl1yc9fygpjn702m2qwh7m-python3.12-pyarrow-17.0.0/lib/python3.12/site-packages (from datafusion) (17.0.0)
Collecting typing-extensions (from datafusion)
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: numpy>=1.16.6 in /nix/store/5qnnxrlcfiiv9b84cj1n02gnfq2hbsp4-python3.12-numpy-1.26.4/lib/python3.12/site-packages (from pyarrow>=11.0.0->datafusion) (1.26.4)
Using cached datafusion-40.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.5 MB)
Using cached typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Installing collected packages: typing-extensions, datafusion
Successfully installed datafusion-40.1.0 typing-extensions-4.12.2

Running

python
Python 3.12.4 (main, Jun  6 2024, 18:26:44) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datafusion import SessionContext
>>> ctx = SessionContext()

Failed because my dev environment wasn't set up properly

python
Python 3.12.4 (main, Jun  6 2024, 18:26:44) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datafusion import SessionContext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/datafusion/__init__.py", line 29, in <module>
    from .context import (
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/datafusion/context.py", line 30, in <module>
    from datafusion.dataframe import DataFrame
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/datafusion/dataframe.py", line 35, in <module>
    from datafusion.expr import Expr
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/datafusion/expr.py", line 28, in <module>
    import pyarrow as pa
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
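In the Linux failure above, the real problem was that pyarrow's native code could not find libstdc++. A quick way to see whether a shared library is discoverable at all is ctypes.util.find_library from the standard library. This is a rough sketch: find_library only approximates the dynamic loader's search, so a non-None result does not guarantee that the import will succeed. find_shared_library is a hypothetical wrapper.

```python
import ctypes.util
import sys


def find_shared_library(name):
    """Return what ctypes can locate for a library name such as 'stdc++', or None."""
    return ctypes.util.find_library(name)


if __name__ == "__main__":
    # pyarrow needs libstdc++.so.6 on Linux; None here suggests the loader may not see it either.
    if sys.platform.startswith("linux"):
        print("stdc++ ->", find_shared_library("stdc++"))
```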

@l1t1
Author

l1t1 commented Aug 22, 2024

Checking pyarrow

>>> import pyarrow
>>> pyarrow.__version__
'15.0.0'

Installing

pip install datafusion -U
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: datafusion in d:\python38\lib\site-packages (36.0.0)
Collecting datafusion
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9f/12/3a6bf3baa1759315f14cbfb8006efee2ae8971c378e7363732342ff58417/datafusion-40.1.0-cp38-abi3-win_amd64.whl (17.9 MB)
     ---------------------------------------- 17.9/17.9 MB 2.3 MB/s eta 0:00:00
Requirement already satisfied: pyarrow>=11.0.0 in d:\python38\lib\site-packages (from datafusion) (15.0.0)
Requirement already satisfied: typing-extensions in d:\python38\lib\site-packages (from datafusion) (4.10.0)
Requirement already satisfied: numpy<2,>=1.16.6 in d:\python38\lib\site-packages (from pyarrow>=11.0.0->datafusion) (1.21.0)
Installing collected packages: datafusion
  Attempting uninstall: datafusion
    Found existing installation: datafusion 36.0.0
    Uninstalling datafusion-36.0.0:
      Successfully uninstalled datafusion-36.0.0
Successfully installed datafusion-40.1.0

The machine/environment is the same one I used with version 36.

@Michael-J-Ward
Contributor

I know this isn't your exact setup, but I was able to spin up a VM with:

  • Windows 10
  • Python 3.11.9

I was able to install successfully with pip install datafusion -U and run:

>>> import pyarrow
>>> pyarrow.__version__
'17.0.0'
>>> from datafusion import SessionContext
>>> ctx = SessionContext()

So... I'm at an impasse.

Have you tried a completely fresh virtual environment?
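A fresh environment can be created and tested from Python with the standard venv module. This is a sketch with hypothetical helper names: it creates an isolated environment, installs datafusion into it with pip (network access required), and runs the same import that failed in this issue in a separate interpreter, returning its stderr for inspection.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def make_fresh_env(env_dir, with_pip=True):
    """Create a clean virtual environment at env_dir and return its interpreter path."""
    env_dir = Path(env_dir)
    # clear=True wipes previous contents; symlinks mirror `python -m venv` defaults on POSIX.
    venv.create(env_dir, clear=True, with_pip=with_pip, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_and_try_import(python):
    """pip-install datafusion into the env, then attempt the import that failed here."""
    subprocess.run([str(python), "-m", "pip", "install", "-U", "datafusion"], check=True)
    probe = "from datafusion import SessionContext; SessionContext()"
    result = subprocess.run([str(python), "-c", probe], capture_output=True, text=True)
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = make_fresh_env(Path(tmp) / "df-env")
        ok, stderr = install_and_try_import(python)
        print("import OK" if ok else stderr)
```

Running the probe in a child interpreter keeps the current process clean, so a failed DLL load does not leave a half-imported package behind.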

@l1t1
Author

l1t1 commented Aug 23, 2024

Thank you.
This may be related to rust-lang/rust#121317: perhaps newer Rust versions no longer support Windows 7 / Python 3.8.
The newer polars version also fails to import on this machine.
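The Windows 7 suspicion can be checked independently of DataFusion. Rust 1.78 raised the minimum supported Windows version to Windows 10 for its Windows targets; whether the datafusion 40.1.0 and newer polars wheels were built with such a toolchain is an assumption not confirmed in this thread. The sketch below uses the standard platform module to report whether the machine is older than Windows 10 (Windows 7 reports NT version 6.1). The helper names are hypothetical.

```python
import platform

# Assumption: Windows 10 (NT 10.0) is the minimum for binaries built with Rust >= 1.78.
MIN_NT_VERSION = (10, 0)


def nt_version(version_string):
    """Turn a platform.version() string such as '6.1.7601' into (major, minor)."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def below_windows_10(system=None, version_string=None):
    """True when running on a Windows release older than Windows 10 (e.g. Windows 7 = NT 6.1)."""
    system = platform.system() if system is None else system
    if system != "Windows":
        return False
    version_string = platform.version() if version_string is None else version_string
    return nt_version(version_string) < MIN_NT_VERSION


if __name__ == "__main__":
    print("Older than Windows 10:", below_windows_10())
```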

@l1t1 l1t1 closed this as completed Aug 23, 2024