Add support for Bodo DataFrame #2167

Open
wants to merge 6 commits into
base: main

@ehsantn commented Jul 3, 2025

Rationale for this change

Adds support for the Bodo DataFrame library, a drop-in replacement for Pandas that accelerates and scales Python code automatically by applying query, compiler, and HPC optimizations.

Are these changes tested?

Yes, an integration test was added.

Are there any user-facing changes?

Adds a Table.to_bodo() function. Example code:

df = table.to_bodo()  # equivalent to `bodo.pandas.read_iceberg_table(table)`
df = df[df["trip_distance"] >= 10.0]
df = df[["VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime"]]
print(df)
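
One way an accessor like this could be wired up while keeping Bodo an optional dependency is a lazy import inside the method. The sketch below is illustrative only: `TableSketch`, the injectable `import_module` parameter, and the two stand-in importers are hypothetical and are not PyIceberg's actual code. The only detail taken from the description above is that `to_bodo()` is equivalent to `bodo.pandas.read_iceberg_table(table)`.

```python
import importlib
from types import SimpleNamespace
from typing import Any, Callable


class TableSketch:
    """Hypothetical stand-in for an Iceberg table object (not PyIceberg's Table)."""

    def __init__(
        self,
        name: str,
        import_module: Callable[[str], Any] = importlib.import_module,
    ) -> None:
        self.name = name
        # Injectable importer (an assumption of this sketch) so the lazy
        # import can be exercised without Bodo being installed.
        self._import_module = import_module

    def to_bodo(self) -> Any:
        """Hand the table to Bodo, importing Bodo only when it is needed."""
        try:
            bodo_pandas = self._import_module("bodo.pandas")
        except ModuleNotFoundError as exc:
            raise ModuleNotFoundError(
                "to_bodo() needs the optional Bodo dependency to be installed"
            ) from exc
        return bodo_pandas.read_iceberg_table(self)


def fake_importer(module_name: str) -> Any:
    """Stand-in importer returning a stub that exposes read_iceberg_table."""
    return SimpleNamespace(
        read_iceberg_table=lambda table: f"DataFrame<{table.name}>"
    )


def missing_importer(module_name: str) -> Any:
    """Stand-in importer that behaves as if Bodo is not installed."""
    raise ModuleNotFoundError(module_name)
```

With the default importer, users who never call `to_bodo()` would not need Bodo installed at all, which is the usual reason for deferring the import to call time.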

@ehsantn marked this pull request as ready for review July 4, 2025 02:22

@kevinjqliu (Contributor) left a comment

LGTM!

@kevinjqliu (Contributor) commented

@ehsantn It looks like there's an issue with dependency resolution:

poetry install --all-extras
Installing dependencies from lock file

The current project's supported Python range (3.9.23) is not compatible with some of the required packages Python requirement:
  - numpy requires Python >=3.10, so it will not be installable for Python 3.9.23

Because no versions of pandas match >=1.0.0,<2.3.0 || >2.3.0,<3.0.0
 and pandas (2.3.0) depends on numpy (>=1.22.4), pandas (>=1.0.0,<3.0.0) requires numpy (>=1.22.4).
Because numpy (2.2.6) requires Python >=3.10
 and no versions of numpy match >=1.22.4,<2.2.6 || >2.2.6, numpy is forbidden.
Thus, pandas is forbidden.
So, because pyiceberg depends on pandas (>=1.0.0,<3.0.0), version solving failed.

  * Check your dependencies Python requirement: The Python requirement can be specified via the `python` or `markers` properties

    For numpy, a possible solution would be to set the `python` property to "<empty>"

    https://python-poetry.org/docs/dependency-specification/#python-restricted-dependencies,
    https://python-poetry.org/docs/dependency-specification/#using-environment-markers

make: *** [Makefile:63: install-dependencies] Error 1
Error: Process completed with exit code 2.
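
The solver failure above comes down to a range check: the Python version Poetry resolved against (3.9.23) falls outside numpy 2.2.6's `>=3.10` requirement. A minimal, stdlib-only sketch of that comparison follows. `parse_version` and `satisfies_minimum` are hypothetical helpers that handle only simple `>=X.Y` constraints, which is far less than Poetry's real resolver does.

```python
from typing import Tuple


def parse_version(text: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '3.10' into (3, 10, 0) so versions compare numerically."""
    parts = [int(part) for part in text.strip().split(".")]
    # Pad with zeros so '3.10' and '3.10.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies_minimum(python_version: str, constraint: str) -> bool:
    """Check a version against a simple '>=X.Y' lower bound."""
    if not constraint.startswith(">="):
        raise ValueError(f"only '>=' constraints are handled here: {constraint!r}")
    minimum = parse_version(constraint[2:])
    return parse_version(python_version) >= minimum


# Values taken from the log above.
print(satisfies_minimum("3.9.23", ">=3.10"))  # False
```

Comparing integer tuples rather than strings matters here: as plain strings, "3.9" sorts after "3.10", which would hide exactly this conflict.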

@ehsantn (Author) commented Jul 5, 2025

@kevinjqliu Thanks for the quick review. Bodo requires Python >=3.10 because some of its dependencies dropped Python 3.9 support quite a while ago. Do all optional dependencies of PyIceberg need to support Python 3.9? What do you recommend?
I can try to package Bodo for 3.9 with some workarounds if there is no other solution.

https://numpy.org/neps/nep-0029-deprecation_policy.html#support-table (Python 3.9 was dropped as of Apr 05, 2024)
https://pypi.org/project/numba (3.10+)

optional = true
python-versions = ">=3.9"
Contributor

It looks like numpy for Python 3.9 is removed here.

Author

Numpy is here: https://github.com/ehsantn/iceberg-python/blob/f36265b8cdc9fa3056ad28784467579514cfc850/poetry.lock#L3424
I'm working on packaging Bodo for Python 3.9 to avoid these Poetry issues: bodo-ai/Bodo#637
Our team will just miss the structural pattern matching and better type hints of Python 3.10 :)
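
For context on what staying compatible with Python 3.9 costs, the sketch below writes a small dispatch function in 3.9-compatible style, with the Python 3.10 equivalents noted in comments. The `describe` function and its inputs are made up for illustration and have nothing to do with Bodo's code.

```python
from typing import Optional, Union

# Python 3.10+ could spell this runtime alias as `int | str | None`;
# on 3.9 the typing.Union / typing.Optional forms are needed.
Value = Optional[Union[int, str]]


def describe(value: Value) -> str:
    """Describe a value using isinstance checks that run on Python 3.9."""
    # Python 3.10+ could use structural pattern matching instead:
    #     match value:
    #         case None: ...
    #         case int(): ...
    #         case str(): ...
    if value is None:
        return "missing"
    if isinstance(value, int):
        return f"integer {value}"
    if isinstance(value, str):
        return f"string {value!r}"
    raise TypeError(f"unsupported value: {value!r}")
```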

Contributor

Sounds good. Python 3.9 also reaches EOL in a few months (2025-10):
https://devguide.python.org/versions/#supported-versions

@ehsantn changed the title from "Added support for Bodo DataFrame" to "Add support for Bodo DataFrame" Jul 7, 2025
@ehsantn (Author) commented Jul 7, 2025

OK, I updated Bodo to support Python 3.9, so this should work now. I tried poetry install --all-extras in an Ubuntu environment and it works.

@kevinjqliu (Contributor) commented

@ehsantn I merged a few library upgrades. Could you rebase this PR?

@ehsantn (Author) commented Jul 8, 2025

> @ehsantn I merged a few library upgrades. Could you rebase this PR?

Done. I assume the CI failure is not related to this PR? The failing test doesn't seem relevant.

@kevinjqliu (Contributor) commented

Maybe try rebasing on main again; I don't know what CI is doing.
