Dependency on pyiceberg_core while still marked as Optional #1987

@andersbogsnes

Apache Iceberg version

0.9.0 (latest release)

Please describe the bug 🐞

(I'm on 0.9.1 but the dropdown is missing that one)

Hi, I'm trying to add partition transforms to an Iceberg table, but I get ModuleNotFoundError: No module named 'pyiceberg_core' when I insert data after updating the transforms. The full traceback is below.

For reference, I install pyiceberg as pyiceberg[snappy,s3fs]. In pyproject.toml, pyiceberg_core is listed as an optional dependency, but the traceback suggests the .append method now relies on it, at least for tables with a bucket transform (the failing import is in BucketTransform.pyarrow_transform).
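
To confirm that the module really is absent from the environment (rather than shadowed or half-installed), a quick check along these lines may help. This is only a diagnostic sketch: the helper name is_installed is made up for illustration, and the only pyiceberg-specific piece is the module name pyiceberg_core taken from the error message.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    # find_spec returns None when no finder can locate the module.
    return importlib.util.find_spec(module_name) is not None


# Module name taken from the ModuleNotFoundError in this report.
print("pyiceberg_core installed:", is_installed("pyiceberg_core"))
```

If this prints False, the optional dependency was not pulled in by the extras used at install time.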

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[7], line 1
----> 1 house_prices_t.append(df.to_arrow().cast(house_prices_schema.as_arrow()))

File /usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py:1229, in Table.append(self, df, snapshot_properties)
   1221 """
   1222 Shorthand API for appending a PyArrow table to the table.
   1223 
   (...)   1226     snapshot_properties: Custom properties to be added to the snapshot summary
   1227 """
   1228 with self.transaction() as tx:
-> 1229     tx.append(df=df, snapshot_properties=snapshot_properties)

File /usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py:473, in Transaction.append(self, df, snapshot_properties)
    470 with self._append_snapshot_producer(snapshot_properties) as append_files:
    471     # skip writing data files if the dataframe is empty
    472     if df.shape[0] > 0:
--> 473         data_files = list(
    474             _dataframe_to_data_files(
    475                 table_metadata=self.table_metadata, write_uuid=append_files.commit_uuid, df=df, io=self._table.io
    476             )
    477         )
    478         for data_file in data_files:
    479             append_files.append_data_file(data_file)

File /usr/local/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py:2601, in _dataframe_to_data_files(table_metadata, df, io, write_uuid, counter)
   2590     yield from write_file(
   2591         io=io,
   2592         table_metadata=table_metadata,
   (...)   2598         ),
   2599     )
   2600 else:
-> 2601     partitions = _determine_partitions(spec=table_metadata.spec(), schema=table_metadata.schema(), arrow_table=df)
   2602     yield from write_file(
   2603         io=io,
   2604         table_metadata=table_metadata,
   (...)   2617         ),
   2618     )

File /usr/local/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py:2648, in _determine_partitions(spec, schema, arrow_table)
   2645 for partition, name in zip(spec.fields, partition_fields):
   2646     source_field = schema.find_field(partition.source_id)
   2647     arrow_table = arrow_table.append_column(
-> 2648         name, partition.transform.pyarrow_transform(source_field.field_type)(arrow_table[source_field.name])
   2649     )
   2651 unique_partition_fields = arrow_table.select(partition_fields).group_by(partition_fields).aggregate([])
   2653 table_partitions = []

File /usr/local/lib/python3.12/site-packages/pyiceberg/transforms.py:360, in BucketTransform.pyarrow_transform(self, source)
    359 def pyarrow_transform(self, source: IcebergType) -> "Callable[[pa.Array], pa.Array]":
--> 360     from pyiceberg_core import transform as pyiceberg_core_transform
    362     return self._pyiceberg_transform_wrapper(pyiceberg_core_transform.bucket, self._num_buckets)

ModuleNotFoundError: No module named 'pyiceberg_core'
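
If the dependency is meant to stay optional, one possible improvement would be for the lazy import to fail with an actionable message instead of a bare ModuleNotFoundError. The following is a rough sketch of that pattern, not pyiceberg's actual code: require_optional and its extra parameter are hypothetical names, and the extra passed in by a caller would have to match whatever pyproject.toml really defines.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import a top-level optional module, or raise an error naming the extra to install.

    Hypothetical helper for illustration; `extra` is whatever extras name
    the project defines for this dependency.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Only translate the error when the requested module itself is missing,
        # not when one of its own imports fails.
        if exc.name != module_name:
            raise
        raise ModuleNotFoundError(
            f"{module_name} is required for this operation; "
            f"install it with the '{extra}' extra",
            name=module_name,
        ) from exc
```

A transform method could then call something like require_optional("pyiceberg_core", "<extra name>") at the point where the import currently happens, so the user sees which extra is missing.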

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
