Closed
Description
Apache Iceberg version
0.9.0 (latest release)
Please describe the bug 🐞
(I'm on 0.9.1 but the dropdown is missing that one)
Hi, I was trying to add partition transforms to an Iceberg table, but I get `ModuleNotFoundError: No module named 'pyiceberg_core'` when I try to insert data after updating the transforms. The full traceback is below.
For reference, I install pyiceberg as `pyiceberg[snappy,s3fs]`. Looking through the pyproject.toml, pyiceberg_core is listed as an optional dependency, but I'm guessing it's now being relied on in the `.append` method.
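To confirm that the module is really absent from the environment before calling `.append`, a small pre-flight check along these lines may help. It is only a sketch: `missing_modules` is a hypothetical helper, not part of pyiceberg, and the one name it checks is the module named in the error message.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "pyiceberg_core" is the module named in the ModuleNotFoundError above.
missing = missing_modules(["pyiceberg_core"])
if missing:
    print(f"Missing optional modules: {missing}")
```

If the module is reported missing, installing the optional dependency alongside the other extras should resolve it. Assuming the extra is named after the package, as the pyproject.toml entry suggests, that would be something like `pip install "pyiceberg[snappy,s3fs,pyiceberg-core]"`.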
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[7], line 1
----> 1 house_prices_t.append(df.to_arrow().cast(house_prices_schema.as_arrow()))
File /usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py:1229, in Table.append(self, df, snapshot_properties)
1221 """
1222 Shorthand API for appending a PyArrow table to the table.
1223
(...) 1226 snapshot_properties: Custom properties to be added to the snapshot summary
1227 """
1228 with self.transaction() as tx:
-> 1229 tx.append(df=df, snapshot_properties=snapshot_properties)
File /usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py:473, in Transaction.append(self, df, snapshot_properties)
470 with self._append_snapshot_producer(snapshot_properties) as append_files:
471 # skip writing data files if the dataframe is empty
472 if df.shape[0] > 0:
--> 473 data_files = list(
474 _dataframe_to_data_files(
475 table_metadata=self.table_metadata, write_uuid=append_files.commit_uuid, df=df, io=self._table.io
476 )
477 )
478 for data_file in data_files:
479 append_files.append_data_file(data_file)
File /usr/local/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py:2601, in _dataframe_to_data_files(table_metadata, df, io, write_uuid, counter)
2590 yield from write_file(
2591 io=io,
2592 table_metadata=table_metadata,
(...) 2598 ),
2599 )
2600 else:
-> 2601 partitions = _determine_partitions(spec=table_metadata.spec(), schema=table_metadata.schema(), arrow_table=df)
2602 yield from write_file(
2603 io=io,
2604 table_metadata=table_metadata,
(...) 2617 ),
2618 )
File /usr/local/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py:2648, in _determine_partitions(spec, schema, arrow_table)
2645 for partition, name in zip(spec.fields, partition_fields):
2646 source_field = schema.find_field(partition.source_id)
2647 arrow_table = arrow_table.append_column(
-> 2648 name, partition.transform.pyarrow_transform(source_field.field_type)(arrow_table[source_field.name])
2649 )
2651 unique_partition_fields = arrow_table.select(partition_fields).group_by(partition_fields).aggregate([])
2653 table_partitions = []
File /usr/local/lib/python3.12/site-packages/pyiceberg/transforms.py:360, in BucketTransform.pyarrow_transform(self, source)
359 def pyarrow_transform(self, source: IcebergType) -> "Callable[[pa.Array], pa.Array]":
--> 360 from pyiceberg_core import transform as pyiceberg_core_transform
362 return self._pyiceberg_transform_wrapper(pyiceberg_core_transform.bucket, self._num_buckets)
ModuleNotFoundError: No module named 'pyiceberg_core'
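The last frame shows the import happening inside `BucketTransform.pyarrow_transform` rather than at module level, which would explain why the error only appears on a partitioned append and not when pyiceberg itself is imported. The sketch below illustrates that deferred-import pattern with a hypothetical stand-in class (`LazyBucket` is not pyiceberg code, and the module name passed in is a placeholder):

```python
import importlib


class LazyBucket:
    """Hypothetical stand-in for a transform that imports its backend lazily."""

    def __init__(self, backend_module):
        # Nothing is imported here, so construction succeeds even when the
        # backend is not installed.
        self.backend_module = backend_module

    def pyarrow_transform(self):
        # The import is deferred until the transform is actually applied,
        # so a missing backend only fails at this point.
        return importlib.import_module(self.backend_module)


bucket = LazyBucket("no_such_backend_module")  # succeeds
try:
    bucket.pyarrow_transform()  # fails only now
except ModuleNotFoundError as exc:
    print(f"Deferred import failed: {exc}")
```

Under this pattern, an unpartitioned table never reaches the import, which matches the traceback taking the `else` branch in `_dataframe_to_data_files` before failing.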
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time