[bug] Schema validation should reject field names that are invalid Avro identifiers #2123

Open
@nvartolomei

Description

Apache Iceberg version

None

Please describe the bug 🐞

Example schema:

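from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import NestedField, StringType

# The field name is valid for Iceberg but is not a valid Avro identifier.
schema = Schema(
    NestedField(id=1, name="😎", field_type=StringType(), required=False),
)

# Identity partition on that field, reusing the same name.
partition_spec = PartitionSpec(
    PartitionField(
        source_id=1,
        field_id=1001,
        transform=IdentityTransform(),
        name="😎",
    )
)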

Write some data to a table created with this schema and partition spec, then try to read it with DuckDB, or simply inspect the manifest file with avrocat:

avrocat /home/nv/src/pyiceberg-example/warehouse/default.db/nested_table/metadata/afc5e55c-6dd2-4875-841c-410108fccf8e-m0.avro | jq .
Error opening /home/nv/src/pyiceberg-example/warehouse/default.db/nested_table/metadata/afc5e55c-6dd2-4875-841c-410108fccf8e-m0.avro:
  Cannot parse file header: Invalid Avro identifier
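
For illustration, the kind of check this issue asks for could look roughly like the sketch below. It relies only on the Avro specification's rule that a name starts with `[A-Za-z_]` and continues with `[A-Za-z0-9_]`. The helper names `is_valid_avro_name` and `invalid_avro_names` are hypothetical and are not part of PyIceberg; whether the right fix is to reject such names or to sanitize them before writing Avro is left open here.

```python
import re

# Avro spec: a name starts with [A-Za-z_] and then contains only [A-Za-z0-9_].
_AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if `name` is a valid Avro identifier (hypothetical helper)."""
    return _AVRO_NAME.fullmatch(name) is not None


def invalid_avro_names(names):
    """Return the names that would not be valid Avro identifiers, in input order."""
    return [n for n in names if not is_valid_avro_name(n)]


if __name__ == "__main__":
    # The field name from the example schema fails the check.
    print(invalid_avro_names(["id", "😎", "event_ts"]))
```

A validator along these lines would flag the field at schema or partition spec creation time instead of producing a manifest that Avro readers cannot open.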

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
