feat: add schema conversion from avro timestamp-millis #2173

Open
wants to merge 1 commit into
base: main

matthias-Q

Rationale for this change

The schema conversion utility from Avro schema to Iceberg schema ignored the timestamp-millis logical type.
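
For illustration, here is a minimal sketch of the kind of logical-type lookup involved. It is not pyiceberg's actual code: the table `LOGICAL_TYPES` and the function `iceberg_type_for` are hypothetical names, and mapping both timestamp logical types to the Iceberg `timestamp` type is an assumption made for this example.

```python
# Hypothetical lookup table, not pyiceberg's implementation.
# Keys are (Avro primitive type, Avro logicalType); values are Iceberg type names.
LOGICAL_TYPES = {
    ("int", "date"): "date",
    ("long", "timestamp-micros"): "timestamp",
    ("long", "timestamp-millis"): "timestamp",  # the case this PR is about (assumed mapping)
}


def iceberg_type_for(avro_type: dict) -> str:
    """Return an Iceberg type name for an Avro primitive with an optional logicalType."""
    primitive = avro_type["type"]
    logical = avro_type.get("logicalType")
    # Unknown or absent logical types fall back to the underlying primitive,
    # which is how the Avro specification asks readers to treat them.
    return LOGICAL_TYPES.get((primitive, logical), primitive)
```

Without the `timestamp-millis` entry, the lookup in this sketch would fall through to plain `long`, which is one way a logical type can end up silently ignored.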

Are these changes tested?

Added tests for timestamp-millis and timestamp-micros, since a test for the latter was missing.

Are there any user-facing changes?

no

@matthias-Q
Author

Some notes:

  1. UUID handling is also not fully in line with the Avro specification, but I saw that other pending PRs address UUID-related issues (Fix UUID support #2007).

  2. Avro schemas have no notion of field-id and element-id. I could add a helper function that adds them. I know this is not the core responsibility of this library, but I was using it to create Iceberg tables from Kafka topics whose schemas are stored in the schema registry. I think this is a viable use case, so such helpers would add value.
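
A rough sketch of what such a helper could look like, using only the standard library. The name `assign_ids` and the depth-first numbering starting at 1 are assumptions for illustration; this is not an existing pyiceberg function, and it assumes the incoming schema carries no IDs yet (it does not guard against collisions with IDs that are already present).

```python
import copy
import itertools


def assign_ids(schema, counter=None):
    """Return a copy of an Avro schema (parsed JSON) with Iceberg ID attributes added.

    Hypothetical helper: adds "field-id" to record fields, "element-id" to arrays,
    and "key-id"/"value-id" to maps, wherever those attributes are missing.
    """
    if counter is None:
        # Top-level call: work on a copy so the caller's schema is untouched.
        schema = copy.deepcopy(schema)
        counter = itertools.count(1)
    if isinstance(schema, list):  # union, e.g. ["null", {...}]
        return [assign_ids(branch, counter) for branch in schema]
    if not isinstance(schema, dict):  # primitive name such as "long"
        return schema
    kind = schema.get("type")
    if kind == "record":
        for field in schema["fields"]:
            if "field-id" not in field:
                field["field-id"] = next(counter)
            field["type"] = assign_ids(field["type"], counter)
    elif kind == "array":
        if "element-id" not in schema:
            schema["element-id"] = next(counter)
        schema["items"] = assign_ids(schema["items"], counter)
    elif kind == "map":
        if "key-id" not in schema:
            schema["key-id"] = next(counter)
        if "value-id" not in schema:
            schema["value-id"] = next(counter)
        schema["values"] = assign_ids(schema["values"], counter)
    return schema
```

A real helper would probably also need a policy for keeping IDs stable when the registry schema evolves, which this sketch does not attempt.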

@kevinjqliu
Contributor

> Avro schemas have no notion of field-id and element-id. I could add a helper function that adds them. I know this is not the core responsibility of this library, but I was using it to create Iceberg tables from Kafka topics whose schemas are stored in the schema registry. I think this is a viable use case, so such helpers would add value.

@matthias-Q I'm curious about the specific use case. I think field-id and element-id are already part of the Avro schema.

According to the Iceberg spec (https://iceberg.apache.org/spec/#avro), under Field IDs:

> Iceberg struct, list, and map types identify nested types by ID. When writing data to Avro files, these IDs must be stored in the Avro schema to support ID-based column pruning.

Also see this example:

"type": ["null", {"element-id": 133, "type": "array", "items": "long"}],

@matthias-Q
Author

@kevinjqliu Yes, they are part of the Iceberg spec, but not of the Avro specification (see https://avro.apache.org/docs/1.12.0/specification/). They are optional in an Avro schema, but the conversion function requires them.

The specific use case is that I get an Avro schema from a Kafka schema registry and want to use it to create or evolve Iceberg tables.
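
Since the IDs are optional in Avro but required by the conversion, a schema taken from a registry can be checked up front. The following is a minimal sketch under that assumption; `fields_missing_ids` is a hypothetical name, not part of pyiceberg, and it only looks for `field-id` on record fields.

```python
def fields_missing_ids(schema, path=""):
    """Yield dotted paths of record fields that lack a "field-id" attribute.

    Hypothetical checker operating on an Avro schema that has already been
    parsed from JSON into Python dicts, lists, and strings.
    """
    if isinstance(schema, list):  # union: inspect every branch
        for branch in schema:
            yield from fields_missing_ids(branch, path)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema["fields"]:
                field_path = f"{path}.{field['name']}" if path else field["name"]
                if "field-id" not in field:
                    yield field_path
                yield from fields_missing_ids(field["type"], field_path)
        elif kind == "array":
            yield from fields_missing_ids(schema["items"], path)
        elif kind == "map":
            yield from fields_missing_ids(schema["values"], path)
```

An empty result means every record field carries a `field-id`; anything else lists the fields that would need IDs before the schema can be converted.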
