
zhongyujiang
Contributor

Rationale for this change

Some fsspec write implementations require the input data to be of type bytes, for example JuiceFS: https://github.com/juicedata/juicefs/blob/main/sdk/python/juicefs/juicefs/juicefs.py#L514.
However, BinaryEncoder currently passes a bytearray in some places, which raises an exception when writing the manifest:

raise TypeError(f"a bytes-like object is required, not '{type(data).__name__}'")
TypeError: a bytes-like object is required, not 'bytearray'

I ran into this issue while adding JuiceFS support. This PR resolves it by making BinaryEncoder always write data of type bytes.
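The failure described above can be reproduced without JuiceFS. The following is a minimal sketch, not the code from this PR: `StrictBytesWriter` is a hypothetical stand-in for a stream that accepts only `bytes`, and `zigzag_varint` is an illustrative Avro-style integer encoder that builds its result in a `bytearray`. Wrapping the result in `bytes(...)` before writing is one possible way to satisfy such a stream.

```python
import io


class StrictBytesWriter:
    """Hypothetical stream that accepts only `bytes`, rejecting `bytearray`."""

    def __init__(self) -> None:
        self._buffer = io.BytesIO()

    def write(self, data: bytes) -> int:
        # Strict type check: a bytearray is rejected even though it is bytes-like.
        if not isinstance(data, bytes):
            raise TypeError(
                f"a bytes-like object is required, not '{type(data).__name__}'"
            )
        return self._buffer.write(data)

    def getvalue(self) -> bytes:
        return self._buffer.getvalue()


def zigzag_varint(value: int) -> bytearray:
    """Illustrative zigzag varint encoder for a signed 64-bit integer."""
    datum = (value << 1) ^ (value >> 63)  # zigzag: map signed to unsigned
    out = bytearray()
    while datum & ~0x7F:  # more than 7 bits remain
        out.append((datum & 0x7F) | 0x80)  # low 7 bits plus continuation flag
        datum >>= 7
    out.append(datum)
    return out


writer = StrictBytesWriter()
encoded = zigzag_varint(64)

try:
    writer.write(encoded)  # bytearray: rejected by the strict writer
    rejected = False
except TypeError:
    rejected = True

written = writer.write(bytes(encoded))  # converted to bytes: accepted
```

Converting once at the write boundary keeps the mutable `bytearray` convenient for building the value while guaranteeing that the stream only ever sees immutable `bytes`.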

This also matches the existing annotation on pyiceberg.io.OutputStream.write, which declares that it accepts data of type bytes.

Are these changes tested?

The filesystems currently supported in pyiceberg do not appear to strictly require the write input to be of type bytes, so I did not add a test. I tested the change on our internal JuiceFS filesystem, and it works.
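If a regression test were wanted without depending on a strict filesystem, one option is a test double that records the exact type of every chunk it receives. This is only a sketch under that assumption: `TypeRecordingStream` and `writes_only_bytes` are hypothetical names, and the `emit` callable stands in for whatever code drives the encoder.

```python
from typing import Callable, List


class TypeRecordingStream:
    """Hypothetical test double: records the exact type of each written chunk."""

    def __init__(self) -> None:
        self.seen_types: List[type] = []

    def write(self, data: bytes) -> int:
        self.seen_types.append(type(data))
        return len(data)


def writes_only_bytes(emit: Callable[[TypeRecordingStream], None]) -> bool:
    """Return True if `emit` wrote at least once and every chunk was exactly `bytes`."""
    stream = TypeRecordingStream()
    emit(stream)
    return bool(stream.seen_types) and all(t is bytes for t in stream.seen_types)


ok_with_bytes = writes_only_bytes(lambda s: s.write(b"ab"))
ok_with_bytearray = writes_only_bytes(lambda s: s.write(bytearray(b"ab")))
```

The check uses `is bytes` rather than `isinstance` so that subclasses and other bytes-like types are flagged as well.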

Are there any user-facing changes?

No

Contributor

@Fokko Fokko left a comment


Thanks @zhongyujiang for cleaning this up 👍

@Fokko Fokko merged commit f5e3e59 into apache:main Sep 17, 2025
10 checks passed