archive/zip: inconsistent non-ascii filename decoding #67878
Comments
Can you provide an example and code for a reproducer?
@ZeinabAshjaei Here is a Go program that creates Unicode files in the root and subdirectories and it seems to work fine: https://go.dev/play/p/T6tNxT1HH8M?v=gotip. What program are you using to create the zip file? My guess is that program is writing bad zip file entries, or at least entries that are incompatible with Go's zip package. If you can attach a small example of a zip file that Go does not handle correctly, that would be helpful. Thanks.
@rsc Thanks for the investigation. I agree: it seems that only the zip file I tested produces incorrect file names. The attached zip file includes 3 PNG files, generated by AI.
Thanks. In the zip file you provided I see the same results using Go's archive/zip package and using
This is working as intended. The archive/zip reader never attempts to translate the names found in the zip file to valid UTF-8. It simply presents the bytes in the zip file, which in the test file are "DALL\x{fa}E" as Ian said.
The zip reader does set f.NonUTF8 for these names as a signal to client code that they might need to be careful.
Go version
go version go1.21.1 linux/amd64
Output of go env in your module/workspace:
What did you do?
When iterating over the files in a zip archive with the Go standard library's archive/zip package, filename decoding is inconsistent. When a file is located at the root level of the archive, its filename comes back with invalid encoding and is displayed with question marks in place of the original characters. When the same file is inside a folder in the archive, its filename is decoded correctly.
What did you see happen?
Steps to Reproduce:
Create a zip archive containing files with filenames that include non-ASCII characters, such as "·".
Iterate over the files in the zip archive using the zip package in Go.
Observe the filenames retrieved when files are located at the root level versus within a folder structure.
Actual Behavior:
Filenames retrieved from files at the root level of the zip archive exhibit incorrect encoding, displaying invalid characters such as question marks. Filenames within folders in the zip archive are correctly encoded.
What did you expect to see?
Filenames retrieved during iteration should maintain consistent encoding regardless of their location within the zip archive. The original characters in the filenames, including non-ASCII characters, should be preserved.