@EvilBeaver
Contributor

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

Issues #274 and #278 are fixed, but there is another issue with the Unicode test samples.

It seems that the Greek and Arabic samples are incorrect. I've added a workaround in EvilBeaver@887e10d.

The tests against the old code probably weren't failing because ZipStrings.SystemDefaultCodePage was set to 65001, which seems to be what Encoding.GetEncoding(0) returns by default on .NET Core.

@piksel
Member

piksel commented Oct 25, 2018

This solution is more or less what I had intended too. The only problem with this approach is that you cannot force UTF-8 encoding by setting the Charset.

The SystemDefaultCodePage (GetEncoding(0)) varies between frameworks and environments. I will take a look at the tests this weekend and see what is going on.
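To see which default a given runtime actually reports, a small probe like the one below can help. It is only a sketch: `DefaultCodePageProbe` and its members are hypothetical names for illustration, not part of SharpZipLib, and the result depends on the framework and environment, so no expected value is shown.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of SharpZipLib.
static class DefaultCodePageProbe
{
    // Code page of whatever Encoding.GetEncoding(0) resolves to on this runtime.
    public static int Current() => Encoding.GetEncoding(0).CodePage;

    // Human-readable form, e.g. for logging from a test fixture.
    public static string Describe()
    {
        Encoding enc = Encoding.GetEncoding(0);
        return $"{enc.WebName} (code page {enc.CodePage})";
    }
}
```

Logging `DefaultCodePageProbe.Describe()` at the start of a test run would show whether the Unicode samples are being decoded as UTF-8 only because of the environment.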

@EvilBeaver
Contributor Author

> The only problem with this approach is that you cannot force UTF-8 encoding by setting the Charset.

No, forcing UTF-8 is still possible. If the UTF-8 flag is set in the archive, then UTF-8 will be used regardless of the codepage property. Could you please clarify the problem you mentioned? What kind of UTF-8 forcing do you want?

@piksel
Member

piksel commented Oct 27, 2018

With this solution you would not be able to unzip an archive that uses UTF-8 as its encoding while the Unicode bit is not set. It would work if your system's default encoding is UTF-8, but not otherwise.
If we're adding support for setting the decompression encoding, it would probably be better if we supported this scenario too.

Perhaps Codepage could default to -1, which would indicate "Auto" (SystemDefaultCodePage or UTF8 depending on the Unicode bit), and if set to anything else, that codepage would be used instead?
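A minimal sketch of that "Auto" idea might look like the following. It follows the proposal literally, so an explicit code page wins even when the Unicode bit is set; that precedence is an assumption taken from the wording, not the merged behaviour. `AutoCodePageSketch` and all of its members are hypothetical names, and `SystemDefaultCodePage` here is only a stand-in for ZipStrings.SystemDefaultCodePage.

```csharp
using System;
using System.Text;

// Hypothetical illustration of the proposal, not SharpZipLib's implementation.
static class AutoCodePageSketch
{
    // Sentinel proposed in the thread: -1 means "Auto".
    public const int Auto = -1;

    // Stand-in for ZipStrings.SystemDefaultCodePage (assumption).
    public static int SystemDefaultCodePage => Encoding.GetEncoding(0).CodePage;

    // Picks the code page used to decode an entry name.
    public static int Resolve(int codePage, bool unicodeBitSet)
    {
        if (codePage != Auto)
            return codePage; // explicit override wins (literal reading of the proposal)

        // Auto: UTF-8 when the Unicode bit is set, system default otherwise.
        return unicodeBitSet ? Encoding.UTF8.CodePage : SystemDefaultCodePage;
    }

    // Decodes raw entry-name bytes with the resolved code page.
    public static string DecodeName(byte[] raw, int codePage, bool unicodeBitSet) =>
        Encoding.GetEncoding(Resolve(codePage, unicodeBitSet)).GetString(raw);
}
```

With this shape, an archive that stores UTF-8 names without the Unicode bit can still be read by passing 65001 explicitly, while leaving the value at -1 keeps the automatic behaviour.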

@EvilBeaver
Contributor Author

> With this solution you would not be able to unzip an archive that uses UTF-8 as its encoding while the Unicode bit is not set

Why? If the client sets CodePage to 65001, then UTF-8 will be used and the archive will be read correctly. Am I wrong?

@piksel
Member

piksel commented Oct 29, 2018

If CodePage is set to 65001 then

CodePage == Encoding.UTF8.CodePage ? SystemDefaultCodePage : CodePage

will result in SystemDefaultCodePage being used, no? And that is not UTF-8 in some environments.
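The effect of that expression can be checked in isolation. The sketch below mirrors it with plain integers; `TernaryCheck.Effective` is a hypothetical helper, and the system default is passed in as a parameter so the result does not depend on the environment.

```csharp
using System;
using System.Text;

// Hypothetical helper that mirrors the quoted expression for illustration only.
static class TernaryCheck
{
    // Returns the code page the quoted ternary would end up using.
    public static int Effective(int codePage, int systemDefaultCodePage) =>
        codePage == Encoding.UTF8.CodePage ? systemDefaultCodePage : codePage;
}
```

For example, `TernaryCheck.Effective(65001, 1252)` yields 1252, so an explicit request for UTF-8 is silently replaced by the system default, whereas `TernaryCheck.Effective(866, 1252)` yields 866 as expected.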

@EvilBeaver
Contributor Author

Yes, this is wrong. I'll change it to use -1, as you proposed earlier.

@EvilBeaver
Contributor Author

@piksel please review this.

@EvilBeaver
Contributor Author

@piksel thanks a lot! When is the new NuGet version planned? I want to use this fix in my project as soon as possible.

@piksel
Member

piksel commented Nov 14, 2018

I'll try to get it done this weekend.

@piksel changed the title from "Fix 274 and 278" to "Allow overriding encoding for zip extraction" on Nov 23, 2018
@piksel mentioned this pull request on Jan 27, 2020
@EvilBeaver deleted the fix-274-and-278 branch on October 10, 2021 18:05