Allow overriding encoding for zip extraction #280

EvilBeaver · 2018-10-25T09:58:24Z

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

Issues #274 and #278 are fixed, but there is another issue with the Unicode test samples.

It seems that the Greek and Arabic samples are incorrect. I've added a workaround in EvilBeaver@887e10d.

The old tests probably weren't failing because ZipStrings.SystemDefaultCodePage was set to 65001, which seems to be what Encoding.GetEncoding(0) returns by default on .NET Core.

Some samples become broken when they're written to a byte array with their encoder.
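
A minimal sketch of how such an irreversible sample can be detected: encode the string, decode it again with the same encoding, and compare. The class and method names are hypothetical, and the actual samples and code pages used by the tests are not reproduced here; `Encoding.ASCII` only stands in for any encoding that cannot represent the sample.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of SharpZipLib.
public static class RoundTripSketch
{
    // Returns true when encoding `text` to bytes and decoding those bytes
    // with the same encoding reproduces the original string exactly.
    public static bool SurvivesRoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        string decoded = encoding.GetString(bytes);
        return string.Equals(text, decoded, StringComparison.Ordinal);
    }
}
```

For example, `RoundTripSketch.SurvivesRoundTrip("Ελληνικά", Encoding.UTF8)` is true, while the same call with `Encoding.ASCII` is false, because characters the encoder cannot map are replaced during encoding.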

piksel · 2018-10-25T22:14:13Z

This solution is more or less what I had intended too. The only problem with this approach is that you cannot force UTF-8 encoding by setting the Charset.

The SystemDefaultCodePage (GetEncoding(0)) differs between frameworks and environments. I will take a look at the tests this weekend and see what is going on.
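
To see which value a given runtime reports, something like the following probe could be used. It is only a sketch with hypothetical names; the result depends on the framework, the operating system, and on whether additional encoding providers have been registered, so no particular output is assumed.

```csharp
using System;
using System.Text;

// Hypothetical probe, not part of SharpZipLib.
public static class DefaultCodePageProbe
{
    // Code page of the encoding the runtime returns for code page 0,
    // i.e. what the thread refers to as the system default code page.
    public static int GetSystemDefaultCodePage()
    {
        return Encoding.GetEncoding(0).CodePage;
    }

    // True when the given code page is the UTF-8 code page (65001).
    public static bool IsUtf8(int codePage)
    {
        return codePage == Encoding.UTF8.CodePage;
    }
}
```

Printing `DefaultCodePageProbe.GetSystemDefaultCodePage()` on each target environment shows whether the tests are implicitly running with UTF-8.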

EvilBeaver · 2018-10-26T12:28:35Z

> The only problem with this approach is that you cannot force UTF-8 encoding by setting the Charset.

No, forcing UTF-8 is still possible. If the UTF-8 flag is set in the archive, then UTF-8 will be used regardless of the codepage property. Could you please clarify the problem you mentioned? What kind of UTF-8 forcing do you want?

piksel · 2018-10-27T10:15:28Z

With this solution you would not be able to unzip an archive that uses UTF-8 as its encoding, whilst the Unicode bit is not set. It would work if your system's default encoding is UTF-8, but not otherwise.
If we're adding support for setting the decompression encoding it would probably be better if we supported this scenario too.

Perhaps having Codepage default to -1 which would indicate "Auto" (SystemDefaultCodePage or UTF8 depending on Unicode bit), but if set to anything else it would use that codepage instead?
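
The proposal could be sketched roughly as follows. All names here are hypothetical and this is not the code that was merged. Letting the Unicode bit take precedence over an explicit code page is an assumption taken from the earlier comment in this thread; the real implementation may order the checks differently.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the "-1 means Auto" idea, not SharpZipLib's API.
public static class CodePageResolverSketch
{
    // Sentinel value meaning "choose the code page automatically".
    public const int AutoCodePage = -1;

    public static int Resolve(int requestedCodePage, bool unicodeBitSet, int systemDefaultCodePage)
    {
        // Assumption: an entry flagged as Unicode is always read as UTF-8.
        if (unicodeBitSet)
            return Encoding.UTF8.CodePage;

        // Any explicit value, including 65001, is honored as given.
        if (requestedCodePage != AutoCodePage)
            return requestedCodePage;

        // Auto without the Unicode bit falls back to the system default.
        return systemDefaultCodePage;
    }
}
```

With this shape, `Resolve(65001, false, 1252)` yields 65001, which covers the archive that uses UTF-8 without setting the Unicode bit, while `Resolve(-1, false, 1252)` yields 1252.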

EvilBeaver · 2018-10-29T07:56:38Z

> With this solution you would not be able to unzip an archive that uses UTF-8 as its encoding, whilst the Unicode bit is not set

Why? If the client sets CodePage to 65001, then UTF-8 will be used and the archive will be read correctly. Am I wrong?

piksel · 2018-10-29T11:44:51Z

If CodePage is set to 65001 then

CodePage == Encoding.UTF8.CodePage ? SystemDefaultCodePage : CodePage

will result in SystemDefaultCodePage being used, no? And that is not UTF-8 in some environments.
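
The effect of the quoted expression can be checked in isolation. The wrapper below is a hypothetical stand-in that takes the system default as a parameter so the behavior does not depend on the machine it runs on.

```csharp
using System;
using System.Text;

// Hypothetical wrapper around the expression quoted above.
public static class QuotedExpressionSketch
{
    // Mirrors: CodePage == Encoding.UTF8.CodePage ? SystemDefaultCodePage : CodePage
    public static int Evaluate(int codePage, int systemDefaultCodePage)
    {
        return codePage == Encoding.UTF8.CodePage ? systemDefaultCodePage : codePage;
    }
}
```

With a system default of 1252, `QuotedExpressionSketch.Evaluate(65001, 1252)` returns 1252, so an explicit request for UTF-8 is silently replaced by the system default.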

EvilBeaver · 2018-10-29T15:38:15Z

Yes, this is wrong. I'll change it to use -1, as you proposed earlier.

EvilBeaver · 2018-11-03T14:59:00Z

@piksel please review this.

EvilBeaver · 2018-11-13T07:46:37Z

@piksel thanks a lot! When is the new NuGet version planned? I want to use this fix in my project as soon as possible.

piksel · 2018-11-14T11:20:14Z

I'll try to get it done this weekend.

EvilBeaver added 3 commits October 25, 2018 12:15

fixed icsharpcode#274 and icsharpcode#278 encoding issue

cac1709

Added russian name sample

2df666f

Added workaround for irreversible text samples

887e10d

Some samples become broken when they're written to byte array with their encoder

EvilBeaver added 2 commits November 3, 2018 17:34

Renamed local variable that had the same name as a static field

782ccce

Magic codepage value for detecting auto-codepage. see icsharpcode#278

1253a23

piksel merged commit 5ff738b into icsharpcode:master Nov 12, 2018

piksel mentioned this pull request Nov 12, 2018

ZipFile: Bugfix: Use unicode encoding to read the name string if UseUnicode is set #284

Closed

piksel changed the title ~~Fix 274 and 278~~ Allow overriding encoding for zip extraction Nov 23, 2018

piksel mentioned this pull request Jan 27, 2020

fix ZipStrings.UseUnicode #411

Closed

EvilBeaver deleted the fix-274-and-278 branch October 10, 2021 18:05
