Unit test Distribution.Utils.ShortText BinaryId fails #4644

fgaz · 2017-07-29T20:37:23Z

https://travis-ci.org/haskell-pushbot/cabal-binaries/builds/258926423

$ Cabal/unit-tests
[snip]
  Distribution.Utils.ShortText
    ShortText Id:                                       OK
      +++ OK, passed 100 tests.
    ShortText Ord:                                      OK
      +++ OK, passed 100 tests.
    ShortText Monoid:                                   OK
      +++ OK, passed 100 tests.
    ShortText BinaryId:                                 FAIL
      *** Failed! Falsifiable (after 40 tests and 5 shrinks): 
      "\65534"
      Use --quickcheck-replay=551066 to reproduce.

ping @ezyang

ezyang · 2017-07-29T20:43:21Z

CC @hvr who introduced this test in 993d20a

fgaz · 2017-07-29T20:52:26Z

Wikipedia says \65534 is U+FFFE (<noncharacter-FFFE>), which is not a character.

FFFE and FFFF are not unassigned in the usual sense, but guaranteed not to be a Unicode character at all. They can be used to guess a text's encoding scheme, since any text containing these is by definition not a correctly encoded Unicode text. Unicode's U+FEFF Byte order mark character can be inserted at the beginning of a Unicode text to signal its endianness: a program reading such a text and encountering 0xFFFE would then know that it should switch the byte order for all the following characters.

Whoa.
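
The byte-order trick described in the quote can be sketched in a few lines. This is only an illustration under the assumption of a UTF-16 byte stream; `ByteOrder16` and `detectByteOrder` are hypothetical names, not part of Cabal or any library.

```haskell
import Data.Word (Word8)

-- | Byte order inferred from a leading UTF-16 byte order mark.
-- (Hypothetical type, for illustration only.)
data ByteOrder16 = BigEndian16 | LittleEndian16
  deriving (Eq, Show)

-- | Inspect the first two bytes of a (presumed) UTF-16 stream.
-- U+FEFF serialised big-endian is FE FF. If the stream is little-endian,
-- the same two bytes arrive as FF FE, which a big-endian reader would
-- see as the noncharacter U+FFFE and therefore know to swap byte order.
detectByteOrder :: [Word8] -> Maybe ByteOrder16
detectByteOrder (0xFE : 0xFF : _) = Just BigEndian16
detectByteOrder (0xFF : 0xFE : _) = Just LittleEndian16
detectByteOrder _                 = Nothing  -- no BOM present
```

Because U+FFFE is guaranteed never to be a character, the second case cannot be confused with real text.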

hvr · 2017-07-30T11:56:59Z

I need to look into why U+FFFE (the byte-swapped form of the BOM U+FEFF, which btw makes no sense whatsoever for UTF-8 encodings) doesn't round-trip properly. IIRC I specifically tested such corner cases in the implementation of http://hackage.haskell.org/package/text-short

PS: I just noticed this is with the GHC 7.6.3 configuration, so this may be a problem with the legacy fallback...

hvr · 2017-07-30T12:05:54Z

After some investigation, the issue is in fact in the String-backed legacy fallback, whose Binary instance relies on the round-trip property of Distribution.Utils.String.{encode,decode}StringUtf8, which fails for U+FFFE:

> decodeStringUtf8 (encodeStringUtf8 "\65534")
"\65533"

because decodeStringUtf8 (imho rightfully) considers U+FFFE invalid in a UTF-8 stream, and maps it to the replacement character U+FFFD.
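
To make the failure mode concrete, here is a minimal model of it at the code-point level. It is not the Cabal source: `legacyRejects`, `legacyRoundTrip` and `roundTrips` are hypothetical names, and the model assumes the encoder passes every character through unchanged while the decoder substitutes U+FFFD for U+FFFE and U+FFFF.

```haskell
-- | U+FFFD REPLACEMENT CHARACTER.
replacementChar :: Char
replacementChar = '\xFFFD'

-- | Model of the legacy decoder's extra check (an assumption based on
-- the behaviour reported above): the noncharacters U+FFFE and U+FFFF
-- are treated as invalid.
legacyRejects :: Char -> Bool
legacyRejects c = c == '\xFFFE' || c == '\xFFFF'

-- | Result of an encode/decode round trip in this model: rejected
-- characters come back as the replacement character.
legacyRoundTrip :: String -> String
legacyRoundTrip = map (\c -> if legacyRejects c then replacementChar else c)

-- | The identity property that a BinaryId-style test checks,
-- restricted to this model.
roundTrips :: String -> Bool
roundTrips s = legacyRoundTrip s == s
```

In this model `roundTrips "\65534"` is `False`, since `legacyRoundTrip "\65534"` yields `"\65533"`, which matches the QuickCheck counterexample.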

hvr · 2017-12-03T19:13:20Z

I'll take a stab at harmonizing the decodeStringUtf8 semantics with the more round-trip-friendly ones from text and text-short.

This changes `decodeStringUtf8` to no longer replace U+FFFE and U+FFFF with U+FFFD, while `encodeStringUtf8` now replaces surrogate code points (i.e. U+D800 through U+DFFF, which are invalid in UTF-8) with U+FFFD. Consequently, `decodeStringUtf8 . encodeStringUtf8` can now properly round-trip all Unicode scalar values (i.e. [U+0000..U+D7FF] ∪ [U+E000..U+10FFFF]). This should finally address haskell#4644.
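
The semantics described above can be sketched with a small standalone UTF-8 codec. This is an illustrative reimplementation, not the code from the pull request: `encodeUtf8`, `decodeUtf8`, `multi` and `roundTripsAllScalars` are hypothetical names, and details such as how malformed input is resynchronised are assumptions.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

replacementChar :: Char
replacementChar = '\xFFFD'

-- | Encode to UTF-8, first replacing surrogate code points by U+FFFD.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (encodeChar . sanitise)
  where
    sanitise c
      | c >= '\xD800' && c <= '\xDFFF' = replacementChar
      | otherwise                      = c
    encodeChar c
      | n < 0x80    = [w n]
      | n < 0x800   = [w (0xC0 .|. (n `shiftR` 6)), cont n]
      | n < 0x10000 = [w (0xE0 .|. (n `shiftR` 12)), cont (n `shiftR` 6), cont n]
      | otherwise   = [ w (0xF0 .|. (n `shiftR` 18)), cont (n `shiftR` 12)
                      , cont (n `shiftR` 6), cont n ]
      where n = ord c
    w :: Int -> Word8
    w = fromIntegral
    -- Continuation byte carrying the low six bits of its argument.
    cont x = w (0x80 .|. (x .&. 0x3F))

-- | Decode UTF-8. Malformed, overlong, surrogate and out-of-range
-- sequences become U+FFFD; U+FFFE and U+FFFF are passed through.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80  = chr (fromIntegral b) : decodeUtf8 bs
  | b < 0xC0  = replacementChar : decodeUtf8 bs  -- stray continuation byte
  | b < 0xE0  = multi 1 0x80    (fromIntegral b .&. 0x1F) bs
  | b < 0xF0  = multi 2 0x800   (fromIntegral b .&. 0x0F) bs
  | b < 0xF8  = multi 3 0x10000 (fromIntegral b .&. 0x07) bs
  | otherwise = replacementChar : decodeUtf8 bs

-- | Consume k continuation bytes, accumulating the code point in acc;
-- minCp is the smallest code point allowed for this sequence length.
multi :: Int -> Int -> Int -> [Word8] -> String
multi 0 minCp acc rest
  | acc < minCp || acc > 0x10FFFF  = replacementChar : decodeUtf8 rest
  | acc >= 0xD800 && acc <= 0xDFFF = replacementChar : decodeUtf8 rest
  | otherwise                      = chr acc : decodeUtf8 rest
multi k minCp acc (c : rest)
  | c .&. 0xC0 == 0x80 =
      multi (k - 1) minCp ((acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) rest
multi _ _ _ rest = replacementChar : decodeUtf8 rest  -- truncated sequence

-- | Exhaustive check of the round-trip claim over all scalar values.
roundTripsAllScalars :: Bool
roundTripsAllScalars = all ok (['\0' .. '\xD7FF'] ++ ['\xE000' .. '\x10FFFF'])
  where ok c = decodeUtf8 (encodeUtf8 [c]) == [c]
```

With this sketch, `decodeUtf8 (encodeUtf8 "\65534")` gives back `"\65534"`, while a lone surrogate such as `['\xD800']` comes back as `"\65533"`. Moving the replacement to the encoder is what lets the composition be the identity on every scalar value.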

hvr · 2017-12-04T08:47:59Z

I'm confident this one's been fixed via #4928. I ran `unit-tests -p BinaryId --quickcheck-tests 999999`, compiled for GHC 7.6.3, a few times and also tried the replay value; everything has passed so far.

fgaz mentioned this issue Jul 29, 2017

Use package root as data-file base path, not cwd #4641

Merged

3 tasks

23Skidoo assigned hvr Oct 15, 2017

23Skidoo added Cabal: tests/unit-tests type: bug Cabal: other labels Oct 15, 2017

fgaz mentioned this issue Nov 22, 2017

Target package names #4889

Merged

4 tasks

hvr mentioned this issue Dec 3, 2017

UTF-8 encoding/decoding is broken #4927

Closed

ttuegel mentioned this issue Dec 3, 2017

Disable per-component builds when coverage is enabled #4902

Closed

4 tasks

hvr mentioned this issue Dec 3, 2017

Modify replacement properties of encodeStringUtf8/decodeStringUtf8 #4928

Merged

3 tasks

hvr closed this as completed Dec 4, 2017

hvr added this to the 2.2 milestone Dec 4, 2017
