Improve compression of pickled quotes #3877

Closed
nicolasstucki opened this issue Jan 20, 2018 · 1 comment

Comments

@nicolasstucki
Contributor

We need to improve the encoding performed in TastyString, using the encoding described in the following discussion:

@lrytz 5 days ago • Owner
It took me a while to find it. This needs to be cleaned up and documented. The method parseScalaSigBytes calls ConstantPool.getBytes, which goes through ByteCodecs.decode.

The encoding is explained here http://www.scala-lang.org/old/sites/default/files/sids/dubochet/Mon,%202010-05-31,%2015:25/Storage%20of%20pickled%20Scala%20signatures%20in%20class%20files.pdf

first, map all 8-bit bytes to 7 bits (shifting the rest)
then increment every value by 1 (in 7 bits), so 0x7f becomes 0x00
then encode 0x00 as 0xc0 0x80, which is an overlong UTF-8 encoding for zero. It's what the JVM classfile spec uses to avoid having 0x00 in strings; it's called "modified UTF-8".
the reason for incrementing by 1 is that 0x7f is expected to be less common than 0x00, so the two-byte encoding is hit less often.
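
The three steps above can be sketched roughly as follows. This is only an illustration, not the actual ByteCodecs code: the object name `SigEncodingSketch` and its helper names are hypothetical, and the low-order-bits-first packing order is an assumption.

```scala
// Hypothetical sketch of the encoding steps described above; not the real ByteCodecs.
object SigEncodingSketch {

  /** Step 1: repack 8-bit bytes into 7-bit groups (assumed order: low-order bits first). */
  def encode8to7(src: Array[Byte]): Array[Byte] = {
    val out = new Array[Byte]((src.length * 8 + 6) / 7) // ceil(8n / 7)
    var acc = 0   // bit accumulator
    var bits = 0  // number of pending bits in acc
    var j = 0
    for (b <- src) {
      acc |= (b & 0xff) << bits
      bits += 8
      while (bits >= 7) {
        out(j) = (acc & 0x7f).toByte
        j += 1
        acc >>>= 7
        bits -= 7
      }
    }
    if (bits > 0) out(j) = (acc & 0x7f).toByte // leftover bits, zero-padded
    out
  }

  /** Step 2: add 1 modulo 128, so 0x7f wraps to 0x00 and 0x00 becomes 0x01. */
  def addOne(bs: Array[Byte]): Array[Byte] =
    bs.map(b => ((b + 1) & 0x7f).toByte)

  /** Step 3: write each remaining 0x00 as the two bytes 0xc0 0x80 ("modified UTF-8"). */
  def escapeZero(bs: Array[Byte]): Array[Byte] =
    bs.flatMap(b => if (b == 0) Array(0xc0.toByte, 0x80.toByte) else Array(b))

  def encode(src: Array[Byte]): Array[Byte] =
    escapeZero(addOne(encode8to7(src)))
}
```

With this sketch, an input of all zero bytes never triggers the two-byte form, while a 7-bit group of 0x7f does, which is the trade-off the last step of the explanation describes.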

The confusing part is that the class ScalaSigBytes, used in the backend to encode the signature, uses ByteCodecs.encode8to7 but does the +1 itself. It doesn't need to map 0x00 to the two-byte version because ASM will do it when writing the annotation to the classfile. However, in the unpickler, we don't use ASM to read the annotation, but just get the bytes from the classfile directly. So there we'll see the two-byte encoding. ByteCodecs.decode does the necessary work.
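
A minimal sketch of what the reading side has to undo, assuming the raw classfile bytes still contain the two-byte zero. The object name `SigDecodingSketch` and its helpers are hypothetical, and the bit order (low-order bits first) is an assumption; this is not the actual ByteCodecs.decode.

```scala
// Hypothetical sketch of the decoding direction; not the real ByteCodecs.decode.
object SigDecodingSketch {

  /** Undo the "modified UTF-8" zero: the pair 0xc0 0x80 becomes a single 0x00. */
  def unescapeZero(bs: Array[Byte]): Array[Byte] = {
    val out = Array.newBuilder[Byte]
    var i = 0
    while (i < bs.length) {
      val isZeroPair =
        (bs(i) & 0xff) == 0xc0 && i + 1 < bs.length && (bs(i + 1) & 0xff) == 0x80
      if (isZeroPair) { out += 0.toByte; i += 2 }
      else { out += bs(i); i += 1 }
    }
    out.result()
  }

  /** Undo the +1: subtract 1 modulo 128, so 0x00 goes back to 0x7f. */
  def subtractOne(bs: Array[Byte]): Array[Byte] =
    bs.map(b => ((b - 1) & 0x7f).toByte)

  /** Repack 7-bit groups into 8-bit bytes (assumed order: low-order bits first). */
  def decode7to8(src: Array[Byte]): Array[Byte] = {
    val out = new Array[Byte](src.length * 7 / 8) // trailing padding bits are dropped
    var acc = 0   // bit accumulator
    var bits = 0  // number of pending bits in acc
    var j = 0
    for (b <- src) {
      acc |= (b & 0x7f) << bits
      bits += 7
      if (bits >= 8) {
        out(j) = (acc & 0xff).toByte
        j += 1
        acc >>>= 8
        bits -= 8
      }
    }
    out
  }

  def decode(raw: Array[Byte]): Array[Byte] =
    decode7to8(subtractOne(unescapeZero(raw)))
}
```

For example, the three raw bytes 0xc0 0x80 0x02 decode back to the single byte 0xff under these assumptions.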

@lrytz
Member

lrytz commented Jan 22, 2018

See also scala/scala#6263

@nicolasstucki changed the title from "Improve compression to pickled quotes" to "Improve compression of pickled quotes" on Oct 5, 2018
nicolasstucki added a commit to dotty-staging/dotty that referenced this issue Oct 22, 2018
Improves scala#3877

`'{ { val y = ~x * ~x; ~powerCode(n / 2, '(y)) } }`
was reduced from 206 bytes to 172 bytes

`'{ ~x * ~powerCode(n - 1, x) }`
was reduced from 143 bytes to 128 bytes