Skip to content

Add TASTY pickling of quotes and implement ~ on quotes #3662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jan 8, 2018

Conversation

nicolasstucki
Copy link
Contributor

Based on #3634

Copy link
Member

@dottybot dottybot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, and thank you for opening this PR! 🎉

All contributors have signed the CLA, thank you! ❤️

Commit Messages

We want to keep history, but for that to actually be useful we have
some rules on how to format our commit messages (relevant xkcd).

Please stick to these guidelines for commit messages:

  1. Separate subject from body with a blank line
  2. When fixing an issue, start your commit message with Fix #<ISSUE-NBR>:
  3. Limit the subject line to 72 characters
  4. Capitalize the subject line
  5. Do not end the subject line with a period
  6. Use the imperative mood in the subject line ("Added" instead of "Add")
  7. Wrap the body at 80 characters
  8. Use the body to explain what and why vs. how

adapted from https://chris.beams.io/posts/git-commit

Have an awesome day! ☀️

object TastyString {

/** Decode the TASTY String into TASTY bytes */
def stringToTasty(str: String): Array[Byte] = { // TODO factor out this and tastyToString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str.getBytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the string encoding messes up the bytes

}

/** Encode TASTY bytes into a TASTY String */
def tastyToString(bytes: Array[Byte]): String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new String(bytes)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky problem. Looking at Stackoverflow, people say you should use a Codec for this, typically Base64. The scheme of mapping all bytes to ranges 0..255 looks like it would work, but it's not optimal. Strings are represented in Classfiles as UTF8 characters, with one byte for ranges 0.127 and two bytes for ranges 128-255. This means that, assuming a uniform bit distribution you get an overhead of 50%. Doing a 8->7 bit codec would give an overhead of less than 15%.

There's another problem of string size. Strings are limited to 65365 characters. This might not be enough for a larger quoted program.

scalac solves both of these problems when serializing its pickles as annotations. I think we should copy that scheme. I tried to find it but could not. @retronym @lrytz @adriaanm does one of you have an idea where the code that serializes a Pickle as an annotation is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave it like this for this PR, but then we should open an issue for future improvements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will start looking at the alternatives. I also think we should start with this for now to unblock the next PRs and allow people to use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Copy link
Member

@lrytz lrytz Jan 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lrytz thanks for the link. Could you also point me to the place where the String/Array[Strings] are converted back into an Array[Byte]. Thanks.

Copy link
Member

@lrytz lrytz Jan 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to find it.. Need to clean this up / document. Method parseScalaSigBytes calls ConstantPool.getBytes which goes through ByteCodecs.decode.

The encoding is explained here http://www.scala-lang.org/old/sites/default/files/sids/dubochet/Mon,%202010-05-31,%2015:25/Storage%20of%20pickled%20Scala%20signatures%20in%20class%20files.pdf

  • first map all 8-bit bytes to 7 bits (shifting the rest)
  • then increment all by 1 (in 7 bits), so 0x7f becomes 0x00
  • then encode 0x00 as 0xc0 0x80, which is an overlong utf 8 encoding for zero. it's what the jvm classfile spec uses to avoid having 0x00 in strings. it's called "modified utf 8".

the reason for the incrementing by 1 that 0x7f is expected to be less common than 0x00, so the two byte encoding hits less often.

The confusing part is that the class ScalaSigBytes used in the backend to encode the signature uses ByteCodecs.encode8to7, but does the +1 itself. It doesn't need to map 0x00 to the two byte version because ASM will do it when writing the annotation to the classfile. However, in the unpickler, we don't use ASM to read the annotation, but just get the bytes from the classfile directly. So there we'll see the two byte encoding. ByteCodecs.decode does the necessary work.

@nicolasstucki nicolasstucki force-pushed the add-meta-with-tasty branch 24 times, most recently from cbca568 to 166c49c Compare December 20, 2017 10:18
@nicolasstucki nicolasstucki force-pushed the add-meta-with-tasty branch 2 times, most recently from 48bbffd to 8939eb9 Compare December 28, 2017 11:54
@nicolasstucki
Copy link
Contributor Author

All requested changes have been made.

@nicolasstucki nicolasstucki requested a review from odersky January 8, 2018 09:59
@nicolasstucki
Copy link
Contributor Author

Rebased to make sure we do not have regressions.

@nicolasstucki nicolasstucki merged commit 39e466d into scala:master Jan 8, 2018
@allanrenucci allanrenucci deleted the add-meta-with-tasty branch January 13, 2018 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants