-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add TASTY pickling of quotes and implement ~
on quotes
#3662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TASTY pickling of quotes and implement ~
on quotes
#3662
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, and thank you for opening this PR! 🎉
All contributors have signed the CLA, thank you! ❤️
Commit Messages
We want to keep history, but for that to actually be useful we have
some rules on how to format our commit messages (relevant xkcd).
Please stick to these guidelines for commit messages:
- Separate subject from body with a blank line
- When fixing an issue, start your commit message with
Fix #<ISSUE-NBR>:
- Limit the subject line to 72 characters
- Capitalize the subject line
- Do not end the subject line with a period
- Use the imperative mood in the subject line ("Added" instead of "Add")
- Wrap the body at 80 characters
- Use the body to explain what and why vs. how
adapted from https://chris.beams.io/posts/git-commit
Have an awesome day! ☀️
4ad2209
to
a71e267
Compare
object TastyString { | ||
|
||
/** Decode the TASTY String into TASTY bytes */ | ||
def stringToTasty(str: String): Array[Byte] = { // TODO factor out this and tastyToString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str.getBytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the string encoding messes up the bytes
} | ||
|
||
/** Encode TASTY bytes into a TASTY String */ | ||
def tastyToString(bytes: Array[Byte]): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new String(bytes)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tricky problem. Looking at Stackoverflow, people say you should use a Codec for this, typically Base64. The scheme of mapping all bytes to ranges 0..255 looks like it would work, but it's not optimal. Strings are represented in Classfiles as UTF8 characters, with one byte for ranges 0.127 and two bytes for ranges 128-255. This means that, assuming a uniform bit distribution you get an overhead of 50%. Doing a 8->7 bit codec would give an overhead of less than 15%.
There's another problem of string size. Strings are limited to 65365 characters. This might not be enough for a larger quoted program.
scalac solves both of these problems when serializing its pickles as annotations. I think we should copy that scheme. I tried to find it but could not. @retronym @lrytz @adriaanm does one of you have an idea where the code that serializes a Pickle as an annotation is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can leave it like this for this PR, but then we should open an issue for future improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will start looking at the alternatives. I also think we should start with this for now to unblock the next PRs and allow people to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lrytz thanks for the link. Could you also point me to the place where the String
/Array[Strings]
are converted back into an Array[Byte]
. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to find it.. Need to clean this up / document. Method parseScalaSigBytes
calls ConstantPool.getBytes
which goes through ByteCodecs.decode
.
The encoding is explained here http://www.scala-lang.org/old/sites/default/files/sids/dubochet/Mon,%202010-05-31,%2015:25/Storage%20of%20pickled%20Scala%20signatures%20in%20class%20files.pdf
- first map all 8-bit bytes to 7 bits (shifting the rest)
- then increment all by 1 (in 7 bits), so 0x7f becomes 0x00
- then encode 0x00 as 0xc0 0x80, which is an overlong utf 8 encoding for zero. it's what the jvm classfile spec uses to avoid having 0x00 in strings. it's called "modified utf 8".
the reason for the incrementing by 1 that 0x7f is expected to be less common than 0x00, so the two byte encoding hits less often.
The confusing part is that the class ScalaSigBytes
used in the backend to encode the signature uses ByteCodecs.encode8to7
, but does the +1
itself. It doesn't need to map 0x00 to the two byte version because ASM will do it when writing the annotation to the classfile. However, in the unpickler, we don't use ASM to read the annotation, but just get the bytes from the classfile directly. So there we'll see the two byte encoding. ByteCodecs.decode
does the necessary work.
cbca568
to
166c49c
Compare
280f625
to
8cedf06
Compare
48bbffd
to
8939eb9
Compare
All requested changes have been made. |
Those tests have never fully been supported, all files are compiled at the same time.
* Pickle/unpickle quotes * Add Splicer * Add tree interpreter for splicing staged expression * Add concrete implementations of Quoted
6289d07
to
f86bcc5
Compare
Rebased to make sure we do not have regressions. |
Based on #3634