-
-
Notifications
You must be signed in to change notification settings - Fork 132
Tolerate unescaped newlines in string literals #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This should be good to merge! |
grammar.js
Outdated
$.escape_sequence | ||
)), | ||
"'" | ||
) | ||
), | ||
|
||
string: $ => $._string, | ||
jsx_string: $ => $._string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's no need to have two separate string
and jsx_string
rules.
Previously, I had been thinking of a different separation - in which, to be conservative, we would create a separate jsx_string
rule that allowed unescaped newlines, and would only be used inside of JSX expressions. But even if we did that, I would have used alias
to make sure that the jsx_string
appeared as a normal string
in the output. I don't think it's helpful to have a public-facing jsx_string
rule, since it's just another node type that users need to remember to target for things like syntax highlighting.
But I think it's probably fine to just allow unescaped newlines in all strings, as you've done. I would just get rid of the jsx_string
thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will do this. (sorry about rushing this PR and then not reacting - it turns out this is a "hackathon week" at r2c and we're all doing different things this week)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed the fix. There's now just one string
rule that tolerates unescaped newlines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping :)
used everywhere, which tolerates unescaped newlines.
Hi @maxbrunsfeld, we'd like to merge this. We've been using it in prod for a while and it works fine. I just updated the generated code for mergeability. |
grammar.js
Outdated
string: $ => choice( | ||
seq( | ||
'"', | ||
repeat(choice( | ||
token.immediate(prec(PREC.STRING, /[^"\\\n]+|\\\r?\n/)), | ||
token.immediate(prec(PREC.STRING, /[^"\\\n]+|\\?\r?\n/)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we're allowing newlines, you could simplify the regex so that newlines are not explicitly excluded on the left-hand-side of the OR operator. Then, the right-hand-side would only be for handling escaped newlines.
Something like [^"\\]|\\\r?\n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Also, I'm now noticing that the regexp doesn't accept escaped quotes as in "\""
or '\''
. Looking into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, $.escape_sequence
takes care of escaped quotes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified the regexp and added tests to cover my expectations. I also moved my original test for jsx attributes to literals.txt
which seems more appropriate. expressions.txt
still has some basic tests for strings and numbers which are probably fine to stay there.
Thanks! |
Thank you Max! |
JSX text and attributes support HTML character references (a.k.a. entities), and don't support ECMAScript string escape sequences. Although the [spec] calls it "historical" and threatens to change it, it _is_ in the spec, and the spec is pretty stable at this point. In changing this, I landed back on an idea that @maxbrunsfeld suggested in a [PR review] some time ago: having separate `string` and `jsx_string` nodes, and aliasing `jsx_string` to `string` for consumers' convenience. At that time, having two different node types was deemed unnecessary, but this adds a second, more substantive difference between the two, so I've brought the idea back, and stopped allowing invalid newlines in JS string literals, which is invalid in both JS and TS. [spec]: https://facebook.github.io/jsx/#sec-jsx-string-characters [PR review]: tree-sitter#140 (comment)
JSX text and attributes support HTML character references (a.k.a. entities), and don't support ECMAScript string escape sequences. Although the [spec] calls it "historical" and threatens to change it, it _is_ in the spec, and the spec is pretty stable at this point. In changing this, I landed back on an idea that @maxbrunsfeld suggested in a [PR review] some time ago: having separate `string` and `jsx_string` nodes, and aliasing `jsx_string` to `string` for consumers' convenience. At that time, having two different node types was deemed unnecessary, but this adds a second, more substantive difference between the two, so I've brought the idea back, and stopped allowing invalid newlines in JS string literals, which is invalid in both JS and TS. [spec]: https://facebook.github.io/jsx/#sec-jsx-string-characters [PR review]: #140 (comment)
This is useful for typescript and is considered not harmful in javascript given the scope of the tree-sitter project.
This PR is a follow-up to tree-sitter/tree-sitter-typescript#91