Support HTML entities in JSX text/attributes #284

Closed
wants to merge 1 commit into from

cpmsmith
Contributor

@cpmsmith cpmsmith commented Jan 4, 2024

JSX text and attributes support HTML character references (a.k.a. entities), and don't support ECMAScript string escape sequences.

Although the spec calls it "historical" and threatens to change it, it is in the spec, and the spec is pretty stable at this point.

In changing this, I landed back on an idea that @maxbrunsfeld suggested in a PR review some time ago: having separate `string` and `jsx_string` nodes, and aliasing `jsx_string` to `string` for consumers' convenience. At that time, having two different node types was deemed unnecessary, but this adds a second, more substantive difference between the two, so I've brought the idea back and stopped allowing newlines in JS string literals, which are invalid in both JS and TS.
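To illustrate the newline point, here is a minimal sketch that checks the plain-JS side only. It is not part of the grammar change: `parsesAsJavaScript` is a hypothetical helper that leans on the engine's own parser via `new Function`, and the two sample sources are made up for the example.

```javascript
// Hypothetical helper (not part of tree-sitter-javascript): returns true
// if `source` is accepted by the JavaScript engine as a function body.
function parsesAsJavaScript(source) {
  try {
    // Compiles the source without running it.
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// The "\n" below becomes a real line break inside the double-quoted
// literal of the generated source, which JS does not allow.
const withRawNewline = 'const s = "line one\nline two";';

// Here the generated source contains the two characters `\` and `n`,
// i.e. an ordinary escape sequence inside the literal.
const withEscapedNewline = 'const s = "line one\\nline two";';

console.log(parsesAsJavaScript(withRawNewline));     // false
console.log(parsesAsJavaScript(withEscapedNewline)); // true
```

A JSX attribute string, by contrast, may span several lines, which is one of the differences that motivates a separate `jsx_string` node.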

TL;DR

Here is some JSX highlighted in Neovim using tree-sitter:
image

And here it is in VSCode, not using tree-sitter:
image

VSCode, correctly, does not highlight the `\n` in the JSX attribute, and does highlight the two valid `&nbsp;`s, which tree-sitter-javascript doesn't currently parse. This PR fixes both things.
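As a rough sketch of the intended behaviour (not the grammar itself), the contents of a JSX attribute string can be split into plain text and character references like this. `splitJsxString`, the regular expression, and the segment type labels are assumptions made for this example; the regex only checks the general shape of a reference and does not validate entity names.

```javascript
// Assumed shape of a character reference: &name; or &#123; or &#x1F;
// (illustrative only; it does not check that the name is a known entity).
const CHARACTER_REFERENCE =
  /&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);/g;

// Hypothetical helper: split JSX string contents into segments.
// Backslashes get no special treatment, so `\n` stays ordinary text.
function splitJsxString(contents) {
  const segments = [];
  let last = 0;
  for (const match of contents.matchAll(CHARACTER_REFERENCE)) {
    if (match.index > last) {
      segments.push({ type: "text", value: contents.slice(last, match.index) });
    }
    segments.push({ type: "html_character_reference", value: match[0] });
    last = match.index + match[0].length;
  }
  if (last < contents.length) {
    segments.push({ type: "text", value: contents.slice(last) });
  }
  return segments;
}

// String.raw keeps the backslash, mimicking what appears in a JSX file.
console.log(splitJsxString(String.raw`a\n&nbsp;b&nbsp;`));
```

In this sketch the `\n` ends up inside a plain text segment, while each `&nbsp;` becomes its own segment, which mirrors the highlighting difference described above.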

Checklist:

  • All tests pass in CI. (awaiting approval)
  • The script/parse-examples passes without issues.
  • There are sufficient tests for the new fix/feature.
  • Grammar rules have not been renamed unless absolutely necessary.
  • The conflicts section hasn't grown too much.
  • The parser size hasn't grown too much (check the value of STATE_COUNT in src/parser.c).

JSX text and attributes support HTML character references (a.k.a.
entities), and don't support ECMAScript string escape sequences.

Although the [spec] calls it "historical" and threatens to change it,
it _is_ in the spec, and the spec is pretty stable at this point.

In changing this, I landed back on an idea that @maxbrunsfeld suggested
in a [PR review] some time ago: having separate `string` and
`jsx_string` nodes, and aliasing `jsx_string` to `string` for consumers'
convenience. At that time, having two different node types was deemed
unnecessary, but this adds a second, more substantive difference between
the two, so I've brought the idea back and stopped allowing newlines in
JS string literals, which are invalid in both JS and TS.

[spec]: https://facebook.github.io/jsx/#sec-jsx-string-characters
[PR review]: tree-sitter#140 (comment)
@amaanq
Member

amaanq commented Feb 1, 2024

I'm going to build on top of this with a fix for jsx vs js strings as well, thank you for the PR!

@amaanq amaanq mentioned this pull request Feb 1, 2024
@amaanq
Member

amaanq commented Feb 1, 2024

Hey again @cpmsmith, I cherry picked your changes onto #291, thank you for the PR!
