Clean up strings in the compiler toolchain #5522

Closed
cristianoc opened this issue Jul 7, 2022 · 7 comments

@cristianoc
Collaborator

See #5521 and rescript-lang/syntax#602


Just writing down some notes here.

Relying on indirect type checking, by putting "" at either end of a string concatenation, seems brittle.
There also seem to be several layers of sediment, presumably left over from old ways of doing things.

So `stuff` is the same as js`stuff` by convention, I think.

Then there is j`stuff`, which obeys different rules: in j`$x` the $x actually has a meaning, and x does not need to be a string. As a consequence, "" + x is not the same as x when x is not a string. So removing empty-string concatenation in the back-end of the compiler is also delicate, as it's easy to get wrong (in 3 + "" the "" can't be removed).
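
To make the pitfall concrete, here is a minimal sketch of such a simplification pass. The `expr` type, `is_string` and `simplify` are hypothetical names made up for illustration; they are not the compiler's actual IR or API. The point is only that a "" operand may be dropped when the other operand is already known to be a string, and must be kept otherwise.

```ocaml
(* Hypothetical miniature expression type; NOT the compiler's actual IR. *)
type expr =
  | Str of string          (* string literal *)
  | Num of float           (* number literal *)
  | Var of string * bool   (* variable name, and whether it is known to be a string *)
  | Concat of expr * expr  (* models JavaScript's [a + b] *)

(* Conservative check: is this expression certain to evaluate to a string?
   In JavaScript, [a + b] yields a string as soon as one operand is a string. *)
let rec is_string = function
  | Str _ -> true
  | Num _ -> false
  | Var (_, known_string) -> known_string
  | Concat (a, b) -> is_string a || is_string b

(* Drop a [""] operand only when the other operand is already a string;
   otherwise the [""] is what forces the conversion and has to stay. *)
let rec simplify = function
  | Concat (a, b) -> (
      match (simplify a, simplify b) with
      | Str "", e when is_string e -> e
      | e, Str "" when is_string e -> e
      | a', b' -> Concat (a', b'))
  | e -> e
```

With this sketch, `simplify (Concat (Str "", Str "a"))` gives `Str "a"`, while `simplify (Concat (Num 3., Str ""))` is returned unchanged, mirroring the 3 + "" case.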

Then there's json`stuff`. I don't know, maybe it's the same as j but older. Are they really treated the same way at every stage in the compiler? Not sure.

All this is represented internally by strings that carry a tag "j", "js" or "json".
On top of that, strings produced by the parser are now unicode by default, which uses the tag "*j". But there's also the OCaml parser for .ml files, which never generates "*j" for normal strings.

There's also a partial attempt to use a type "unicode" inside the back-end of the compiler, which seems incomplete.

Also, there's a quoting mechanism applied on dump (code generation), which depends on the kind of string.

Goes without saying, all this needs a good cleanup.

@cristianoc
Collaborator Author

How to debug:

./darwinarm64/bsc.exe -dparsetree templ.res

This gives the parse tree as produced by the parser (which produces js strings) and the front-end: https://github.com/rescript-lang/rescript-compiler/blob/master/jscomp/frontend/ast_utf8_string_interp.ml#L391

Then the back-end runs and the code is generated.

@cristianoc
Collaborator Author

One issue is that the behaviour of string templates is not specified, so it's not clear what to expect.
What are `foo`, j`foo`, js`foo` and json`foo`? Do we need all of them? How do they differ?

CC @bobzhang

@cristianoc
Collaborator Author

#5641

@cristianoc
Collaborator Author

cristianoc commented Sep 7, 2022

After #5642, the list of delimiters currently used is explicit:

and delim = | DNone | DJ | DJS | DStarJ | DJson

@cristianoc
Collaborator Author

  • json is only used with @as; see Document "@as(jsonxxx)" rescript-lang.org#550
  • js is the same as omitting it, i.e. language-level string interpolation
  • j is the old interpolation with different rules, e.g. $a instead of ${a}, and a does not have to be a string; there are probably still uses
  • *j is for internal use
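
As a rough summary of the list above, the delimiters could be written down as follows. This is only a sketch: the constructor names come from the `delim` type quoted earlier, but the standalone `type` declaration, the tag-to-constructor mapping and the two helper functions (`delim_of_tag`, `describe`) are assumptions made for illustration, not code from the compiler.

```ocaml
(* Constructor names are from the delim list in this thread; everything
   else here is hypothetical and written for illustration only. *)
type delim = DNone | DJ | DJS | DStarJ | DJson

(* Assumed mapping from the textual tag to a constructor.
   [None] stands for a string without any tag; unknown tags are rejected. *)
let delim_of_tag = function
  | None -> Some DNone
  | Some "j" -> Some DJ
  | Some "js" -> Some DJS
  | Some "*j" -> Some DStarJ
  | Some "json" -> Some DJson
  | Some _ -> None

(* One-line summary per delimiter, paraphrasing the notes in this thread. *)
let describe = function
  | DNone -> "no delimiter"
  | DJ -> "old interpolation: $a, and a need not be a string"
  | DJS -> "language-level string interpolation"
  | DStarJ -> "internal use"
  | DJson -> "only used with @as"
```

Writing the cases as a closed variant means the OCaml compiler flags any match that forgets a delimiter, which is the main benefit of making the list explicit.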

@cristianoc
Collaborator Author

#5645

@cristianoc
Collaborator Author

Not clean, but cleaner now.
