Skip to content

AST flag/API option; generic AST output for all nodes #5044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 12, 2018

Conversation

GeoffreyBooth
Copy link
Collaborator

@GeoffreyBooth GeoffreyBooth commented Apr 29, 2018

Building off of #4984. The major prerequisite to support for CoffeeScript in Prettier, or better ecosystem tooling in general (a new JS2Coffee, etc.) is being able to generate a abstract syntax tree, or AST. This is a JSON output (or in Node, a plain JavaScript object output) representation of all the nodes that the compiler would be generating JavaScript source code for. Once we have an AST, it can be passed into Prettier for Prettier to generate pretty-formatted CoffeeScript; into CoffeeLint for linting; and so on. The intent of this PR is to focus only on the “generate an AST” part, leaving further integrations for future PRs.

This PR aims to update the CoffeeScript compiler to generate such an AST. The current CLI already has a --nodes flag that prints debugging info about all the nodes; I’m updating that to produce the output as JSON instead. The Node CLI will take a nodes option to return the same thing.

This PR will be WIP until all the nodes have complete details about them, such that Prettier has enough to work with to generate new CoffeeScript source. I’m treating that as the initial benchmark for completeness; if other tools require yet more detail, the additional details can be added in future PRs.

To try out generating an AST, check out this branch and run:

./bin/coffee --nodes --eval 'answer = 42'

For syntax-highlighted output, or a longer input, create a test.coffee at the root of the repo with whatever CoffeeScript code you want to see an AST of, then run:

coffee --eval "console.log require('util').inspect \
require('./lib/coffeescript').compile( \
require('fs').readFileSync('./test.coffee').toString(), nodes: yes), \
{colors: yes, depth: 10}"

You should see pretty-printed JSON like this:

{ type: 'Block',
  loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 10 } },
  expressions:
   [ { type: 'Assign',
       loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 10 } },
       variable:
        { type: 'IdentifierLiteral',
          loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 5 } },
          value: 'answer' },
       value:
        { type: 'NumberLiteral',
          loc: { start: { line: 0, column: 9 }, end: { line: 0, column: 10 } },
          value: '42' } } ] }

This is the same output as the current CLI’s --nodes, in JSON form, with:

  • the node class names as type, which is how the Babel AST has them
  • the location data structured the way Babel does it
  • any primitive (boolean, number, string) properties included, as these are serializable as is into JSON
  • children included recursively

For nodes as simple as NumberLiteral, this is all the detail we need. But for complicated nodes like Class, a lot more work will be required. Basically, every node class that has a compileNode method will need a similar compileObject method; the former generates a JavaScript output string, and the latter will generate AST objects. Another issue that needs tackling, perhaps in a separate PR, is preserving data from the lexer that doesn’t make it into the node classes but that we need to produce a complete AST object. I’m hoping we can use this branch to track this work, and I’m happy to invite anyone to my repo who would like to contribute.

@jashkenas
Copy link
Owner

It's nice to see this being developed in a side PR — I'm not terribly optimistic about how easy it's going to be to get prettier to format CoffeeScript.

But looks like a fine start! My only comment is that the toPlainObject method is usually called toJSON():

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#toJSON()_behavior

@vendethiel
Copy link
Collaborator

Good luck, have fun; as they say!

Happy to see you try and tackle this one

@GeoffreyBooth
Copy link
Collaborator Author

Um, I’m hoping to not tackle it alone! Help most welcome!

Prettier can handle oddball formats like CSS and Markdown, so I would think it could handle CoffeeScript. But regardless that’s a whole separate effort. I just want to get a good AST out first, so that others can tackle integrations. JS2Coffee already has code that converts an AST into CoffeeScript source, so clearly it’s possible to do.

@GeoffreyBooth
Copy link
Collaborator Author

I think there’s a fairly easy way to track our progress toward a “complete” AST. So the Babel AST spec lists every one of Babel’s AST token types and properties. For example:

interface RegExpLiteral <: Literal {
  type: "RegExpLiteral";
  pattern: string;
  flags: string;
}

Almost all of our node types have corresponding types in Babel’s AST spec. We could compare the AST output for our node against its counterpart in the Babel AST, and if both contain the same properties, that node type is complete. So for RegExpLiteral, if our node type has both the pattern and flags properties, and the values of those properties are roughly equivalent in terms of the information conveyed (if not necessarily the format or syntax) then our equivalent to Babel’s RegExpLiteral is done. Since Babel takes its AST as input to generate source code, by definition its AST is “complete” in that it contains all the information needed for source code generation. If our AST contains the same information, it should be just as complete.

Comparing node types also has the advantage that we don’t need a code generation method (like Prettier) built yet. All we need is a “kitchen sink” input file that has at least one of every node type, so it generates and AST that we can compare against Babel’s. @zdenko @helixbass

@vjeux
Copy link

vjeux commented May 8, 2018

Some context about prettier, the way it works is the following: from an AST you generate an intermediate representation ( https://github.com/prettier/prettier/blob/master/commands.md ) that describes where things should break. Since the CoffeeScript AST is different than the JavaScript AST, you're likely going to have to rebuilt all that part from scratch.

If you use prettier, you're going to get a lot of infrastructure for free:

  • A CLI that supports a bunch of options
  • Unit testing infrastructure that has snapshot tests and indempotency checks
  • Editor integrations for all the editors / IDE on earth
  • A bunch of utils to handle comments, sub-languages (ex css inside of backticks), ...
  • An online playground to play with it (Not yet done for plugins but will be soon)

I didn't realize at the time but there's actually a bunch of work outside of just the core formatter that is needed to make one successful. So, we've been trying to make a plugin system where if you want to add a new language, you just have to build the printer for that language and you get the rest for free.

@GeoffreyBooth
Copy link
Collaborator Author

Thanks @vjeux. One of our maintainers is @lydell, so presumably we’ll have some help when it comes to the Prettier side of things 😄

One thing I’m concerned by is the idea of line breaking CoffeeScript source code in general. I know that intelligent line breaks was the original purpose of Prettier, but I think it’s fair to say that Prettier has evolved from that to now function as a best practices code reformatter in general. Since CoffeeScript is whitespace dependent, unlike JavaScript, it’s common to assume a soft word wrap and never break lines of code for the purpose of keeping under a column limit. Certainly people write objects and arrays in multiline style instead of inline when the inline version starts to get long and unwieldy, for example, but using the line continuation symbol \ or implying a continuation via things like ending a line in and are fairly uncommon. I don’t want to say there’s consensus on this view, but it’s fair to say that there isn’t consensus that CoffeeScript code should be column-limited. This is something we’ll have to address, perhaps through a configuration option, when we get closer to outputting code via Prettier.

@vjeux
Copy link

vjeux commented May 8, 2018

Prettier itself doesn't enforce any of those, this is the person writing the ast -> IR that has to make those decisions. The IR allows you to express what happens if something goes over 80 columns.

It may seem that enforcing based on the 80-column rule is not ideal but in practice, people tend to write code that way. For example if you look at the files in coffeescript/src/, they tend to fit 80 columns.

Now, this is only one input around when to break. For example in prettier we always break for if when there are braces. You also don't have to follow the column and read the input source and respect where break were, we do that for objects.

The strength of prettier infrastructure is that it gives you (IMO) the right primitives on-top of which you can build a solid pretty printer tailored for your language and the way you want code to look like.

@helixbass
Copy link
Collaborator

See comment in #4984 re progress, am able to nicely reformat tests/*.coffee (ie our whole test suite) using Prettier and the tests still pass

@GeoffreyBooth GeoffreyBooth changed the base branch from master to ast June 11, 2018 04:26
@GeoffreyBooth
Copy link
Collaborator Author

@helixbass Is this safe to merge into the ast branch? It really only covers the CLI --ast flag (and Node API equivalent) and generic AST node generation. Would anything in here conflict with your other/future PRs? I took a look at your prettier branch but there’s so much else going on there, with running Prettier itself as part of generating the output, that it didn’t seem fully equivalent.

I was thinking that if this were in, then we at least have a simple way of generating AST output that we can refer to as we merge more and more PRs in.

@GeoffreyBooth GeoffreyBooth changed the title WIP: Generate AST AST flag/API option; generic AST output for all nodes Jun 11, 2018
@helixbass
Copy link
Collaborator

@GeoffreyBooth sure, there are things that would conflict with the prettier branch in its current state, but like you said there'll have to be modifications to that branch anyway in preparation for merging whatever initial version of it around AST generation

@GeoffreyBooth GeoffreyBooth merged commit 8a25195 into jashkenas:ast Jun 12, 2018
@GeoffreyBooth GeoffreyBooth deleted the nodes branch June 12, 2018 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants