AST flag/API option; generic AST output for all nodes #5044

GeoffreyBooth · 2018-04-29T03:43:36Z

Building off of #4984. The major prerequisite to support for CoffeeScript in Prettier, or better ecosystem tooling in general (a new JS2Coffee, etc.) is being able to generate a abstract syntax tree, or AST. This is a JSON output (or in Node, a plain JavaScript object output) representation of all the nodes that the compiler would be generating JavaScript source code for. Once we have an AST, it can be passed into Prettier for Prettier to generate pretty-formatted CoffeeScript; into CoffeeLint for linting; and so on. The intent of this PR is to focus only on the “generate an AST” part, leaving further integrations for future PRs.

This PR aims to update the CoffeeScript compiler to generate such an AST. The current CLI already has a --nodes flag that prints debugging info about all the nodes; I’m updating that to produce the output as JSON instead. The Node CLI will take a nodes option to return the same thing.

This PR will be WIP until all the nodes have complete details about them, such that Prettier has enough to work with to generate new CoffeeScript source. I’m treating that as the initial benchmark for completeness; if other tools require yet more detail, the additional details can be added in future PRs.

To try out generating an AST, check out this branch and run:

./bin/coffee --nodes --eval 'answer = 42'

For syntax-highlighted output, or a longer input, create a test.coffee at the root of the repo with whatever CoffeeScript code you want to see an AST of, then run:

coffee --eval "console.log require('util').inspect \
require('./lib/coffeescript').compile( \
require('fs').readFileSync('./test.coffee').toString(), nodes: yes), \
{colors: yes, depth: 10}"

You should see pretty-printed JSON like this:

{ type: 'Block',
  loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 10 } },
  expressions:
   [ { type: 'Assign',
       loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 10 } },
       variable:
        { type: 'IdentifierLiteral',
          loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 5 } },
          value: 'answer' },
       value:
        { type: 'NumberLiteral',
          loc: { start: { line: 0, column: 9 }, end: { line: 0, column: 10 } },
          value: '42' } } ] }

This is the same output as the current CLI’s --nodes, in JSON form, with:

the node class names as type, which is how the Babel AST has them
the location data structured the way Babel does it
any primitive (boolean, number, string) properties included, as these are serializable as is into JSON
children included recursively

For nodes as simple as NumberLiteral, this is all the detail we need. But for complicated nodes like Class, a lot more work will be required. Basically, every node class that has a compileNode method will need a similar compileObject method; the former generates a JavaScript output string, and the latter will generate AST objects. Another issue that needs tackling, perhaps in a separate PR, is preserving data from the lexer that doesn’t make it into the node classes but that we need to produce a complete AST object. I’m hoping we can use this branch to track this work, and I’m happy to invite anyone to my repo who would like to contribute.

…ST; starting point for toPlainObject method on node classes to return the serializable plain JavaScript object representation of each node

jashkenas · 2018-04-30T20:32:59Z

It's nice to see this being developed in a side PR — I'm not terribly optimistic about how easy it's going to be to get prettier to format CoffeeScript.

But looks like a fine start! My only comment is that the toPlainObject method is usually called toJSON():

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#toJSON()_behavior

vendethiel · 2018-04-30T21:39:12Z

Good luck, have fun; as they say!

Happy to see you try and tackle this one

GeoffreyBooth · 2018-04-30T22:05:12Z

Um, I’m hoping to not tackle it alone! Help most welcome!

Prettier can handle oddball formats like CSS and Markdown, so I would think it could handle CoffeeScript. But regardless that’s a whole separate effort. I just want to get a good AST out first, so that others can tackle integrations. JS2Coffee already has code that converts an AST into CoffeeScript source, so clearly it’s possible to do.

GeoffreyBooth · 2018-05-01T17:24:00Z

I think there’s a fairly easy way to track our progress toward a “complete” AST. So the Babel AST spec lists every one of Babel’s AST token types and properties. For example:

interface RegExpLiteral <: Literal {
  type: "RegExpLiteral";
  pattern: string;
  flags: string;
}

Almost all of our node types have corresponding types in Babel’s AST spec. We could compare the AST output for our node against its counterpart in the Babel AST, and if both contain the same properties, that node type is complete. So for RegExpLiteral, if our node type has both the pattern and flags properties, and the values of those properties are roughly equivalent in terms of the information conveyed (if not necessarily the format or syntax) then our equivalent to Babel’s RegExpLiteral is done. Since Babel takes its AST as input to generate source code, by definition its AST is “complete” in that it contains all the information needed for source code generation. If our AST contains the same information, it should be just as complete.

Comparing node types also has the advantage that we don’t need a code generation method (like Prettier) built yet. All we need is a “kitchen sink” input file that has at least one of every node type, so it generates and AST that we can compare against Babel’s. @zdenko @helixbass

…me toPlainObject to toJSON

vjeux · 2018-05-08T01:18:05Z

Some context about prettier, the way it works is the following: from an AST you generate an intermediate representation ( https://github.com/prettier/prettier/blob/master/commands.md ) that describes where things should break. Since the CoffeeScript AST is different than the JavaScript AST, you're likely going to have to rebuilt all that part from scratch.

If you use prettier, you're going to get a lot of infrastructure for free:

A CLI that supports a bunch of options
Unit testing infrastructure that has snapshot tests and indempotency checks
Editor integrations for all the editors / IDE on earth
A bunch of utils to handle comments, sub-languages (ex css inside of backticks), ...
An online playground to play with it (Not yet done for plugins but will be soon)

I didn't realize at the time but there's actually a bunch of work outside of just the core formatter that is needed to make one successful. So, we've been trying to make a plugin system where if you want to add a new language, you just have to build the printer for that language and you get the rest for free.

GeoffreyBooth · 2018-05-08T02:51:54Z

Thanks @vjeux. One of our maintainers is @lydell, so presumably we’ll have some help when it comes to the Prettier side of things 😄

One thing I’m concerned by is the idea of line breaking CoffeeScript source code in general. I know that intelligent line breaks was the original purpose of Prettier, but I think it’s fair to say that Prettier has evolved from that to now function as a best practices code reformatter in general. Since CoffeeScript is whitespace dependent, unlike JavaScript, it’s common to assume a soft word wrap and never break lines of code for the purpose of keeping under a column limit. Certainly people write objects and arrays in multiline style instead of inline when the inline version starts to get long and unwieldy, for example, but using the line continuation symbol \ or implying a continuation via things like ending a line in and are fairly uncommon. I don’t want to say there’s consensus on this view, but it’s fair to say that there isn’t consensus that CoffeeScript code should be column-limited. This is something we’ll have to address, perhaps through a configuration option, when we get closer to outputting code via Prettier.

vjeux · 2018-05-08T03:10:34Z

Prettier itself doesn't enforce any of those, this is the person writing the ast -> IR that has to make those decisions. The IR allows you to express what happens if something goes over 80 columns.

It may seem that enforcing based on the 80-column rule is not ideal but in practice, people tend to write code that way. For example if you look at the files in coffeescript/src/, they tend to fit 80 columns.

Now, this is only one input around when to break. For example in prettier we always break for if when there are braces. You also don't have to follow the column and read the input source and respect where break were, we do that for objects.

The strength of prettier infrastructure is that it gives you (IMO) the right primitives on-top of which you can build a solid pretty printer tailored for your language and the way you want code to look like.

helixbass · 2018-05-26T21:37:58Z

See comment in #4984 re progress, am able to nicely reformat tests/*.coffee (ie our whole test suite) using Prettier and the tests still pass

GeoffreyBooth · 2018-06-11T04:33:10Z

@helixbass Is this safe to merge into the ast branch? It really only covers the CLI --ast flag (and Node API equivalent) and generic AST node generation. Would anything in here conflict with your other/future PRs? I took a look at your prettier branch but there’s so much else going on there, with running Prettier itself as part of generating the output, that it didn’t seem fully equivalent.

I was thinking that if this were in, then we at least have a simple way of generating AST output that we can refer to as we merge more and more PRs in.

helixbass · 2018-06-11T11:17:45Z

@GeoffreyBooth sure, there are things that would conflict with the prettier branch in its current state, but like you said there'll have to be modifications to that branch anyway in preparation for merging whatever initial version of it around AST generation

GeoffreyBooth added 4 commits January 1, 2018 22:31

Add nodes option to Node API, that returns POJO representation of A…

08f9aaf

…ST; starting point for toPlainObject method on node classes to return the serializable plain JavaScript object representation of each node

Merge branch 'master' into nodes

48f44f4

Merge branch 'master' of github.com:jashkenas/coffeescript into nodes

d9e035e

Make --nodes also return JSON (pretty-printed, though not colorized)

f0f3a6a

GeoffreyBooth added the enhancement label Apr 29, 2018

GeoffreyBooth mentioned this pull request Apr 29, 2018

Pass lexer data to nodes #5045

Closed

GeoffreyBooth added the help wanted label May 1, 2018

GeoffreyBooth added 3 commits May 4, 2018 16:27

Merge commit '7beb63169443f5e703a5846c80aa50f9909bf0d8' into nodes

cc674dc

Alphabetize CLI flags

0332b3c

Use new ast flag to request AST, restoring prior nodes flag; rena…

50c9c28

…me toPlainObject to toJSON

GeoffreyBooth changed the base branch from master to ast June 11, 2018 04:26

GeoffreyBooth changed the title ~~WIP: Generate AST~~ AST flag/API option; generic AST output for all nodes Jun 11, 2018

GeoffreyBooth merged commit 8a25195 into jashkenas:ast Jun 12, 2018

GeoffreyBooth deleted the nodes branch June 12, 2018 02:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AST flag/API option; generic AST output for all nodes #5044

AST flag/API option; generic AST output for all nodes #5044

GeoffreyBooth commented Apr 29, 2018 •

edited

Loading

jashkenas commented Apr 30, 2018

vendethiel commented Apr 30, 2018

GeoffreyBooth commented Apr 30, 2018

GeoffreyBooth commented May 1, 2018

vjeux commented May 8, 2018

GeoffreyBooth commented May 8, 2018

vjeux commented May 8, 2018

helixbass commented May 26, 2018

GeoffreyBooth commented Jun 11, 2018

helixbass commented Jun 11, 2018

AST flag/API option; generic AST output for all nodes #5044

AST flag/API option; generic AST output for all nodes #5044

Conversation

GeoffreyBooth commented Apr 29, 2018 • edited Loading

jashkenas commented Apr 30, 2018

vendethiel commented Apr 30, 2018

GeoffreyBooth commented Apr 30, 2018

GeoffreyBooth commented May 1, 2018

vjeux commented May 8, 2018

GeoffreyBooth commented May 8, 2018

vjeux commented May 8, 2018

helixbass commented May 26, 2018

GeoffreyBooth commented Jun 11, 2018

helixbass commented Jun 11, 2018

GeoffreyBooth commented Apr 29, 2018 •

edited

Loading