Skip to content

Handle trailing commas in parser instead of scanner #14517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

adampauls
Copy link
Contributor

@adampauls adampauls commented Feb 18, 2022

Building on #14509, does the extra work to make sure that comma-separated lists can have a trailing comma only in lists delimited by parens, braces, or brackets. Pulls over a test from scala/scala#8780.

The first commit is from #14509, so you should only review the second commit in this PR.

@adampauls adampauls force-pushed the no_weird_trailing_commas branch from 89ea0ec to 24ec0b8 Compare February 18, 2022 23:48
@@ -1390,14 +1397,7 @@ object Parsers {
else
Function(params, t)
}
def funTypeArgsRest(first: Tree, following: () => Tree) = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined, since this is just reading a comma-separated list.

@@ -3210,13 +3209,6 @@ object Parsers {
if !idOK then syntaxError(i"named imports cannot follow wildcard imports")
namedSelector(termIdent())
}
val rest =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another comma-separated list factored out.

in.nextToken()
ts += part()
if (in.isAfterLineEnd && (in.token == OUTDENT || (expectedEnd != EMPTY && in.token == expectedEnd))) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main change here, this is moved out of the scanner, but here we no longer have to guess what lookahead token might indicate a trailing comma.

@@ -654,13 +654,6 @@ object Scanners {
insert(OUTDENT, offset)
currentRegion = r.outer
case _ =>
lookAhead()
if isAfterLineEnd
&& (token == RPAREN || token == RBRACKET || token == RBRACE || token == OUTDENT)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part that was moved to the parser.


def f =
List(1, 2, 3).map {
a => a + 1, // error: weird comma
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These used to pass.

b,
c,
= (1, 2, 3) // error
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I changed the behavior here, just wanted to make sure it was declared somewhere.

@adampauls adampauls marked this pull request as ready for review February 18, 2022 23:51
@adampauls adampauls changed the title [WIP] Handle trailing commas in parser instead of scanner Handle trailing commas in parser instead of scanner Feb 18, 2022
@som-snytt
Copy link
Contributor

You beat me to it!

adampauls added a commit to adampauls/zio-protoquill that referenced this pull request Feb 19, 2022
There is a [PR](scala/scala3#14517) up to remove bad trailing commas from dotty. This bad trailing comma is breaking the community build.
adampauls added a commit to adampauls/specs2 that referenced this pull request Feb 19, 2022
There is a [PR](scala/scala3#14517) up to remove bad trailing commas from dotty. This bad trailing comma is breaking the community build.
@adampauls
Copy link
Contributor Author

The community build breaks for bad trailing commas in two repos (specs2 and protoquill). I put up PRs to remove.

@som-snytt
Copy link
Contributor

som-snytt commented Feb 19, 2022

Were they not on latest Scala 2? The Scala 2 community build ought to have broken already. Or was it broken already?

I guess the change was only since start of pandemic.

The SIP language is, "trailing commas are only supported in comma-separated elements."

https://docs.scala-lang.org/sips/trailing-commas.html

etorreborre pushed a commit to etorreborre/specs2 that referenced this pull request Feb 19, 2022
There is a [PR](scala/scala3#14517) up to remove bad trailing commas from dotty. This bad trailing comma is breaking the community build.
@adampauls
Copy link
Contributor Author

Must have been changes to Scala 3 code made after they forked off Scala 3 parts?

@SethTisue
Copy link
Member

(The Scala 2 community build doesn't have protoquill. It does have specs2 4.x, whereas I think the Scala 3 community build has 5.x instead.)

deusaquilus pushed a commit to zio/zio-protoquill that referenced this pull request Feb 22, 2022
There is a [PR](scala/scala3#14517) up to remove bad trailing commas from dotty. This bad trailing comma is breaking the community build.
@adampauls adampauls force-pushed the no_weird_trailing_commas branch from ebfd546 to 0edf5a2 Compare February 22, 2022 17:37
Copy link
Contributor

@odersky odersky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall, and in the end preferable over a region-based approach. But I wonder why Scanner was not touched? Can't we delete the trailing comma handling code there now?

I have fixed the CB problem in a different PR. I leave there some instructions how to fix CB in community builds.

@adampauls
Copy link
Contributor Author

@adampauls adampauls requested a review from odersky March 16, 2022 17:16
@adampauls adampauls force-pushed the no_weird_trailing_commas branch from 0edf5a2 to 8a09cb5 Compare March 16, 2022 18:04
@adampauls
Copy link
Contributor Author

@odersky I cherry picked your CB fix, so hopefully this will go fully green. I tried merging in #14695 but a new test fails because of error recover regression. Is the plan to merge this PR first and then yours, or the other way around?

@adampauls
Copy link
Contributor Author

Not sure who to ask, maybe @SethTisue why isn't CI running on this PR?

@griggt
Copy link
Contributor

griggt commented Mar 18, 2022

Not sure who to ask, maybe @SethTisue why isn't CI running on this PR?

I think you just happened to push during a GitHub Actions service outage. A force push should probably get it running.

@SethTisue SethTisue closed this Mar 19, 2022
@SethTisue SethTisue reopened this Mar 19, 2022
@odersky
Copy link
Contributor

odersky commented Mar 20, 2022

Thinking more about it, I am not sure we want to go ahead with this. If we would go ahead, we'd have to change syntax.md (both versions) to reflect the grammar changes and also have to update all comments in the parser to reflect the same change. We have the strict rule that parser should parse what's in syntax.md and this has to be documented.

All this looks like a LOT of work, and I am not sure whether the benefits are commensurate. As far as I can see, the benefit is that an argument like { expr, } would be ruled out. Does that justify what looks like a considerable complication of grammar and parser?

We can look at it this way. If the original proposal in SIP 27 had specified extensive grammar changes, it probably would not have been adopted since the cost/benefit ratio would have been deemed too high. I believe SIP 27 was accepted in part since it proposed that the whole thing can be handled in the Lexer. The SIP proposal states this explicitly.

So changing things now by introducing all the grammar changes seems like it undermines the original proposal, and the rationale for accepting it. It would also be good to get @dwijnand 's opinion on this.

@odersky odersky assigned dwijnand and unassigned odersky Mar 20, 2022
@adampauls
Copy link
Contributor Author

Fwiw, I lost hours of confusion to the magic lines in the Scanner when I first started looking into bad error recovery behavior. The code in Scanner is small but it is not simple.

This PR combines error recovery, a refactor to re-use commaSeparated in more places, and the trailing comma fix. I think if I separate those out, you will see that the trailing comma fix is not really a not complication.

I will put up a PR for just the refactor to better use commaSeparated. At this point, will you accept the changes to error recovery, or do you plan to merge your PR first?

After that, we can see if the trailing comma fix is worth it. Fwiw, my team did complain bitterly about it because for a while, scalafmt would insert the trailing comma in every lambda with curlies. I believe that's the reason that @som-snytt removed the "weird trailing comma" from Scala 2 in the first place -- everyone was confused how it could parse in the first place.

@som-snytt
Copy link
Contributor

Earlier this yesterday evening, I did glance at the Scala 2 PR from beginning of pandemic.

It claims to be closer to the SIP. It doesn't look finicky.

There is old error recovery that is finicky, and I noted it is skippable:

Currently, doti doesn't even support case x * y => let alone case _ * y =>.

Oh but it does support it now.

The other helpful comment was

I plan to forward port to Scala 3, which is mechanically similar. I won't be able to port the name inGroupers, however, as Scala 3 already has a boring name for that method.

@som-snytt
Copy link
Contributor

I've refreshed my memory a bit before sleeping, but I'll need another look.

I see nsc says commaSeparated uses the current region:

https://github.com/scala/scala/blob/2.13.x/src/compiler/scala/tools/nsc/ast/parser/Parsers.scala#L833

and inGroupers (parens, brackets) knows which right bracket induces a skipped comma.

https://github.com/scala/scala/blob/2.13.x/src/compiler/scala/tools/nsc/ast/parser/Parsers.scala#L66

@odersky
Copy link
Contributor

odersky commented Mar 21, 2022

This PR combines error recovery, a refactor to re-use commaSeparated in more places, and the trailing comma fix. I think if I separate those out, you will see that the trailing comma fix is not really a not complication.

We can certainly evaluate that. But the grammar's have to be fixed as well. We cannot allow a deviation of Parser and grammar.

@adampauls
Copy link
Contributor Author

adampauls commented Mar 21, 2022

As far as I can tell, syntax.md was never updated to reflect trailing commas at all? I don't see anything in #3463 and when I looked at either syntax.md file in the repo, there is no reference to trailing commas, and every reference to ',' appears in a rule that only accepts non-trailing commas from what I could see. I'd happily update the the grammar to reflect the new rules, but please first let me know what version of this PR you will accept -- should I break it up first? Is this implementation acceptable?

The SIP says:

Trailing comma support is therefore restricted to only comma-separated elements that are on separate lines

and

Following Dr. Martin Odersky's suggestion, the proposal is that trailing commas are only supported in comma-separated elements that are enclosed by parentheses, square brackets or curly braces (), ], and }, respectively).

Although not entirely unambiguous, I think any reasonable reader would interpret these sentences to mean that trailing commas are only permitted in comma-separated elements, and therefore not in

{
  expr,
}

Unfortunately, I think the original SIP was simply incorrect in thinking that a Scanner-only implementation could accomplish the spec (without pushing an unreasonable amount of info into the Scanner from the Parse).

@dwijnand
Copy link
Member

I don't have a personal preference on keeping the implementation less strict and simpler or more strict and complex - I see the merits of both. But I think the fact that Scalafmt inserted commas where it shouldn't have and the compiler wasn't complex enough to reject it as a weak reason. My reading of Martin's comment is he prefers the less strict and simpler setup we have. But if I were you I'd present separate PRs and we'll see from there.

@adampauls
Copy link
Contributor Author

I have broken this up into #14741 and #14742, and will the port the rest back to #14509.

@adampauls adampauls closed this Mar 22, 2022
@odersky
Copy link
Contributor

odersky commented Mar 22, 2022

As far as I can tell, syntax.md was never updated to reflect trailing commas at all?

It did not have to since it was treated as a Lexer functionality. syntax.md works on the token stream that comes from the Lexer. The Lexer simply suppressed a trailing comma in front of a closing parenthesis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants