Skip to content

Add digit-separators specification v1.0 #3846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 12, 2024
Merged

Add digit-separators specification v1.0 #3846

merged 6 commits into from
Jun 12, 2024

Conversation

srawlins
Copy link
Member

Add a specification for the digit-separators feature, as specified by @lrhn at #2 and at #2 (comment). I add myself as an author as I added a few other sections.


  • I’ve reviewed the contributor guide and applied the relevant portions to this PR.
Contribution guidelines:

Note that many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.

@srawlins
Copy link
Member Author

srawlins commented May 29, 2024

@stereotype441 has mentioned to the analyzer team that he desires to keep better track of "possible lint rules" and "possible quick fixes" that would support a given language feature. To paraphrase, I think Paul said that these are often brought up in language team meetings, when discussing how developers could be kept on the happy path, or how to support a migration to a language feature, but they aren't really written down or otherwise tracked.

To that end, I've written possible new lint rules and quick fixes in this specification. But if this inspires frowning, if the specification MD file is not considered a good place for such commentary, I can remove those sections, and just open separate issues against analyzer, and not cross-link in this specification.


That leaves underscore, which could be the start of an identifier. Currently
`100_000` would be tokenized as "integer literal 100" followed by "identifier
_000". However, users would never write an identifier adjacent to another token
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code quote the _100

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

separators every three digits of a decimal number. Also possibly a similar rule
for hexadecimal numbers. If a developer ever uses digit separators for a
different purpose (as in separating the digits of a phone number), the rule may
not prove useful.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another lint option could be "consistent digit separators", which triggers if the digit groups do not have the same size (except the most significant I one, which can be shorter).
If there are any __ separators, the number of _ separated groups between them should also be the same, and repeatedly for higher numbers of _s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, added!

underscore can be detected as such, and we can offer quick fixes to remove
unexpected errors (for example, `100_`, `100_e.2`, `100._00`). In a few cases,
the intention is not as straightforward, such as `100._100`, where `_100` can
be a legal name of an extension member.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's also possible to check whether it's an extension (in the same library, since it's a private name), and when it most likely isn't, the fix can be recommended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


A separate lint rule could encourage _consistent_ digit separators, which
triggers if the digit groups do not have the same size (except the most
significant I one, which can be shorter). If there are any `__` separators, the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this I intentional?

Suggested change
significant I one, which can be shorter). If there are any `__` separators, the
significant one, which can be shorter). If there are any `__` separators, the

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks


The only legal digit separator that is introduced with this feature is the
underscore character. If a developer attempts to use another character, for
example commas, as a separator, we may be able to detect this, and offer a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this fall into the same problem that we have when we consider , as a separating digit? How can we be sure that the user intended to use , as a separating digit instead of two expressions separated by a comma?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't. Many automated fixes are offered with the understanding that they are not the only fix, and may not be the desired fix.

I personally doubt we will offer this fix. But there might be some cases, like if the user pastes 100 lines of:

var x1 = 1,000,000;
var x2 = 2,000,000;
var x3 = 3,000,000;

into their editor, from another source text, we could offer a fix to convert the commas, because it looks more likely that they meant to write x1 = 1_000_000, since var x1 = 1, cannot be followed by an int literal (it can only be followed by an identifier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the same person pasted 100 lines of var x = foo(1,000,000); we'd be having a harder time helping them.
(Not hat we couldn't suggest 1_000_000, especially if we can see that foo takes only one argument, but it was not grammatically wrong to begin with.)

One thing that worries me is whether we can do the same fixes after the formatter has run.
If not, we may have a problem helping people who auto-format often.
If the code doesn't parse, it likely won't format either, but if it parses, and you get to var x1 = foo(1, 000, 000);, it would be sad if it's too late to help the user.

@srawlins
Copy link
Member Author

srawlins commented Jun 3, 2024

This is ready for review; dunno how you assign reviewers, @dart-lang/language-team

With the digit-separators feature, separators can be added between _digits_ of
a number literal, but nowhere else. In most error cases, the unexpected
underscore can be detected as such, and we can offer quick fixes to remove
unexpected errors (for example, `100_`, `100_e.2`, `100._00`). In a few cases,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you can't put period after e. This should be 100_e1.2

Copy link
Member

@lrhn lrhn Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about combining multiple tokens into one number, because we guess that that's what the author intended.

The 100_ could just be one malformed numeral token, or it could be 100 followed by _.

Not much better. The 100_e1 is already invalid because of the _e, so guessing that someone intended . to be a digit separator is a pretty wild guess.
We could probably correct this to 100e12, but it's not at all obvious that that's what the author intended.

For the 100._00, we should check first that there is no _00 extension member on int. If not, we can probably fix it.

Copy link
Member

@leafpetersen leafpetersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this moving forward, thanks! Should probably get a sign off from Lasse who's thought about this a lot more, but FWIW LGTM!

@srawlins srawlins merged commit 1b70377 into main Jun 12, 2024
2 of 3 checks passed
@srawlins srawlins deleted the digit-separators branch June 12, 2024 22:22
lrhn pushed a commit that referenced this pull request Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants