Lexer cleanup: Split literal lexing to separate file and simplify `Cursor`. #82757

Julian-Wollersberger · 2021-03-04T11:09:19Z

The functions that lexed string and number literals were moved to a new file literals.rs, to make lib.rs smaller and more organized. I made them freestanding functions instead of methods on Cursor to have better separation of concerns: The Cursor should only be responsible for iterating char-by-char, and not for the lexer logic.

The first three commits only move code around and change the function calls from self.foo() to foo(cursor). The fourth renames the widely-used .first(), .second() and eat_while() methods to .peek(), .peek_second() and .bump_while(). The fifth commit inlines some helper functions. They were so small that it was harder to keep them in my head than to understand what the code does when inlined.
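For readers who don't have the diff at hand, the split described above could look roughly like the sketch below. It is a simplified illustration, not the code from this PR. It assumes `peek()` returns an `Option<char>`, `rest()` is a helper added only for the demo, and `eat_decimal_digits` stands in for one of the `eat_*_digits()` helpers in reduced form.

```rust
use std::str::Chars;

/// Simplified, hypothetical cursor: it only walks the input char by char.
pub struct Cursor<'a> {
    chars: Chars<'a>,
}

impl<'a> Cursor<'a> {
    pub fn new(input: &'a str) -> Cursor<'a> {
        Cursor { chars: input.chars() }
    }

    /// Next char without consuming it (`first()` before the rename).
    pub fn peek(&self) -> Option<char> {
        self.chars.clone().next()
    }

    /// Char after the next one (`second()` before the rename).
    pub fn peek_second(&self) -> Option<char> {
        let mut it = self.chars.clone();
        it.next();
        it.next()
    }

    /// Consume a single char.
    pub fn bump(&mut self) -> Option<char> {
        self.chars.next()
    }

    /// Consume chars while `pred` holds (`eat_while()` before the rename).
    pub fn bump_while(&mut self, mut pred: impl FnMut(char) -> bool) {
        while let Some(c) = self.peek() {
            if !pred(c) {
                break;
            }
            self.bump();
        }
    }

    /// Demo-only helper (an assumption of this sketch): the unconsumed input.
    pub fn rest(&self) -> &'a str {
        self.chars.as_str()
    }
}

/// Lexer logic as a freestanding function: called as `eat_decimal_digits(&mut cursor)`
/// instead of `cursor.eat_decimal_digits()`. Skips `_` separators and returns
/// whether at least one digit was consumed.
pub fn eat_decimal_digits(cursor: &mut Cursor<'_>) -> bool {
    let mut has_digits = false;
    cursor.bump_while(|c| match c {
        '_' => true,
        '0'..='9' => {
            has_digits = true;
            true
        }
        _ => false,
    });
    has_digits
}
```

With this shape, `Cursor` knows nothing about literals, and the literal rules only touch the cursor through `peek`/`bump`-style calls.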


rust-highfive · 2021-03-04T11:09:22Z

r? @varkor

(rust-highfive has picked a reviewer for you, use r? to override)


petrochenkov · 2021-03-06T09:31:02Z

cc @matklad

petrochenkov · 2021-03-06T09:36:49Z

compiler/rustc_lexer/src/lib.rs

+use crate::literals::{
+    double_quoted_string, lifetime_or_char, number, raw_double_quoted_string, single_quoted_string,
+    LiteralKind,
+};


Style nit: could you avoid multi-line imports?

petrochenkov · 2021-03-06T09:46:07Z

No opinion on the move from Cursor methods to free functions; I don't really understand what a Cursor's job is and what it is not. I'm interested in @matklad's opinion on this.
The same goes for renaming first/second to peek/peek_second.

Otherwise LGTM.

matklad · 2021-03-06T13:34:58Z

No strong feelings here!

I personally agree that using functions for "grammar rules" and methods for inspecting tokens is cleaner, but this is also somewhat unusual. If we do this, someone might "cleanup" them back to methods in a couple of years. See also this and this.

No opinion on helper's renaming.

Keeping all rules in a single file was intentional: ~1k lines is not that big, especially with method bodies folded in the editor, and grammars are inherently not compositional (two different rules might intersect with each other).

Helpers for grammar categories, like raw_ident or whitespace, were intentional, to have a "one grammar production = one function" correspondence.
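To illustrate the "one grammar production = one function" idea, here is a minimal sketch. It is not the actual rustc_lexer code. The functions take a `&str` and return the number of bytes matched, and the character classes are deliberately simplified (`is_alphabetic`/`is_alphanumeric` instead of the real identifier rules). All names besides `raw_ident` and `whitespace` are assumptions made for the example.

```rust
/// Helper for this sketch: byte length of the longest prefix whose chars satisfy `pred`.
fn eat_while(input: &str, pred: impl Fn(char) -> bool) -> usize {
    input
        .char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(input.len(), |(i, _)| i)
}

/// Production: whitespace := whitespace-char*
fn whitespace(input: &str) -> usize {
    eat_while(input, char::is_whitespace)
}

/// Production: ident := (letter | '_') (letter | digit | '_')*
/// Returns 0 if the input does not start with an identifier.
fn ident(input: &str) -> usize {
    match input.chars().next() {
        Some(c) if c == '_' || c.is_alphabetic() => {
            eat_while(input, |c| c == '_' || c.is_alphanumeric())
        }
        _ => 0,
    }
}

/// Production: raw_ident := "r#" ident
/// Returns 0 if the input does not start with a raw identifier.
fn raw_ident(input: &str) -> usize {
    match input.strip_prefix("r#") {
        Some(rest) => match ident(rest) {
            0 => 0,
            n => 2 + n,
        },
        None => 0,
    }
}
```

The sketch also shows how rules can intersect: `ident("r#match")` matches just the leading `r`, so a dispatcher has to try `raw_ident` before `ident`. That kind of interaction is easier to spot when the productions sit next to each other.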

The first_token/advance_token split was intentional, to make it easier to see the public API of the crate at a glance, and to have the signature of advance_token match the signatures of other "production" functions more closely (should've named it token, though).
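A rough sketch of that split, for illustration only: a small public surface (`first_token`, plus a `tokenize` built on top of it) wrapping an internal `advance_token` that has the same shape as the other productions. The token kinds, the `Cursor` internals and the character classes here are simplified assumptions, not the crate's real definitions.

```rust
#[derive(Debug, PartialEq)]
pub enum TokenKind {
    Whitespace,
    Ident,
    Unknown,
}

#[derive(Debug, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub len: usize,
}

/// Hypothetical minimal cursor: it just tracks the unconsumed input.
struct Cursor<'a> {
    rest: &'a str,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { rest: input }
    }

    /// Consume chars while `pred` holds.
    fn bump_while(&mut self, pred: impl Fn(char) -> bool) {
        let end = self
            .rest
            .char_indices()
            .find(|&(_, c)| !pred(c))
            .map_or(self.rest.len(), |(i, _)| i);
        self.rest = &self.rest[end..];
    }

    /// Internal "production": lexes exactly one token. Panics on empty input.
    fn advance_token(&mut self) -> Token {
        let start_len = self.rest.len();
        let first = self.rest.chars().next().expect("non-empty input");
        let kind = if first.is_whitespace() {
            self.bump_while(char::is_whitespace);
            TokenKind::Whitespace
        } else if first == '_' || first.is_alphabetic() {
            self.bump_while(|c| c == '_' || c.is_alphanumeric());
            TokenKind::Ident
        } else {
            // Anything else: consume a single char as an unknown token.
            self.rest = &self.rest[first.len_utf8()..];
            TokenKind::Unknown
        };
        Token { kind, len: start_len - self.rest.len() }
    }
}

/// Public entry point: lex the first token of a non-empty `input`.
pub fn first_token(input: &str) -> Token {
    Cursor::new(input).advance_token()
}

/// Public convenience: lex the whole input by calling `first_token` repeatedly.
pub fn tokenize(mut input: &str) -> impl Iterator<Item = Token> + '_ {
    std::iter::from_fn(move || {
        if input.is_empty() {
            return None;
        }
        let token = first_token(input);
        input = &input[token.len..];
        Some(token)
    })
}
```

In this sketch only `first_token`, `tokenize`, `Token` and `TokenKind` are public, so the API can be read off the free functions, while `advance_token` stays an implementation detail.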

But, as I've said, I have no strong opinion here!

petrochenkov · 2021-03-06T13:53:56Z

If we do this, someone might "cleanup" them back

Yeah, my impression of this PR is that it is certainly a refactoring, but it's not obvious that it's a cleanup rather than changing the code to match the personal style of the author.

varkor · 2021-03-07T14:35:32Z

I feel I have less of a stake in the organisation of these files, and so if there isn't a strong consensus, it would be worth letting someone who is affected more directly by these changes make the final call. Perhaps if a refactoring isn't clearly an improvement, it's better not to make those changes, since at the very least it causes additional churn.

r? @matklad

matklad · 2021-03-08T08:48:50Z

It looks like there's a rough consensus that the benefits here do not outweigh the churn, so I am going to close this. I'd also like to add that the high-order bit about code cleanness here is that the lexer doesn't depend on any other parts of the compiler and has a straightforward data-based interface. So it really doesn't matter much how it is structured internally.

Thanks for the pull request regardless @Julian-Wollersberger!

Julian Wollersberger added 3 commits February 13, 2021 19:33

Move lexing of number and string literals into a separate file.

59583ac

Also make them freestanding functions instead of methods.

Move the eat_*_digits() methods to literals.rs.

a534bd7

Make advance_token a freestanding function.

70a4bc8

This has better separation of concern between the lexing and the Cursor's iterator-like functionality.

rust-highfive assigned varkor Mar 4, 2021

rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) Mar 4, 2021


Julian Wollersberger added 3 commits March 4, 2021 12:22

Simplified cursor.rs a bit and renamed first to peek and `eat_while` to `bump_while`.

629e161

Inline some helper functions that are only used once, to lower the number of things I need to keep in my head. And fix imports in tests.rs.

d4336a5

Address the "Hidden lifetime in path" warning.

7149a21

This one wasn't shown with `cargo check`.

Julian-Wollersberger force-pushed the lexer-cleanup branch from a4e50e3 to 7149a21 Compare March 4, 2021 11:23

petrochenkov self-assigned this Mar 4, 2021

petrochenkov reviewed Mar 6, 2021

petrochenkov removed their assignment Mar 6, 2021

rust-highfive assigned matklad and unassigned varkor Mar 7, 2021

matklad closed this Mar 8, 2021
