-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Lexer cleanup: Split literal lexing to separate file and simplify Cursor
.
#82757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lexer cleanup: Split literal lexing to separate file and simplify Cursor
.
#82757
Conversation
Also make them freestanding functions instead of methods.
This has better separation of concern between the lexing and the Cursor's iterator-like functionality.
r? @varkor (rust-highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
…hile` to `bump_while`.
…mber of things I need to keep in my head. And fix imports in tests.rs.
This one wasn't shown with `cargo check`.
a4e50e3
to
7149a21
Compare
cc @matklad |
use crate::literals::{ | ||
double_quoted_string, lifetime_or_char, number, raw_double_quoted_string, single_quoted_string, | ||
LiteralKind, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: could you avoid multi-line imports?
Otherwise LGTM. |
No strong feelings here! I personally agree that using funcitons for "grammar rules" and methods for inspecting tokens is cleaner, but this is also somewhat unusual. If we do this, someone might "cleanup" them back to methods in a couple of years. See also this and this. No opinion on helper's renaming. Keeping all rules in a single file was intentional: ~1k lines is not that big, especially with method bodies folded in the editor, and grammars are inherently not compositional (two different rules might intersect with each other). Helpers for grammar categories, like first_token/advance_token split was intentional, to make it easier to see the public API of the crate at a glance, and to have the signature But, as I've said, I have no strong opinion here! |
Yeah, my impression about this PR is that is certainly a refactoring, but it's not obvious that it's a cleanup rather than changing the code to match a personal style of the author. |
I feel I have less of a stake in the organisation of these files, and so if there isn't a strong consensus, it would be worth letting someone who is affected more directly by these changes make the final call. Perhaps if a refactoring isn't clearly an improvement, it's better not to make those changes, since at the very least it causes additional churn. r? @matklad |
It looks like there's a rough consensus that the benefits here do not out-weigh churn, so I am going to close this. I'd also like to add that the high order bit about code-cleanness here is that lexer doesn't depend on any other parts of the compiler and has a straightforward data-based interface. So it really doesn't matter much how is it structured internally. Thanks for the pull request regardless @Julian-Wollersberger! |
The functions that lexed string and number literals were moved to a new file
literals.rs
, to makelib.rs
smaller and more organized. I made them freestanding functions instead of methods onCursor
to have better separation of concerns: TheCursor
should only be responsible for iterating char-by-char, and not for the lexer logic.The first three commits only move code around and change the function calls from
self.foo()
tofoo(cursor)
. The fourth renames the widely-used.first()
,.second()
andeat_while()
methods to.peek()
,.peek_second()
and.bump_while()
. The fifth commit inlines some helper functions. They were so small that it was harder to keep them in my head than to understand what the code does when inlined.