-
Notifications
You must be signed in to change notification settings - Fork 442
Add Identifier wrapper that strips backticks from token text #2576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Identifier wrapper that strips backticks from token text #2576
Conversation
581dea4
to
7d0faac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting up the PR 👍🏽
Sources/SwiftSyntax/Identifier.swift
Outdated
/// An abstraction for sanitized values on a token. | ||
public struct Identifier: Equatable, Sendable { | ||
/// The sanitized `text` of a token. | ||
public let name: String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to store the name as pure bytes instead of a String
. That way the identifier actually represents the raw byte name of the identifier in the source file and doesn’t do any unicode normalization.
The best option would probably to use SyntaxText
(TokenSyntax.rawText
) but SyntaxText
doesn’t own the underlying text buffer, so Identifier
would also need to keep the SyntaxArena
it was allocated in alive (RetainedSyntaxArena
).
CC @rintaro In case you have opinions / thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just chatted to @rintaro about this.
What we want is two types:
- An
@_spi(RawSyntax) public struct RawIdentifier
that stores the name of the identifier without keeping theSyntaxArena
alive. This is intended to be used in performance-critical situations that can guarantee that theSyntaxArena
stays alive and don’t want to pay the ref-counting overhead for it. - A
public struct Identifier
that wrapsRawIdentifier
and also keeps the syntax arena alive usingRetainedSyntaxArena
. Identifier should then have a computed propertyname: String
that returns a Unicode-normalized version of the identifier.String
is probably what most clients choose to use but for uses in the compiler, we need to be byte accurate because the compiler considersU+00E0
(à LATIN SMALL LETTER A WITH GRAVE) to be different thanU+0061 U+0300
(a LATIN SMALL LETTER A followed by ̀ COMBINING GRAVE ACCENT) while Swift’sString
performs Unicode normalization and considers them the same (print("\u{e0}" == "\u{61}\u{300}")
printstrue
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ahoppen thanks for following up on this!
I'm looking in to the above and wanted to clarify whether the RawIdentifier.name
needs to be a SyntaxText
or a RawSyntax
I've pushed a WIP commit which covers the above and uses a SyntaxText
which mostly works (except for the test testRawIdentifier()
which I'll look in to after the conclusion of this conversation)
However on attempting to use RawSyntax
type I'm getting a bit lost in the code from me being so new to this codebase.
Are you able to:
- clarify if we want
RawIdentifier.name
to be aSyntaxText
or aRawSyntax
- look at my latest WIP commit and provide some input on my attempts so far?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RawIdentifier.name
should be a SyntaxText
. Sorry if that wasn’t clear from my last comment.
ac33a8c
to
3b4820e
Compare
Added a new Identifier type which contains a name property This name property contains a sanitized version of the TokenSyntax's text property which for now only consists of trimming backticks
This acts as a convenience property to convert a TokenSyntax to an Identifier
This isn't explicitly needed for this problem but it seems as though the default for all types that can conform to Sendable/Hashable should This also sets up this new type for Swift 6.0 (and the current Swift 5.10) to pass this type around safely when needed As well as allows Identifier to be a key in a dictionary which could be a common scenario
3b4820e
to
336b4b3
Compare
When trying to create an Identifier from a non-identifier token, the initializer should fail, returning nil Additionally the identifier property of the TokenSyntax should also return nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This is going in a great direction. It’s about the design that I had in mind.
Test functions with backticks in their identifier should have them filtered out. This is a stopgap until swiftlang/swift-syntax#2576 is ready.
@adammcarter Are you planning on coming back to this? Do you mind if I put up a patch that addresses these comments? |
@adammcarter @ahoppen I took a shot at addressing the review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up @plemarquand! I’ve got a few comments inline. Could you also add an entry to the release notes?
@swift-ci please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Looks great!
@swift-ci Please test |
@swift-ci Please test Windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add Identifier.swift to CMakeLists.txt. swift-syntax is built using CMake when it’s linked into the compiler.
@swift-ci Please test |
1 similar comment
@swift-ci Please test |
Motivation
Following the issue opened in #1936, take the code below as an example:
When wanting to pull out the character
you will have to manually strip the backslashes to get the
z
character.The backslashes are completely valid in this case but often using the Swift syntax you might want to remove the backslashes to get the sanitised name of the token for your own purposes which can introduce unnecessary boilerplate.
Solution
Thinking forward, this new
Identifier
type allows an abstraction over tokens to further sanitise/strip data out, for example, backticks (as in this PR) or future updates like:é
#
symbols such as in#if
/*
,///
or//
With the changes in this PR we can take the same example code:
And by calling
[token].identifier.name
we can getz
without the backticks.Alternatives considered
Stripping the backslashes when returning the
.text
is one solution, but the backticks are a valid part of the text and so wouldn't be an accurate way of returning the source codeTrimming the backticks as part of
trimmedDescription
is a valid and simpler solution IMO, but doesn't allow for future implementations like the above under theIdentifier
abstraction