Skip to content

rustc_lexer's definition of ids are more general than lang ref's spec #85809

Closed
@osa1

Description

@osa1

This is how Rust identifiers' lexical syntax is defined: https://doc.rust-lang.org/reference/identifiers.html
This is how the lexer for Rust identifiers is implemented:

/// True if `c` is valid as a first character of an identifier.
/// See [Rust language reference](https://doc.rust-lang.org/reference/identifiers.html) for
/// a formal definition of valid identifier name.
pub fn is_id_start(c: char) -> bool {
// This is XID_Start OR '_' (which formally is not a XID_Start).
// We also add fast-path for ascii idents
('a'..='z').contains(&c)
|| ('A'..='Z').contains(&c)
|| c == '_'
|| (c > '\x7f' && unicode_xid::UnicodeXID::is_xid_start(c))
}
/// True if `c` is valid as a non-first character of an identifier.
/// See [Rust language reference](https://doc.rust-lang.org/reference/identifiers.html) for
/// a formal definition of valid identifier name.
pub fn is_id_continue(c: char) -> bool {
// This is exactly XID_Continue.
// We also add fast-path for ascii idents
('a'..='z').contains(&c)
|| ('A'..='Z').contains(&c)
|| ('0'..='9').contains(&c)
|| c == '_'
|| (c > '\x7f' && unicode_xid::UnicodeXID::is_xid_continue(c))
}
/// The passed string is lexically an identifier.
pub fn is_ident(string: &str) -> bool {
let mut chars = string.chars();
if let Some(start) = chars.next() {
is_id_start(start) && chars.all(is_id_continue)
} else {
false
}
}

The specification says it should start with ASCII alphabetic and continue with ASCII alphanumeric or underscore. But the implementation uses http://www.unicode.org/reports/tr31/#Default_Identifier_Syntax which is much more general than that as far as I understand.

I think one of code or lang ref should be updated, but I'm not sure which one.

(I didn't check lexing for other tokens, it might be useful to compare others with the language reference's definitions too)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions