Description
Here is a concrete proposal for #3947 (comment).
Background
All Zig code is always encoded in UTF-8, and this proposal does not change that.
This proposal does not change the interpretation of ASCII codepoints anywhere in Zig code.
Before this proposal, the only non-ASCII codepoints with special handling in Zig are U+0085 (NEL), U+2028 (LS), and U+2029 (PS). This proposal does not change how these codepoints are interpreted, and they remain disallowed in identifiers.
Proposal
Zig's current lexical rule for identifiers is:
IDENTIFIER
    <- !keyword ("c" !["\\] / [A-Zabd-z_]) [A-Za-z0-9_]* skip
     / "@\"" string_char* "\"" skip
This proposal adds the codepoints listed in the table below to both character classes in the above rule, [A-Zabd-z_] and [A-Za-z0-9_].
00A0
00A8
00AA
00AD
00AF
00B2..00B5
00B7..00BA
00BC..00BE
00C0..00D6
00D8..00F6
00F8..200D
202A..202F
203F..2040
2054
205F..218F
2460..24FF
2776..2793
2C00..2DFF
2E80..3000
3004..3007
3021..302F
3031..D7FF
F900..FD3D
FD40..FDCF
FDF0..FE44
FE47..FFFD
10000..1FFFD
20000..2FFFD
30000..3FFFD
40000..4FFFD
50000..5FFFD
60000..6FFFD
70000..7FFFD
80000..8FFFD
90000..9FFFD
A0000..AFFFD
B0000..BFFFD
C0000..CFFFD
D0000..DFFFD
E0000..EFFFD
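The rule above, with the table's ranges added to both character classes, can be sketched in Python as shown below. This is an illustrative model, not Zig's tokenizer. The names `EXTRA_RANGES`, `is_extra`, and `match_identifier` are hypothetical. The sketch covers only the first alternative of IDENTIFIER and does not model keyword exclusion, `@"..."` identifiers, or `skip`.

```python
import bisect
import string
from typing import Optional

# Proposed non-ASCII ranges from the table above, as inclusive (start, end)
# pairs; single codepoints are written as (cp, cp).
EXTRA_RANGES = [
    (0x00A0, 0x00A0), (0x00A8, 0x00A8), (0x00AA, 0x00AA), (0x00AD, 0x00AD),
    (0x00AF, 0x00AF), (0x00B2, 0x00B5), (0x00B7, 0x00BA), (0x00BC, 0x00BE),
    (0x00C0, 0x00D6), (0x00D8, 0x00F6), (0x00F8, 0x200D), (0x202A, 0x202F),
    (0x203F, 0x2040), (0x2054, 0x2054), (0x205F, 0x218F), (0x2460, 0x24FF),
    (0x2776, 0x2793), (0x2C00, 0x2DFF), (0x2E80, 0x3000), (0x3004, 0x3007),
    (0x3021, 0x302F), (0x3031, 0xD7FF), (0xF900, 0xFD3D), (0xFD40, 0xFDCF),
    (0xFDF0, 0xFE44), (0xFE47, 0xFFFD),
] + [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xF)]  # 10000..1FFFD .. E0000..EFFFD
_STARTS = [start for start, _ in EXTRA_RANGES]

# ASCII parts of the existing rule: [A-Zabd-z_] and [A-Za-z0-9_].
ASCII_START = set(string.ascii_letters + "_") - {"c"}
ASCII_CONTINUE = set(string.ascii_letters + string.digits + "_")


def is_extra(cp: int) -> bool:
    """True if cp lies in one of the proposed ranges (binary search on range starts)."""
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= EXTRA_RANGES[i][1]


def match_identifier(src: str, pos: int = 0) -> Optional[int]:
    """Return the end index of a plain identifier starting at src[pos], or None.

    Models only the first alternative of IDENTIFIER; keyword exclusion,
    @"..." identifiers and `skip` are not handled.
    """
    if pos >= len(src):
        return None
    first = src[pos]
    if first == "c":
        # "c" !["\\]: a leading 'c' must not be followed by '"' or '\'.
        if pos + 1 < len(src) and src[pos + 1] in '"\\':
            return None
    elif not (first in ASCII_START or is_extra(ord(first))):
        return None
    end = pos + 1
    while end < len(src) and (src[end] in ASCII_CONTINUE or is_extra(ord(src[end]))):
        end += 1
    return end


print(match_identifier("café = 1"))  # 4
print(match_identifier('c"str"'))    # None
```

A sorted list of range starts plus `bisect` keeps each lookup logarithmic in the number of ranges. U+0085, U+2028, and U+2029 fall into gaps between the ranges, so they still end an identifier.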
Explanation
This set of codepoints was determined by following the Immutable Identifier Syntax recommendation in Unicode Standard Annex #31: https://unicode.org/reports/tr31/#Immutable_Identifier_Syntax. Specifically, it is the set of all non-ASCII codepoints except those meeting any of these criteria:
- Pattern_White_Space=True
- Pattern_Syntax=True
- General_Category=Private_Use, Surrogate, or Control
- Noncharacter_Code_Point=True
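The four criteria can be applied per codepoint roughly as shown below. This is a minimal sketch and not the generator script linked further down. It makes two assumptions. First, Pattern_White_Space and Pattern_Syntax are read from text in the UCD `PropList.txt` line format. Second, Python's `unicodedata.category` supplies the Control (`Cc`), Surrogate (`Cs`), and Private_Use (`Co`) checks. The function names are hypothetical, and the `SAMPLE` excerpt is hand-written rather than copied from the real file.

```python
import unicodedata
from typing import Iterable, Set


def parse_proplist(lines: Iterable[str], wanted: Set[str]) -> Set[int]:
    """Collect codepoints listed under any property in `wanted`.

    Expects the UCD PropList.txt line format, e.g.
    "0009..000D    ; Pattern_White_Space # comment".
    """
    result: Set[int] = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cps, prop = (field.strip() for field in data.split(";"))
        if prop not in wanted:
            continue
        lo, _, hi = cps.partition("..")
        result.update(range(int(lo, 16), int(hi or lo, 16) + 1))
    return result


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus U+xxFFFE and U+xxFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def passes_criteria(cp: int, pattern: Set[int]) -> bool:
    """Apply the four exclusion criteria listed above to one codepoint."""
    if cp in pattern:  # Pattern_White_Space=True or Pattern_Syntax=True
        return False
    if unicodedata.category(chr(cp)) in ("Co", "Cs", "Cc"):  # Private_Use, Surrogate, Control
        return False
    return not is_noncharacter(cp)  # Noncharacter_Code_Point=True


# Hand-written excerpt in PropList.txt format (not the full file).
SAMPLE = """\
0009..000D    ; Pattern_White_Space
0085          ; Pattern_White_Space
2028..2029    ; Pattern_White_Space
00D7          ; Pattern_Syntax # MULTIPLICATION SIGN
"""
pattern = parse_proplist(SAMPLE.splitlines(), {"Pattern_White_Space", "Pattern_Syntax"})
print([hex(cp) for cp in (0x00D7, 0x00E9, 0xE000, 0xFFFE) if passes_criteria(cp, pattern)])
# ['0xe9']
```

In principle, if the full `PropList.txt` were loaded and `passes_criteria` were run over U+0080..U+10FFFF, the result would be the non-ASCII set described above.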
Version 5.2.0 of the Unicode Character Database was used to generate this list. Even so, the list can remain stable across all future versions of the Unicode Character Database, as recommended and discussed in tr31 (UAX #31), linked above. Pattern_Syntax and Pattern_White_Space are immutable properties, so new Unicode versions cannot move codepoints into or out of them. (EDIT: @daurnimator pointed out that 5.2.0 is many major versions behind, but generating the list with the latest version, 12.1.0, produces an identical set of codepoints.)
The code I used to generate the above set of codepoints can be found here: https://github.com/ziglang/zig/blob/6f8e2fad94fde6c9a8c4ca52d964d0616690ee4c/tools/gen_id_char_table.py
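The step that turns a set of codepoints into the compact table in the Proposal section can be sketched as follows. This is an independent illustration, not code taken from the linked `gen_id_char_table.py`, and the name `to_ranges` is hypothetical. The function merges runs of consecutive codepoints into `XXXX..YYYY` entries and writes isolated codepoints as a single `XXXX`.

```python
from typing import Iterable, List


def to_ranges(codepoints: Iterable[int]) -> List[str]:
    """Collapse codepoints into sorted 'XXXX' / 'XXXX..YYYY' lines."""
    lines: List[str] = []
    cps = sorted(set(codepoints))
    i = 0
    while i < len(cps):
        start = end = cps[i]
        # Extend the current run while the next codepoint is consecutive.
        while i + 1 < len(cps) and cps[i + 1] == end + 1:
            i += 1
            end = cps[i]
        lines.append(f"{start:04X}" if start == end else f"{start:04X}..{end:04X}")
        i += 1
    return lines


print(to_ranges([0xA0, 0xA8, 0xB2, 0xB3, 0xB4, 0xB5]))
# ['00A0', '00A8', '00B2..00B5']
```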