@@ -337,9 +337,9 @@ b"\\x52"; br"\x52"; // \x52
 > **<sup>Lexer</sup>**\
 > C_STRING_LITERAL :\
 > &nbsp;&nbsp; `c"` (\
-> &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_]\
-> &nbsp;&nbsp; &nbsp;&nbsp; | BYTE_ESCAPE\
-> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
+> &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_ _Nul_]\
+> &nbsp;&nbsp; &nbsp;&nbsp; | BYTE_ESCAPE _except `\0` or `\x00`_\
+> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE _except `\u{0}`, `\u{00}`, …, `\u{000000}`_\
 > &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
 > &nbsp;&nbsp; )<sup>\*</sup> `"` SUFFIX<sup>?</sup>

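The hunk above moves the rejection of NUL from a post-lexing check into the grammar itself. As a rough runtime analogue (not the lexer's actual implementation), the standard library's `CString::new` enforces the same "no interior `0x00`" rule on ordinary strings. The helper names `is_valid_c_string_body` and `first_nul_position` below are hypothetical and exist only for this sketch.

```rust
use std::ffi::CString;

/// Hypothetical helper (not part of the reference): reports whether `s`
/// contains no NUL byte and could therefore be stored in a C string.
fn is_valid_c_string_body(s: &str) -> bool {
    // `CString::new` returns `Err(NulError)` if the input holds a 0x00 byte.
    CString::new(s).is_ok()
}

/// Hypothetical helper: byte offset of the first NUL in `s`, if there is one.
fn first_nul_position(s: &str) -> Option<usize> {
    // `NulError::nul_position` reports where the offending byte was found.
    CString::new(s).err().map(|e| e.nul_position())
}
```

Under these assumptions, `is_valid_c_string_body("a\0b")` is `false` and `first_nul_position("a\0b")` is `Some(1)`, which mirrors the literal forms the revised grammar excludes (`\0`, `\x00`, `\u{0}`).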
@@ -372,10 +372,6 @@ starts with a `U+005C` (`\`) and continues with one of the following forms:
 * The _backslash escape_ is the character `U+005C` (`\`) which must be
   escaped in order to denote its ASCII encoding `0x5C`.

-The escape sequences `\0`, `\x00`, and `\u{0000}` are permitted within the token
-but will be rejected as invalid, as C strings may not contain byte `0x00` except
-as the implicit terminator.
-
 A C string represents bytes with no defined encoding, but a C string literal
 may contain Unicode characters above `U+007F`. Such characters will be replaced
 with the bytes of that character's UTF-8 representation.
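The UTF-8 replacement rule in the context lines above can be sketched with ordinary library calls. This is only an illustration of the resulting bytes, assuming a body with no escape sequences; `c_string_bytes` and `as_c_str` are hypothetical helper names, not anything defined by the reference.

```rust
use std::ffi::CStr;

/// Hypothetical helper: the bytes a C string with the given escape-free body
/// would hold, including the implicit NUL terminator.
fn c_string_bytes(body: &str) -> Vec<u8> {
    // A Rust `&str` is already UTF-8, so non-ASCII characters contribute
    // their multi-byte UTF-8 encoding.
    let mut bytes = body.as_bytes().to_vec();
    bytes.push(0); // implicit terminator
    bytes
}

/// Hypothetical helper: views `bytes` as a `CStr` if it ends in exactly one
/// NUL and has no interior NUL.
fn as_c_str(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(bytes).ok()
}
```

For example, `c_string_bytes("æ")` yields `[0xC3, 0xA6, 0x00]`, matching the `c"\xC3\xA6"` form quoted in the next hunk header.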
@@ -398,16 +394,16 @@ c"\xC3\xA6";
 > &nbsp;&nbsp; `cr` RAW_C_STRING_CONTENT SUFFIX<sup>?</sup>
 >
 > RAW_C_STRING_CONTENT :\
-> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
+> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ _Nul_ )<sup>* (non-greedy)</sup> `"`\
 > &nbsp;&nbsp; | `#` RAW_C_STRING_CONTENT `#`

 Raw C string literals do not process any escapes. They start with the
 character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw C string body_ can contain any sequence of Unicode characters and is
-terminated only by another `U+0022` (double-quote) character, followed by the
-same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
-(double-quote) character.
+_raw C string body_ can contain any sequence of Unicode characters (other than
+`U+0000`) and is terminated only by another `U+0022` (double-quote) character,
+followed by the same number of `U+0023` (`#`) characters that preceded the
+opening `U+0022` (double-quote) character.

 All characters contained in the raw C string body represent themselves in UTF-8
 encoding. The characters `U+0022` (double-quote) (except when followed by at