Commit 028f106

Update reference for rust-lang/rust#119172, which moved the check for NUL
characters in C string literals earlier.
1 parent 8c77e8b commit 028f106
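
For illustration, here is a minimal Rust sketch of what the updated grammar accepts and rejects. It assumes Rust 1.77 or later and edition 2021, where `c"…"` literals are available. The constant names and the helper `interior_nul_position` are hypothetical and not part of the reference; the `CString::new` call is only a runtime analogue of the same "no interior NUL" rule, not something this commit touches.

```rust
use std::ffi::{CStr, CString};

// Accepted: the body contains no NUL character and no NUL-producing escape.
// The terminating 0x00 byte is implicit and is added by the compiler.
const GREETING: &CStr = c"hi";

// Accepted: a character above U+007F is replaced by its UTF-8 bytes,
// so these two literals denote the same C string.
const AE_CHAR: &CStr = c"æ";
const AE_BYTES: &CStr = c"\xC3\xA6";

// Accepted: raw C strings process no escapes, so `\0` here is the two
// characters backslash and zero, not a NUL byte.
const RAW: &CStr = cr"\0";

// Excluded by the updated grammar (uncommenting any line is a compile error):
// const BAD_SIMPLE: &CStr = c"\0";
// const BAD_BYTE: &CStr = c"\x00";
// const BAD_UNICODE: &CStr = c"\u{0}";

/// Hypothetical helper showing the runtime analogue of the rule:
/// `CString::new` refuses input containing an interior NUL byte and
/// reports the byte offset at which it was found.
fn interior_nul_position(input: &str) -> Option<usize> {
    CString::new(input).err().map(|e| e.nul_position())
}
```

Calling `GREETING.to_bytes_with_nul()` shows the implicit terminator, and `interior_nul_position("a\0b")` reports the offset of the offending byte for a string built at run time.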

1 file changed, +8 -12 lines changed

src/tokens.md

@@ -337,9 +337,9 @@ b"\\x52"; br"\x52"; // \x52
 > **<sup>Lexer</sup>**\
 > C_STRING_LITERAL :\
 > &nbsp;&nbsp; `c"` (\
-> &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_]\
-> &nbsp;&nbsp; &nbsp;&nbsp; | BYTE_ESCAPE\
-> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
+> &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_ _Nul_]\
+> &nbsp;&nbsp; &nbsp;&nbsp; | BYTE_ESCAPE _except `\0` or `\x00`_\
+> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE _except `\u{0}`, `\u{00}`, …, `\u{000000}`_\
 > &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
 > &nbsp;&nbsp; )<sup>\*</sup> `"` SUFFIX<sup>?</sup>

@@ -372,10 +372,6 @@ starts with a `U+005C` (`\`) and continues with one of the following forms:
 * The _backslash escape_ is the character `U+005C` (`\`) which must be
 escaped in order to denote its ASCII encoding `0x5C`.

-The escape sequences `\0`, `\x00`, and `\u{0000}` are permitted within the token
-but will be rejected as invalid, as C strings may not contain byte `0x00` except
-as the implicit terminator.
-
 A C string represents bytes with no defined encoding, but a C string literal
 may contain Unicode characters above `U+007F`. Such characters will be replaced
 with the bytes of that character's UTF-8 representation.
@@ -398,16 +394,16 @@ c"\xC3\xA6";
 > &nbsp;&nbsp; `cr` RAW_C_STRING_CONTENT SUFFIX<sup>?</sup>
 >
 > RAW_C_STRING_CONTENT :\
-> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
+> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ _Nul_ )<sup>* (non-greedy)</sup> `"`\
 > &nbsp;&nbsp; | `#` RAW_C_STRING_CONTENT `#`

 Raw C string literals do not process any escapes. They start with the
 character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw C string body_ can contain any sequence of Unicode characters and is
-terminated only by another `U+0022` (double-quote) character, followed by the
-same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
-(double-quote) character.
+_raw C string body_ can contain any sequence of Unicode characters (other than
+`U+0000`) and is terminated only by another `U+0022` (double-quote) character,
+followed by the same number of `U+0023` (`#`) characters that preceded the
+opening `U+0022` (double-quote) character.

 All characters contained in the raw C string body represent themselves in UTF-8
 encoding. The characters `U+0022` (double-quote) (except when followed by at
