Allow non-ASCII characters in byte literals #454

[RFC 69](https://github.com/rust-lang/rfcs/blob/master/text/0069-ascii-literals.md) specifies that byte literals may contain only ASCII characters. This restriction has been in place from the beginning, but I suggest lifting it so that byte literals accept any UTF-8 sequence.

Tracing the history, this restriction appears to originate from rust-lang/rust#4334. That issue notes that Python allows only ASCII inside byte literals, to make a "very clear distinction" between bytes and strings. However, the original poster also says that the restriction may not be necessary.

In Rust, source code is guaranteed to be UTF-8, so nothing prevents the compiler from interpreting the contents of a byte literal as UTF-8. Python, on the other hand, had to be conservative about assuming UTF-8 because it allows source code encodings other than UTF-8.

Some would say that even if the source code encoding is UTF-8, the intended encoding of a byte literal may differ. But we already make the same assumption for _string literals_: they have type `&str`, and `str` is just a `[u8]` that is guaranteed to be valid UTF-8.
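To illustrate the point about string literals, here is a minimal sketch that compiles on current Rust. It shows that a string literal already carries UTF-8 bytes, and that the same bytes can only be written today in a byte literal by escaping them. The helper name `utf8_bytes` is made up for this example.

```rust
/// Hypothetical helper: expose the bytes that a string literal already stores.
fn utf8_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

fn main() {
    let text: &str = "안녕";

    // What has to be written today: the UTF-8 encoding, escaped byte by byte.
    let escaped: &[u8] = b"\xEC\x95\x88\xEB\x85\x95";

    // The string literal and the escaped byte literal hold identical bytes.
    assert_eq!(utf8_bytes(text), escaped);

    // Decoding the bytes as UTF-8 gives back the original text.
    assert_eq!(std::str::from_utf8(escaped), Ok(text));
}
```

Under this proposal, `b"안녕"` would presumably be shorthand for the escaped form above.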

The need for a "clear distinction" does not apply here either, because such differences should be expressed through the _type system_, not by allowing or forbidding certain characters. `"hello"` and `b"hello"` are different even though they contain the same characters (`hello`). Likewise, `"안녕"` and `b"안녕"` are clearly distinguishable even though their characters are the same.
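A small sketch of how the type system already keeps the two apart, independent of which characters appear. The function names `count_chars` and `count_bytes` are invented for illustration. Because `b"안녕"` is not accepted today, the byte form is obtained through `as_bytes()` instead.

```rust
/// Hypothetical function that only accepts text.
fn count_chars(s: &str) -> usize {
    s.chars().count()
}

/// Hypothetical function that only accepts raw bytes.
fn count_bytes(b: &[u8]) -> usize {
    b.len()
}

fn main() {
    // Same characters, different types: &str versus &[u8; 5].
    assert_eq!(count_chars("hello"), 5);
    assert_eq!(count_bytes(b"hello"), 5);

    // Two characters, but six bytes once encoded as UTF-8.
    assert_eq!(count_chars("안녕"), 2);
    assert_eq!(count_bytes("안녕".as_bytes()), 6);

    // Mixing them up is a compile-time type error, not a lexical one:
    // count_chars(b"hello");
    // count_bytes("hello");
}
```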

