Commit d8d22fd

RFC to restrict placing an identifier after a literal.
1 parent bd96cb3 commit d8d22fd

1 file changed: +133 -0 lines changed

@@ -0,0 +1,133 @@
- Start Date: 2014-09-28
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Include identifiers immediately after literals in the literal token to
allow future expansion, e.g. `"foo"bar` and `1baz` are considered
whole (but semantically invalid) tokens, rather than the two separate
tokens `"foo"`, `bar` and `1`, `baz` respectively. This allows future
expansion of literal handling without risking breaking (macro) code.

# Motivation

Currently a few kinds of literals (integers and floats) can have a
fixed set of suffixes, and other kinds do not include any suffixes. The
valid suffixes on numbers are:

```text
u, u8, u16, u32, u64
i, i8, i16, i32, i64
f32, f64
```

Most things not in this list are just ignored and treated as an
entirely separate token (prefixes of `128` are errors: e.g. `1u12` has
an error `"invalid int suffix"`), and similarly any suffixes on other
literals are also separate tokens. For example:

```rust
#![feature(macro_rules)]

// makes a tuple
macro_rules! foo( ($($a: expr)*) => { ($($a, )+) } )

fn main() {
    let bar = "suffix";
    let y = "suffix";

    let t: (uint, uint) = foo!(1u256);
    println!("{}", foo!("foo"bar));
    println!("{}", foo!('x'y));
}
/*
output:
(1, 256)
(foo, suffix)
(x, suffix)
*/
```

The compiler eats the `1u`, then sees the invalid suffix `256` and so
treats it as a separate token; the string and character literals are
handled similarly. (This problem is only visible in macros, since that
is the only place where two literals/identifiers can be placed
directly adjacent.)

This behaviour means we would be unable to expand the possibilities
for literals after freezing the language/macros, which would be
unfortunate, since [user-defined literals in C++][cpp] are reportedly
very nice, proposals for "bit data" would like to use types like `u1`
and `u5` (e.g. [RFC PR 327][327]), and there are "fringe" types like
[`f16`][f16], [`f128`][f128] and `u128` that have uses but are not
common enough to warrant adding to the language now.

[cpp]: http://en.cppreference.com/w/cpp/language/user_literal
[327]: https://github.com/rust-lang/rfcs/pull/327
[f16]: http://en.wikipedia.org/wiki/Half-precision_floating-point_format
[f128]: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format

# Detailed design

The tokenizer will have grammar `literal: raw_literal identifier?`
where `raw_literal` covers strings, characters and numbers without
suffixes (e.g. `"foo"`, `'a'`, `1`, `0x10`).

Examples of "valid" literals after this change (that is, entities that
will be consumed as a single token):

```
"foo"bar "foo"_baz
'a'x 'a'_y

15u16 17i18 19f20 21.22f23
0b11u25 0x26i27 28.29e30f31

123foo 0.0bar
```

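As a rough illustration of the `literal: raw_literal identifier?` rule, here is a minimal sketch of a lexer that glues a directly adjacent identifier onto the preceding literal. It is written in present-day Rust rather than the 2014 dialect used elsewhere in this RFC, and it is not the compiler's lexer: `Token`, `tokenize`, `take_ident` and the two `is_ident_*` helpers are hypothetical names, and the sketch assumes only unescaped quoted literals and decimal numbers (no `0x`/`0b` prefixes or exponents).

```rust
/// One token produced by the sketch lexer (all names here are illustrative).
#[derive(Debug, PartialEq)]
enum Token {
    /// A literal together with the identifier glued to it, if any.
    Literal { raw: String, suffix: Option<String> },
    /// A free-standing identifier.
    Ident(String),
}

fn is_ident_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

fn is_ident_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Consumes an identifier starting at `*i` and advances `*i` past it.
fn take_ident(chars: &[char], i: &mut usize) -> String {
    let start = *i;
    while *i < chars.len() && is_ident_continue(chars[*i]) {
        *i += 1;
    }
    chars[start..*i].iter().collect()
}

fn tokenize(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let start = i;
        if is_ident_start(c) {
            tokens.push(Token::Ident(take_ident(&chars, &mut i)));
            continue;
        } else if c == '"' || c == '\'' {
            // raw_literal: quoted text up to the matching quote (escapes ignored).
            i += 1;
            while i < chars.len() && chars[i] != c {
                i += 1;
            }
            i = (i + 1).min(chars.len());
        } else if c.is_ascii_digit() {
            // raw_literal: decimal digits with an optional `.digits` part.
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            if i + 1 < chars.len() && chars[i] == '.' && chars[i + 1].is_ascii_digit() {
                i += 1;
                while i < chars.len() && chars[i].is_ascii_digit() {
                    i += 1;
                }
            }
        } else {
            // Whitespace and any other character is skipped in this sketch.
            i += 1;
            continue;
        }
        let raw: String = chars[start..i].iter().collect();
        // identifier?: an identifier with no intervening space joins the literal.
        let suffix = if i < chars.len() && is_ident_start(chars[i]) {
            Some(take_ident(&chars, &mut i))
        } else {
            None
        };
        tokens.push(Token::Literal { raw, suffix });
    }
    tokens
}
```

With this sketch, `tokenize("\"foo\"bar")` yields a single `Literal` whose suffix is `bar`, while `tokenize("\"foo\" bar")` yields a `Literal` with no suffix followed by an `Ident`. Whether a suffix is meaningful is deliberately not decided here; the sketch only shows where the token boundary falls.
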
Placing a space between the literal and the suffix will cause them to
be parsed as two separate tokens, just like today. That is,
`"foo"bar` is one token, while `"foo" bar` is two tokens.

The macro example from the Motivation section would then be an error,
something like:

```rust
let t: (uint, uint) = foo!(1u256); // error: literal with unsupported size
println!("{}", foo!("foo"bar)); // error: literal with unsupported suffix
println!("{}", foo!('x'y)); // error: literal with unsupported suffix
```

The above demonstrates that numeric suffixes could be special-cased
to detect `u<...>` and `i<...>` to give more useful error messages.

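One way such special-casing might look is sketched below in present-day Rust. This is an assumption-laden illustration, not the compiler's implementation: `check_numeric_suffix` and `VALID_SUFFIXES` are hypothetical names, the accepted list is the one given in the Motivation section of this RFC (not today's Rust suffixes), and the two messages are the ones used in the error example above.

```rust
/// The numeric suffixes this RFC lists as valid (illustrative constant).
const VALID_SUFFIXES: &[&str] = &[
    "u", "u8", "u16", "u32", "u64",
    "i", "i8", "i16", "i32", "i64",
    "f32", "f64",
];

/// Validates the suffix of a numeric literal and picks an error message.
/// A suffix shaped like `u<digits>` or `i<digits>` is reported as an
/// unsupported size; anything else unknown is an unsupported suffix.
fn check_numeric_suffix(suffix: &str) -> Result<(), &'static str> {
    if VALID_SUFFIXES.contains(&suffix) {
        return Ok(());
    }
    let mut chars = suffix.chars();
    let first = chars.next();
    let rest = chars.as_str();
    let looks_sized = matches!(first, Some('u') | Some('i'))
        && !rest.is_empty()
        && rest.chars().all(|c| c.is_ascii_digit());
    if looks_sized {
        Err("literal with unsupported size")
    } else {
        Err("literal with unsupported suffix")
    }
}
```

Under these assumptions `u256` and `i18` are reported as unsupported sizes, while `bar` and `f20` fall through to the generic unsupported-suffix message.
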
(The macro example in this section is definitely an error because it
uses the incorrectly-suffixed literals as `expr`s. If it were only
handling them as a token, i.e. `tt`, there is the possibility that it
wouldn't have to be illegal, e.g. `stringify!(1u256)` doesn't have to
be illegal because the `1u256` never occurs at runtime/in the type
system.)

# Drawbacks

None beyond outlawing placing an identifier immediately after a literal,
but the current behaviour can easily be restored with a space:
`123u 456`. (If a macro is using this for the purpose of hacky
generalised literals, the unresolved question below touches on this.)

# Alternatives

Don't do this, or consider doing it for adjacent suffixes with an
alternative syntax, e.g. `10'bar` or `10$bar`.

# Unresolved questions

- Should it be the parser or the tokenizer rejecting invalid suffixes?
  This is effectively asking whether it is legal for syntax extensions
  to be passed the raw literals. That is, can a `foo` procedural syntax
  extension accept and handle literals like `foo!(1u2)`?

- Should this apply to all expressions, e.g. `(1 + 2)bar`?
