| 1 | +- Start Date: 2014-09-28 |
| 2 | +- RFC PR: (leave this empty) |
| 3 | +- Rust Issue: (leave this empty) |
| 4 | + |
| 5 | +# Summary |
| 6 | + |
| 7 | +Include identifiers immediately after literals in the literal token to |
| 8 | +allow future expansion, e.g. `"foo"bar` and `1baz` are considered |
| 9 | +whole (but semantically invalid) tokens, rather than two separate |
| 10 | +tokens `"foo"`, `bar` and `1`, `baz` respectively. This allows future |
| 11 | +expansion of handling literals without risking breaking (macro) code. |
| 12 | + |
| 13 | + |
| 14 | +# Motivation |
| 15 | + |
| 16 | +Currently a few kinds of literals (integers and floats) can have a |
| 17 | +fixed set of suffixes and other kinds do not include any suffixes. The |
| 18 | +valid suffixes on numbers are: |
| 19 | + |
| 20 | + |
| 21 | +```text |
| 22 | +u, u8, u16, u32, u64 |
| 23 | +i, i8, i16, i32, i64 |
| 24 | +f32, f64 |
| 25 | +``` |
| 26 | + |
| 27 | +Most things not in this list are not treated as suffixes at all, but as |
| 28 | +entirely separate tokens (the exceptions are prefixes of `128`, which are |
| 29 | +errors: e.g. `1u12` gives `"invalid int suffix"`), and similarly any |
| 30 | +suffixes on other literals are also separate tokens. For example: |
| 31 | + |
| 32 | +```rust |
| 33 | +#![feature(macro_rules)] |
| 34 | + |
| 35 | +// makes a tuple |
| 36 | +macro_rules! foo( ($($a: expr)*) => { ($($a, )*) } ) |
| 37 | + |
| 38 | +fn main() { |
| 39 | + let bar = "suffix"; |
| 40 | + let y = "suffix"; |
| 41 | + |
| 42 | + let t: (uint, uint) = foo!(1u256); |
| 43 | + println!("{}", foo!("foo"bar)); |
| 44 | + println!("{}", foo!('x'y)); |
| 45 | +} |
| 46 | +/* |
| 47 | +output: |
| 48 | +(1, 256) |
| 49 | +(foo, suffix) |
| 50 | +(x, suffix) |
| 51 | +*/ |
| 52 | +``` |
| 53 | + |
| 54 | +The compiler is eating the `1u` and then seeing the invalid suffix |
| 55 | +`256` and so treating that as a separate token, and similarly for the |
| 56 | +string and character literals. (This problem is only visible in |
| 57 | +macros, since that is the only place where two literals/identifiers can be placed |
| 58 | +directly adjacent.) |
| 59 | + |
| 60 | +This behaviour means we would be unable to expand the possibilities |
| 61 | +for literals after freezing the language/macros, which would be |
| 62 | +unfortunate, since [user defined literals in C++][cpp] are reportedly |
| 63 | +very nice, proposals for "bit data" would like to use types like `u1` |
| 64 | +and `u5` (e.g. [RFC PR 327][327]), and there are "fringe" types like |
| 65 | +[`f16`][f16], [`f128`][f128] and `u128` that have uses but are not |
| 66 | +common enough to warrant adding to the language now. |
| 67 | + |
| 68 | +[cpp]: http://en.cppreference.com/w/cpp/language/user_literal |
| 69 | +[327]: https://github.com/rust-lang/rfcs/pull/327 |
| 70 | +[f16]: http://en.wikipedia.org/wiki/Half-precision_floating-point_format |
| 71 | +[f128]: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format |
| 72 | + |
| 73 | +# Detailed design |
| 74 | + |
| 75 | +The tokenizer will have grammar `literal: raw_literal identifier?` |
| 76 | +where `raw_literal` covers strings, characters and numbers without |
| 77 | +suffixes (e.g. `"foo"`, `'a'`, `1`, `0x10`). |
| 78 | + |
| 79 | +Examples of "valid" literals after this change (that is, entities that |
| 80 | +will be consumed as a single token): |
| 81 | + |
| 82 | +``` |
| 83 | +"foo"bar "foo"_baz |
| 84 | +'a'x 'a'_y |
| 85 | + |
| 86 | +15u16 17i18 19f20 21.22f23 |
| 87 | +0b11u25 0x26i27 28.29e30f31 |
| 88 | + |
| 89 | +123foo 0.0bar |
| 90 | +``` |
| 91 | + |
| 92 | +Placing a space between the literal and the suffix will cause them |
| 93 | +to be parsed as two separate tokens, just like today. That is, |
| 94 | +`"foo"bar` is one token and `"foo" bar` is two tokens. |
| 95 | + |
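As a rough illustration of the `literal: raw_literal identifier?` rule, the sketch below lexes one literal from the start of a string. It is not the compiler's lexer: the names `Literal`, `ident_len`, `raw_literal_len` and `lex_literal` are hypothetical, it is written in present-day Rust rather than the 2014 dialect used elsewhere in this RFC, and it assumes escape-free strings and characters and decimal integers only (no floats, no `0x`/`0b` prefixes).

```rust
/// A literal token: the raw literal text plus an optional identifier suffix.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct Literal<'a> {
    raw: &'a str,
    suffix: Option<&'a str>,
}

/// Byte length of the identifier at the start of `s`, or 0 if there is none.
fn ident_len(s: &str) -> usize {
    let mut chars = s.char_indices();
    match chars.next() {
        Some((_, c)) if c.is_alphabetic() || c == '_' => {}
        _ => return 0,
    }
    for (i, c) in chars {
        if !(c.is_alphanumeric() || c == '_') {
            return i;
        }
    }
    s.len()
}

/// Byte length of an unsuffixed literal at the start of `s`.
/// Simplifying assumption: no escapes, no floats, no radix prefixes.
fn raw_literal_len(s: &str) -> Option<usize> {
    let first = s.chars().next()?;
    match first {
        '"' | '\'' => {
            // Opening quote, body, closing quote (quotes are one byte each).
            let close = s[1..].find(first)?;
            Some(1 + close + 1)
        }
        c if c.is_ascii_digit() => {
            Some(s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len()))
        }
        _ => None,
    }
}

/// Lex `raw_literal identifier?`, returning the token and the remaining input.
fn lex_literal(s: &str) -> Option<(Literal<'_>, &str)> {
    let raw_len = raw_literal_len(s)?;
    let suffix_len = ident_len(&s[raw_len..]);
    let end = raw_len + suffix_len;
    let suffix = if suffix_len > 0 { Some(&s[raw_len..end]) } else { None };
    Some((Literal { raw: &s[..raw_len], suffix }, &s[end..]))
}

fn main() {
    // Adjacent identifier: consumed as part of the literal token.
    let (lit, rest) = lex_literal("\"foo\"bar").unwrap();
    println!("raw={} suffix={:?} rest={:?}", lit.raw, lit.suffix, rest);

    // A space ends the token; `bar` is left for the next token.
    let (lit, rest) = lex_literal("\"foo\" bar").unwrap();
    println!("raw={} suffix={:?} rest={:?}", lit.raw, lit.suffix, rest);
}
```

Whether a suffix such as `bar` is meaningful is deliberately not decided here; under this design that check would happen after tokenization.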
| 96 | +The example above would then be an error, something like: |
| 97 | + |
| 98 | +```rust |
| 99 | + let t: (uint, uint) = foo!(1u256); // error: literal with unsupported size |
| 100 | + println!("{}", foo!("foo"bar)); // error: literal with unsupported suffix |
| 101 | + println!("{}", foo!('x'y)); // error: literal with unsupported suffix |
| 102 | +``` |
| 103 | + |
| 104 | +The above demonstrates that numeric suffixes could be special cased |
| 105 | +to detect `u<...>` and `i<...>` to give more useful error messages. |
| 106 | + |
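One way such special casing might look is sketched below, in present-day Rust. The function name `check_numeric_suffix` is hypothetical, the supported list is the one given in the Motivation section, and the two messages are the ones used in the error example above; an actual implementation could well differ.

```rust
/// Suffixes currently accepted on numeric literals (from the Motivation section).
const SUPPORTED: [&str; 12] = [
    "u", "u8", "u16", "u32", "u64",
    "i", "i8", "i16", "i32", "i64",
    "f32", "f64",
];

/// Validate a suffix found on a numeric literal (hypothetical helper).
/// `u<digits>` and `i<digits>` get a size-specific message; everything
/// else that is unsupported gets the generic one.
fn check_numeric_suffix(suffix: &str) -> Result<(), &'static str> {
    if SUPPORTED.contains(&suffix) {
        return Ok(());
    }
    let mut chars = suffix.chars();
    let first = chars.next();
    let digits = chars.as_str(); // everything after the first character
    let sized = matches!(first, Some('u') | Some('i'))
        && !digits.is_empty()
        && digits.chars().all(|c| c.is_ascii_digit());
    if sized {
        Err("literal with unsupported size")
    } else {
        Err("literal with unsupported suffix")
    }
}

fn main() {
    for suffix in ["u32", "u256", "i18", "foo"] {
        println!("{}: {:?}", suffix, check_numeric_suffix(suffix));
    }
}
```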
| 107 | +(The macro example there is definitely an error because it is using |
| 108 | +the incorrectly-suffixed literals as `expr`s. If it were only |
| 109 | +handling them as a token, i.e. `tt`, there is the possibility that it |
| 110 | +wouldn't have to be illegal, e.g. `stringify!(1u256)` doesn't have to |
| 111 | +be illegal because the `1u256` never occurs at runtime/in the type |
| 112 | +system.) |
| 113 | + |
| 114 | +# Drawbacks |
| 115 | + |
| 116 | +None beyond outlawing placing a literal immediately before a pattern, |
| 117 | +but the current behaviour can easily be restored with a space: `123u |
| 118 | +456`. (If a macro is using this for the purpose of hacky generalised |
| 119 | +literals, the unresolved question below touches on this.) |
| 120 | + |
| 121 | +# Alternatives |
| 122 | + |
| 123 | +Don't do this, or consider doing it for adjacent suffixes with an |
| 124 | +alternative syntax, e.g. `10'bar` or `10$bar`. |
| 125 | + |
| 126 | +# Unresolved questions |
| 127 | + |
| 128 | +- Should it be the parser or the tokenizer rejecting invalid suffixes? |
| 129 | + This is effectively asking whether it is legal for syntax extensions to |
| 130 | + be passed the raw literals. That is, can a `foo` procedural syntax |
| 131 | + extension accept and handle literals like `foo!(1u2)`? |
| 132 | + |
| 133 | +- Should this apply to all expressions, e.g. `(1 + 2)bar`? |