Skip to content

Turn capitalization conventions into language rules for 1.0 #154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
212 changes: 212 additions & 0 deletions active/0000-significant-identifier-capitalization.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,212 @@
- Start Date: 2014-07-04
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)

# Summary

Turn our existing conventions for capitalization of identifiers into language
rules for 1.0.

The goal of this proposal is to enshrine existing practice with minimal
disruption, and to break as few programs as possible.


# Motivation

We can remove the restriction again after 1.0, but not add it.

The purpose of our existing conventions is not just consistency of style. They
are also important for aspects of semantic correctness.

Identifiers in pattern matches are interpreted as referring to an existing
`enum` variant or `static` declaration if one is in scope, and as creating a
local binding otherwise. This context-dependent meaning presents a significant
vulnerability to mistakes and accidental breakage. The convention of writing
`enum` variants and `static`s uppercase and local bindings lowercase, if it is
adhered to, eliminates this vulnerability, but it is only a convention.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there are other conventions which could also prevent such mistakes, right? So a lint seems appropriate since we are guiding the user to a particular convention, but as long as they follow some convention, they will be OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that guarantees are valuable. A convention is not a guarantee, especially not unless you can ensure that Rust code using one convention will never have to come into contact with Rust code using a different convention.


A lint encouraging use of the conventions, as we have, would be appropriate if
they were only of aesthetic significance, but if they're also important for
ensuring correctness of programs, then it should be more than a lint.

We're already paying the "cost" of reduced identifier freedom with our current
quasi-enforced conventions, but aren't extracting a corresponding benefit from
being able to rely on them as a guarantee.

Being able to distinguish classes of identifiers based on their capitalization
opens up significant room for adding new features to the language in a
backwards-compatible way. To substantiate this claim, below are a few ideas I've
thought of since I began writing this proposal. The worthiness of these ideas is
not important: they're only intended as examples of how significant
capitalization can solve problems and avoid ambiguities.

* Suppose we want to add functions with named, rather than positional,
arguments. One idea would be to play off the existing struct vs. tuple
duality and, whereas a function call with positional arguments looks like
`f(1, 2)`, function calls with named arguments would look like
`f { a: 1, b: 2 }`. Currently this would be impossible to distinguish from
a struct literal. However, if structs are always uppercase and functions are
always lowercase, then distinguishing them is possible.

* Along the same lines, with respect to current language features, it would
enable *syntactically* distinguishing between function calls and tuple struct
or enum variant literals. (I don't know where this might be useful.)

* Rather than a function call with named arguments, we could also use
`variable_name { a: foo, b: bar }` to denote functional struct update,
instead of the current more unwieldy
`StructName { a: foo, b: bar, ..variable_name }`, as seen in some functional
languages.

* To close the features gap with structural structs, suppose we want to allow
accessing the fields of tuples and tuple structs as `.0`, `.1`, `.2`, etc.
Pretty soon someone will ask: can I use a `static` `uint` declaration instead
of a literal? If struct fields are always lowercase and `static` declarations
are always uppercase, this is unambiguous, so the answer may be "yes, why
not".

* There was a recent proposal to remove the `'` sigil for lifetimes. If types
are always uppercase and lifetimes are always lowercase, they are
syntactically unambiguous even without the sigil.

Again, I'm not suggesting these are good ideas. (I think a couple of them are,
but I won't say which.) The purpose is only to illustrate how significant
capitalization can make our jobs as language designers easier.


# Detailed design

Separate identifiers into two classes: ones which must start with an uppercase
character, and ones which must start with a lowercase character.

Uppercase:

* Type parameters
* `struct` declarations
* `enum` declarations
* `type` declarations
* `trait` declarations
* `static` declarations
* `enum` variants

Lowercase:

* `mod` declarations
* `extern crate` declarations
* `fn` declarations
* `macro_rules!` declarations
* `struct` and structural `enum` variant fields
* Local bindings: `fn` and closure arguments, `let`, `match`, and `for`..`in`
bindings
* Lifetime parameters
* Lifetimes, including loop labels
* Attributes

## Grandfather clause

To avoid disruption to existing code, the names of the built-in types should
remain unchanged and remain legal. In exchange, to maintain the full separation
between type identifiers and lowercase-class identifiers, the names of the
built-in types should be disallowed as lowercase-class identifiers.

There are multiple ways to accomplish this:

1. Make a special exception such that the names of the built-in types are legal
as type names, but not as lowercase-class identifiers. (So, in effect, they
are legal only as type names.)

2. Declare that the names of the built-in types are considered to be upperclass
identifiers. Thus, they are legal as identifiers for anything in the
uppercase-class, not just types, and not for anything in the lowercase-class.

3. Turn them into keywords.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rename them to Int, UInt, I8, U8, etc.


My preference is for 1 or 3.

As with the rule itself, this restriction could be loosened in a
backwards-compatible way in the future.

The names of the built-in types are:

* `int`, `uint`
* `i8`..`i64`
* `u8`..`u64`
* `f32`, `f64`
* `bool`
* `char`
* `str`


# Drawbacks

## Non-ASCII identifiers

The most common objection I've encountered to this idea is that we may want to
allow non-ascii identifiers in the future, and that not all alphabets have a
distinction between uppercase and lowercase characters.

I have three counterarguments, of which I consider either the first by itself or
the combination of the second and third to be sufficient, so that altogether,
they should be sufficient to counter it twice.

1. If we impose the restriction for 1.0, and later decide that we want to allow
non-ascii identifiers, we can then remove the restriction. We can't do it
the other way around.

2. We can figure out ways to accomodate such alphabets while maintaining the
distinction. For instance, we could say that characters from alphabets
without an uppercase-lowercase distinction are always considered to be
uppercase, and `_` is lowercase. Therefore lowercase-class identifiers
written in such alphabets should start with `_`. We also have a few unused
sigils now; we can get creative.

3. As mentioned in the Motivation, we *already* rely on capitalization for
aspects of program correctness, namely dealing with the `static` and `enum`
variants versus pattern bindings issue. Under these conditions, even if we
enable support for non-ascii identifiers, it's plausible that style guides
and coding conventions would disallow the use of
neither-uppercase-nor-lowercase identifiers, in order to prevent bugs. In
other words, writing identifiers in certain foreign alphabets would be
legal, but *technically inferior*. I don't think this would be a positive
situation.


## FFI

Foreign libraries may not follow our capitalization conventions. On a technical
level, `extern` functions already have a [`link_name`][link_name] attribute
which allows a different name to be specified for linking than which is used
inside Rust. On an ergonomic level, a simple convention based on two rules could
be used to make interfacing with such libraries predictable:

1. If it starts with an uppercase character in the original API, but in Rust it
would only be legal with a lowercase character, then in Rust it begins with
the lowercase version of that character.

2. If it starts with a lowercase character in the original API, but in Rust it
would only be legal with an uppercase character, then in Rust it begins with
the uppercase version of that character.

Automatic bindings generators can apply these rules automatically, and consumers
of the bindings can apply the rules in their heads when recalling the names of
things.

[link_name]: http://doc.rust-lang.org/rust.html#ffi-attributes


# Alternatives

The status quo, obviously. Refer to *Motivation*.

We could regulate the capitalization of not only the first character, but the
rest of the identifier as well, i.e. `ELEPHANT_CASE` and `CamelCase` and
`snake_case`. However, this would not provide any additional guarantees unless
we also impose further restrictions, such as requiring `ELEPHANT_CASE`
identifiers to be at least two characters long, while banning underscores and
consecutive uppercase characters in `CamelCase` identifiers. Either way, this
doesn't seem like it would pass the cost-benefit test.


# Unresolved questions

The precise formulation of the grandfather clause.