char::{to_uppercase, to_lowercase} are broken and should not be marked stable #25729

Closed
niconii opened this issue May 23, 2015 · 1 comment
Labels
A-stability Area: `#[stable]`, `#[unstable]` etc.

Comments

@niconii
Contributor

niconii commented May 23, 2015

The description of char::to_uppercase reads:

The case-folding performed is the common or simple mapping: it maps one Unicode codepoint to its uppercase equivalent according to the Unicode database. The additional [SpecialCasing.txt] is not yet considered here, but the iterator returned will soon support this form of case folding.

Similarly, char::to_lowercase reads:

The case-folding performed is the common or simple mapping. See to_uppercase() for references and more information.

Right now, char::to_uppercase and char::to_lowercase disregard characters whose case mappings are not one-to-one. For example, U+FB02 ﬂ LATIN SMALL LIGATURE FL is a single codepoint, but its uppercase form is two characters: 'F' and 'L'. Yet, when we try converting it to uppercase in Rust...

fn main() {
    let uppercase_flavor = "ﬂavor".chars()
        .flat_map(char::to_uppercase)
        .collect::<String>();

    println!("{}", uppercase_flavor);
}

...rather than the expected "FLAVOR", we get "ﬂAVOR", an incorrect result.

As the description of char::to_uppercase explains, this behavior is expected to change. However, currently char::to_uppercase, char::to_lowercase, and their associated iterators std::char::ToUppercase and std::char::ToLowercase are all marked stable, yet fixing this behavior would be a breaking change.

steveklabnik added the A-libs and A-stability labels on May 25, 2015
@alexcrichton
Member

Thanks for the report! This sort of change is definitely planned, but it has yet to be implemented. The switch from returning just a plain char to returning an iterator of characters reflects this. A minor update such as this will not be considered a breaking change, especially because of the way the documentation is currently worded. As a result I'm going to close this issue for now in favor of #25800, which reflects the current plan.
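To illustrate why the iterator return type leaves room for this change, here is a minimal sketch of caller code that consumes whatever char::to_uppercase yields, whether that is one char or several. The helper names uppercase_full and uppercase_len are hypothetical and not part of the standard library. The ligature is written as the escape \u{FB02} so the example does not depend on how the page renders it. With the simple mapping described in this issue, the ligature would be expected to come back unchanged as a single char. On a toolchain where the multi-character mappings have been implemented, it would be expected to expand to two.

```rust
/// Uppercases a string one `char` at a time, consuming every item the
/// iterator returned by `char::to_uppercase` yields (one char or several).
/// Hypothetical helper for illustration only.
fn uppercase_full(input: &str) -> String {
    input.chars().flat_map(char::to_uppercase).collect()
}

/// Counts how many chars `c` expands to when uppercased.
/// Hypothetical helper for illustration only.
fn uppercase_len(c: char) -> usize {
    c.to_uppercase().count()
}

fn main() {
    // U+FB02 LATIN SMALL LIGATURE FL followed by "avor".
    let word = "\u{FB02}avor";

    // The caller never assumes a one-to-one mapping, so this code keeps
    // compiling and working if the iterator starts yielding more chars.
    println!("{}", uppercase_full(word));
    println!("{}", uppercase_len('\u{FB02}'));
}
```

Because callers written this way already go through the iterator, a change in how many chars it yields alters the output but not the signature they compile against.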
