Add `Chars::single(&self) -> Option<char>` for getting exactly one character #576
Comments
I'd prefer basing this on rust-lang/rust#81615 and doing something like …
One other option that only works for this task is:

```rust
let Some((0, ch)) = s.char_indices().last() else {
    todo!();
};
```
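To check that the let-else option behaves as intended, here is a minimal sketch that wraps it in a function. The name `only_char` is hypothetical, and `todo!()` is replaced with `return None` so the function handles every input.

```rust
/// Returns the only `char` in `s`, or `None` if `s` is empty or holds
/// more than one `char`. `only_char` is a hypothetical helper name.
fn only_char(s: &str) -> Option<char> {
    // The last char starts at byte offset 0 only if it is also the first.
    let Some((0, ch)) = s.char_indices().last() else {
        return None;
    };
    Some(ch)
}

fn main() {
    assert_eq!(only_char("["), Some('['));
    // U+00E9 takes two bytes in UTF-8 but is still a single char.
    assert_eq!(only_char("\u{e9}"), Some('\u{e9}'));
    assert_eq!(only_char(""), None);
    assert_eq!(only_char("[]"), None);
    println!("ok");
}
```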
These are both better options than we have. As method chains, they would read:

```rust
text
    .chars()
    .collect_array_exact::<1>()
    .get(0)
    .and_then(unicode_math_class)
// -----
text
    .char_indices()
    .last()
    .and_then(|(idx, ch)| (idx == 0).then_some(ch))
    .and_then(unicode_math_class)
```

My problem is that these have too many affordances: there are too many places where specific details matter. You can't assure yourself that the usage is correct without fully reading it. My heuristic is whether I would expect a beginner to want to add a comment clarifying what the code does for their future self, and I think only …
These are very good reasons to abstract it into a function with a clear name, but it doesn't necessarily have to be in the standard library. For example, you can add a …
We discussed this in the @rust-lang/libs-api meeting and concluded that we would prefer having this functionality directly on the …

Several options are possible:

In the discussion, we were in favor of accepting …
Proposal
Problem statement
In Typst, we occasionally use the pattern below to determine whether a string contains exactly one character. The code is conceptually simple, but it is overcomplicated and confusing for beginners, and this task deserves a solution in the standard library.
The first time I read this example, I actually went to ChatGPT for an explanation and then thought, "oh, I bet I can simplify this", but I wasted my time: the existing code turns out to be effectively optimal when written with Rust combinator idioms.
Motivating examples or use cases
Simplified, but morally equivalent, code from the Typst parser:
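The code block itself did not survive in this copy. Judging from the note and the description that follow, the pattern is roughly the sketch below. It is a reconstruction, not the actual Typst source: `math_class` and `classify` are hypothetical names, and `classify` is a stand-in for a real Unicode math-class lookup that only knows a handful of delimiters.

```rust
/// Hypothetical stand-in for a Unicode math-class lookup. Only a few
/// delimiters are covered, purely for illustration.
fn classify(c: char) -> Option<&'static str> {
    match c {
        '(' | '[' | '{' => Some("Opening"),
        ')' | ']' | '}' => Some("Closing"),
        _ => None,
    }
}

/// Returns the math class of `text` if it consists of exactly one `char`.
fn math_class(text: &str) -> Option<&'static str> {
    let mut chars = text.chars();
    chars
        .next() // at least one char...
        .filter(|_| chars.next().is_none()) // ...and no second one
        .and_then(classify)
}

fn main() {
    assert_eq!(math_class("["), Some("Opening"));
    assert_eq!(math_class(")"), Some("Closing"));
    assert_eq!(math_class("[]"), None); // two chars: no class
    assert_eq!(math_class(""), None); // no chars: no class
    println!("ok");
}
```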
(Note: We can't write `.filter(|_| chars.is_empty())` because `Chars` doesn't implement `ExactSizeIterator`.)

This function is used to determine a string's Unicode math class, which the parser uses to make a token act as an opening or closing delimiter in math text. For example, the math text `$[0, infinity)$` creates a delimiter element for typesetting (one that grows to surround its contents) by matching the square bracket and the parenthesis through their Unicode class designations.

The Unicode math class is only defined for single codepoints, so the code verifies that the string has at least one character by calling `chars.next()`, then ensures it has at most one character by checking that the second `chars.next()` call (which would yield the second character) returns `None`.

Part of the complexity here comes from the need to create the mutable `chars` variable to call `.next()` twice, forcing the user to write two Rust statements for something conceptually simple. However, rewriting this as a single method chain gives the following options, and I think we can agree that these are more confusing than before:

This is not the only example: the pattern pops up throughout the compiler wherever we operate on strings that may have typesetting properties unique to single codepoints.
Also see the existing uses of `itertools::exactly_one` below.

Solution sketch
My preferred solution would be to add a method, `Chars::single(&self) -> Option<char>`, which would allow the following code:

I've written a draft implementation of this and would like to publish it as a PR if the proposal is accepted.
My draft implementation
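The draft implementation is not reproduced here. As a rough stand-in, the proposed method can be emulated on stable Rust with an extension trait; `CharsExt` is a hypothetical name, and this is a sketch, not the draft itself. Instead of relying on the internal `ExactSizeIterator`, it uses the public `Chars::as_str`, which returns the part of the string not yet consumed, so a single `next()` call suffices.

```rust
use std::str::Chars;

/// Hypothetical extension trait emulating the proposed `Chars::single`.
trait CharsExt {
    fn single(&self) -> Option<char>;
}

impl CharsExt for Chars<'_> {
    fn single(&self) -> Option<char> {
        // Work on a cheap copy so `&self` is left untouched.
        let mut it = self.clone();
        let first = it.next()?;
        // `as_str` exposes the not-yet-consumed remainder, so no second
        // `next()` call is needed to see whether anything is left.
        it.as_str().is_empty().then_some(first)
    }
}

fn main() {
    assert_eq!("x".chars().single(), Some('x'));
    assert_eq!("xy".chars().single(), None);
    assert_eq!("".chars().single(), None);

    // A partially consumed iterator reports on what remains.
    let mut chars = "ab".chars();
    chars.next();
    assert_eq!(chars.single(), Some('b'));
    println!("ok");
}
```

With the trait in scope, the motivating example collapses to one chain of the form `text.chars().single().and_then(lookup)`, where `lookup` is any `fn(char) -> Option<_>`.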
Here is my rationale for some of the minutiae of this method, before discussing the main alternatives:
Why a method on `Chars` as opposed to a method on `str`, e.g. `str::single_char(&self) -> Option<char>`?

- `Chars` is already where you go to get individual characters. Putting the method on `str` would make the two compete for attention, while adding it to `Chars` centralizes this need and makes it more discoverable.
- `Chars` is inherently plural, so the name `single` makes it unambiguous that the method checks whether there is exactly one char.

Why not return `Result<char, Option<(char, char)>>` (or an isomorphic enum)?

- With an `Option`, we don't need to call `.next()` twice to check if there is exactly one character. The internal iterator of `Chars` implements `ExactSizeIterator`, so we can check `.is_empty()` after the first character. See my draft implementation.
- The `Err` variant isn't helpful. For properties defined on single characters, there's no meaningful value if the string has more than one character, so `None` is expected.
- A `Result` doesn't fit the combinator methods in the motivating example. To actually unpack the values, you would need to match on the result, which means writing a full statement and leaving the world of combinator methods. That would be no more ergonomic than `let mut chars = ...; match (chars.next(), chars.next()) { ... }`.

Alternatives
The major alternative would be a method on `Iterator`. Indeed, `Iterator::single` was already proposed and turned down because of uncertainty about its exact API guarantees and because it would have to live on such a core trait. Instead, the method was added to itertools as `exactly_one(self) -> Result<Self::Item, ExactlyOneError<Self>>`.

Two notable comments on the PR:
From scottmcm:
From alexcrichton:
I think these API critiques in the issue are fair: in the general iterator case, you need to call `.next()` twice to check for there being exactly one item, and doing so may or may not consume resources or cause side effects. Leaving this choice to user code or external libraries may be for the best when we cannot make guarantees against side effects. However, we can make these guarantees in the specific case of `Chars`.
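The side-effect argument can be made concrete with a small sketch. On a general iterator, the only way to learn that there is no second item is to pull one, which here is observable through a counter passed to `inspect`. `Chars` is a plain view over a `&str`, so a clone plus `as_str()` answers the same question without advancing the original. Both function names are hypothetical.

```rust
use std::cell::Cell;
use std::str::Chars;

/// Counts how many items a naive "exactly one?" check pulls from a
/// general iterator (here one with three items).
fn items_pulled_by_naive_check() -> usize {
    let pulled = Cell::new(0);
    let mut it = vec![10, 20, 30]
        .into_iter()
        .inspect(|_| pulled.set(pulled.get() + 1));
    // First item, then a second `next()` just to learn whether more exist.
    let _first = it.next();
    let _second = it.next();
    pulled.get()
}

/// Checks whether `chars` has exactly one char left. The probe is a clone,
/// and `as_str()` shows the remainder, so the original is never advanced.
fn chars_has_exactly_one(chars: &Chars<'_>) -> bool {
    let mut probe = chars.clone();
    probe.next().is_some() && probe.as_str().is_empty()
}

fn main() {
    // The second item was consumed just to answer the question.
    assert_eq!(items_pulled_by_naive_check(), 2);

    let chars = "a".chars();
    assert!(chars_has_exactly_one(&chars));
    assert_eq!(chars.as_str(), "a"); // original iterator untouched
    println!("ok");
}
```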
Existing uses of `.chars().exactly_one()`
While the itertools method already exists, I still think it's worth moving into the standard library.
To show its utility, I did a cursory search of existing uses of `.chars().exactly_one()` on GitHub and, filtering for unique examples, found the following 25 files.
File links
Advent of Code (16)
Compilers/Interpreters (5)
Misc (4)
Of these 25 files:
- … `.unwrap()` the result (mostly Advent of Code).
- … the `Err` variant at all.
- … the `Err` variant at all, but has an `allow(dead_code)` annotation on the resulting error type.
annotation on the resulting error type.Disclaimers: this is only relevant for
Chars
, this excludes private and non-github repositories, and I only did a cursory, incomplete search on this one string.However, I think these examples provide a strong argument for this method's inclusion in the standard library, and in particular for the variant with an
Option
return-type being available onChars
.Links and related work
- `Iterator::single` forum discussion: https://internals.rust-lang.org/t/what-do-you-think-about-iterator-single/8608/3
- `Itertools::exactly_one`: Add exactly_one function rust-itertools/itertools#310

Note that the accepted `String::into_chars` ACP has some discussion around changing where `Chars` is defined, so the two changes will likely conflict in git when they merge.

Unchanged "What happens now?" and "Possible responses" sections
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: