Strawman proposal for #34
As a minimal solution, we add one getter to the `String` class, `GraphemeClusters get graphemeClusters => GraphemeClusters(this);`. It returns an instance of a new class, `GraphemeClusters`, which is an iterable of `GraphemeCluster` objects. Each `GraphemeCluster` represents one extended grapheme cluster.
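Code outside the SDK cannot add a member to `String` itself, but a Dart extension can approximate the proposed getter. The sketch below is hypothetical in several ways:

- The extension name `GraphemeClustersSketch` is invented.
- The getter returns a plain `List<String>` instead of the proposed `GraphemeClusters`.
- Its segmentation rule is a deliberate simplification: a code point plus any trailing combining diacritical marks (U+0300 to U+036F). It is not the full extended grapheme cluster algorithm.

```dart
// Hypothetical sketch: an extension getter standing in for the proposed
// `String.graphemeClusters`. It returns List<String>, not GraphemeClusters.

/// Simplified rule (assumption, not UAX #29): only combining diacritical
/// marks U+0300..U+036F extend the preceding code point.
bool _isCombiningMark(int rune) => rune >= 0x0300 && rune <= 0x036F;

extension GraphemeClustersSketch on String {
  List<String> get graphemeClusters {
    final clusters = <String>[];
    final buffer = StringBuffer();
    for (final rune in runes) {
      // Start a new cluster unless this code point is a combining mark.
      if (buffer.isNotEmpty && !_isCombiningMark(rune)) {
        clusters.add(buffer.toString());
        buffer.clear();
      }
      buffer.writeCharCode(rune);
    }
    if (buffer.isNotEmpty) clusters.add(buffer.toString());
    return clusters;
  }
}

void main() {
  const word = 'cafe\u0301'; // "café" written with a decomposed accent
  print(word.length); // 5 UTF-16 code units
  print(word.graphemeClusters.length); // 4 clusters
}
```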
The `GraphemeCluster` and `GraphemeClusters` classes are defined something like:
```dart
abstract class GraphemeCluster {
  /// Whether the cluster represents a sequence of Unicode scalar values.
  ///
  /// If not, the [runes] will contain only one (invalid) code unit.
  bool get isValid;

  /// The code points making up this grapheme cluster.
  Runes get runes;

  /// Returns a string containing just this grapheme cluster.
  String toString();

  /// The length of this grapheme cluster.
  ///
  /// Returns a number that can be added to the start index of this
  /// grapheme cluster (such as an index returned by
  /// [GraphemeClusters.indexOf]) to produce the index just after this
  /// grapheme cluster.
  int get length;

  /// Whether [other] is another grapheme cluster with the same runes.
  ///
  /// Returns `true` if [other] is a grapheme cluster and it has the
  /// same value for [isValid] and the same sequence of [runes].
  bool operator ==(Object other);

  int get hashCode;
}
```
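A small concrete class can make the `length` and `==` contracts easier to check. `SimpleGraphemeCluster` below is a hypothetical value class that mirrors the `GraphemeCluster` interface. It makes two assumptions:

- Indices are UTF-16 code-unit offsets, so `length` is the code-unit length of the cluster's text.
- A single unpaired surrogate is the only invalid case.

Equality compares runes. A precomposed `é` (U+00E9) and `e` followed by U+0301 are therefore different clusters, which matches the design's lack of normalization.

```dart
/// Hypothetical concrete cluster mirroring the proposed GraphemeCluster API.
/// Assumption: indices are UTF-16 code-unit offsets into the source string.
class SimpleGraphemeCluster {
  final String _text;
  SimpleGraphemeCluster(this._text);

  /// Invalid only if the cluster is a single unpaired surrogate code unit.
  bool get isValid {
    if (_text.length != 1) return true;
    final unit = _text.codeUnitAt(0);
    return unit < 0xD800 || unit > 0xDFFF;
  }

  Runes get runes => _text.runes;

  /// Code units covered, so `start + length` is the index after the cluster.
  int get length => _text.length;

  @override
  String toString() => _text;

  @override
  bool operator ==(Object other) {
    if (other is! SimpleGraphemeCluster) return false;
    if (other.isValid != isValid) return false;
    final mine = runes.toList();
    final theirs = other.runes.toList();
    if (mine.length != theirs.length) return false;
    for (var i = 0; i < mine.length; i++) {
      if (mine[i] != theirs[i]) return false;
    }
    return true;
  }

  // Equal clusters have equal runes, so hashing the runes stays consistent.
  @override
  int get hashCode => Object.hashAll(runes);
}

void main() {
  final decomposed = SimpleGraphemeCluster('e\u0301');
  final precomposed = SimpleGraphemeCluster('\u00e9');
  print(decomposed.length); // 2
  print(decomposed == SimpleGraphemeCluster('e\u0301')); // true
  print(decomposed == precomposed); // false: no normalization
  print(SimpleGraphemeCluster(String.fromCharCode(0xD800)).isValid); // false
}
```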
```dart
/// A view of a `String` as a sequence of grapheme clusters or invalid code units.
///
/// Many operations are based on "indices". These indices should always be
/// values returned or provided by other operations on this class, with `0`
/// being the start of the grapheme clusters and [length] being the end.
/// The [iterator] provides access to the start and end indices of the current
/// grapheme cluster, and [indexOf] and [replaceAllMapped] provide indices.
/// These indices represent *grapheme cluster boundaries*.
/// If an index is used that does not represent a grapheme cluster boundary,
/// then the behavior of the methods is unspecified.
abstract class GraphemeClusters extends Iterable<GraphemeCluster> {
  const factory GraphemeClusters(String string) = _SomeImplementationClass;

  /// The extended grapheme clusters of this string.
  ///
  /// The grapheme clusters are found progressively, starting at the
  /// beginning of the string. If the string contains invalid encodings,
  /// they are represented by a [GraphemeCluster] with [GraphemeCluster.isValid]
  /// returning false.
  SliceIterator<GraphemeCluster> get iterator;

  /// Whether these grapheme clusters contain [other].
  ///
  /// If [start] and [end] are provided, they must be valid indices,
  /// and then only the slice from [start] to [end] is checked for [other].
  /// The default value of [end] is [length].
  bool contains(covariant GraphemeClusters other, [int start = 0, int? end]);

  /// Finds the first position of [other] in these grapheme clusters.
  ///
  /// Returns the position of the match as an index. This index is a valid
  /// argument to other methods that take indices, including their [start]
  /// and [end] parameters.
  ///
  /// With a [start], which should be a valid grapheme cluster index,
  /// the search starts at that index instead of at the start of the
  /// [GraphemeClusters].
  /// With an [end], which should also be a valid grapheme cluster index,
  /// the search ends when reaching that position. The default value
  /// for [end] is [length].
  int indexOf(GraphemeClusters other, [int start = 0, int? end]);

  /// Whether [other] is a prefix of these grapheme clusters.
  ///
  /// If [start] is provided, then it must be a valid index, and [other]
  /// is checked for being a prefix of the clusters starting at [start].
  bool startsWith(GraphemeClusters other, [int start = 0]);

  /// Whether [other] is a suffix of these grapheme clusters.
  bool endsWith(GraphemeClusters other);

  /// Creates a new [GraphemeClusters] containing a slice of this.
  ///
  /// The returned clusters contain all the clusters between the [start]
  /// and [end] index positions. Both positions must be valid indices
  /// returned by methods on this class.
  GraphemeClusters getRange(int start, [int? end]);

  /// The index position after the last grapheme cluster.
  ///
  /// This is a valid index position for methods like [getRange].
  /// It represents the position after the last [GraphemeCluster] of
  /// [iterator].
  int get length;

  bool get isEmpty => length == 0;
  bool get isNotEmpty => length != 0;

  /// Replaces a section of the grapheme clusters with a [replacement].
  GraphemeClusters replaceRange(
      int start, int end, GraphemeClusters replacement);

  /// Replaces the first occurrence of [pattern] with [replacement].
  GraphemeClusters replaceFirst(
      GraphemeClusters pattern, GraphemeClusters replacement,
      [int start = 0]);

  /// ...
  GraphemeClusters replaceAll(
      GraphemeClusters pattern, GraphemeClusters replacement);

  /// ...
  GraphemeClusters replaceFirstMapped(
      GraphemeClusters pattern, GraphemeClusters replace(int start, int end));

  /// ...
  GraphemeClusters replaceAllMapped(
      GraphemeClusters pattern, GraphemeClusters replace(int start, int end));

  /// Whether [other] is the same sequence of [GraphemeCluster]s.
  ///
  /// Returns `true` if [other] is a [GraphemeClusters] and its
  /// [GraphemeClusters.iterator] produces the same number of grapheme
  /// clusters, pairwise equal according to [GraphemeCluster.operator==].
  bool operator ==(Object other);

  int get hashCode;
}
```
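Requiring indices to be grapheme cluster boundaries is what separates this `indexOf` from `String.indexOf`. A plain code-unit search can report a match that ends in the middle of a cluster.

The sketch below is a hypothetical top-level `clusterIndexOf` over plain strings, built on these assumptions:

- Indices are code-unit offsets.
- It returns `-1` when there is no match. This borrows the `String.indexOf` convention; the proposal does not say what happens on a miss.
- It reuses the simplified combining-mark rule rather than full extended grapheme cluster segmentation.

```dart
/// Hypothetical helper: code-unit offsets of cluster boundaries, using a
/// simplified rule (a code point plus trailing U+0300..U+036F marks).
Set<int> clusterBoundaries(String s) {
  final boundaries = <int>{0};
  var index = 0;
  for (final rune in s.runes) {
    final isMark = rune >= 0x0300 && rune <= 0x036F;
    if (index > 0 && !isMark) boundaries.add(index);
    index += rune > 0xFFFF ? 2 : 1; // supplementary code points use 2 units
  }
  boundaries.add(s.length);
  return boundaries;
}

/// Returns the first index of [needle] in [haystack] whose match both starts
/// and ends on a cluster boundary, searching from [start] up to [end]
/// (default: the end of the string), or -1 if there is no such match.
int clusterIndexOf(String haystack, String needle, [int start = 0, int? end]) {
  final boundaries = clusterBoundaries(haystack);
  final limit = end ?? haystack.length;
  var i = haystack.indexOf(needle, start);
  while (i >= 0 && i + needle.length <= limit) {
    if (boundaries.contains(i) && boundaries.contains(i + needle.length)) {
      return i;
    }
    // This match splits a cluster; look for the next code-unit match.
    i = haystack.indexOf(needle, i + 1);
  }
  return -1;
}

void main() {
  const text = 'cafe\u0301 cafe';
  print(text.indexOf('cafe')); // 0, but this match ends inside "e\u0301"
  print(clusterIndexOf(text, 'cafe')); // 6
}
```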
```dart
/// An iterator moving over slices of some integer-indexable collection.
abstract class SliceIterator<T> implements Iterator<T> {
  // RuneIterator could implement this interface.

  /// Finds the next slice.
  ///
  /// Finds the next slice after [end], then moves [start] to the start
  /// of that slice and [end] to its end.
  /// If there is no next slice, [moveNext] returns false, and
  /// [start] and [end] then have the same value.
  bool moveNext();

  /// The start index of [current].
  ///
  /// Equal to [end] before the first call to [moveNext] and after
  /// [moveNext] has returned false.
  int get start;

  /// The end index of [current].
  int get end;
}
```
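A concrete `SliceIterator` makes the `start`/`end` contract easier to check. The two indices are equal before the first `moveNext` call, and again once `moveNext` returns false.

Notes on the sketch:

- It restates the proposal's interface so that it compiles on its own.
- `SimpleClusterIterator` is a hypothetical name.
- Indices are assumed to be UTF-16 code units.
- Clusters again follow the simplified combining-mark rule.

```dart
/// The proposal's interface, restated so this sketch is self-contained.
abstract class SliceIterator<T> implements Iterator<T> {
  int get start;
  int get end;
}

/// Hypothetical iterator over simplified clusters: one code point (one or
/// two UTF-16 code units) plus any trailing U+0300..U+036F marks.
class SimpleClusterIterator implements SliceIterator<String> {
  final String _string;
  int _start = 0;
  int _end = 0; // start == end before the first moveNext call

  SimpleClusterIterator(this._string);

  @override
  int get start => _start;

  @override
  int get end => _end;

  @override
  String get current => _string.substring(_start, _end);

  @override
  bool moveNext() {
    _start = _end;
    if (_start >= _string.length) return false; // start == end == length
    var i = _start + _baseWidth(_start);
    while (i < _string.length && _isMark(_string.codeUnitAt(i))) {
      i++;
    }
    _end = i;
    return true;
  }

  /// 2 for a valid surrogate pair, otherwise 1 (a lone surrogate counts as 1).
  int _baseWidth(int i) {
    final unit = _string.codeUnitAt(i);
    final isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    if (isHigh && i + 1 < _string.length) {
      final next = _string.codeUnitAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) return 2;
    }
    return 1;
  }

  static bool _isMark(int unit) => unit >= 0x0300 && unit <= 0x036F;
}

void main() {
  final it = SimpleClusterIterator('e\u0301\u{1F600}!');
  while (it.moveNext()) {
    print('${it.start}..${it.end}'); // prints 0..2, then 2..4, then 4..5
  }
  print('done at ${it.start}..${it.end}'); // done at 5..5
}
```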
There are no `Pattern`s on grapheme clusters. We can define a `ClusterPattern` if necessary, but `RegExp` won't implement it.
This design has no support for:
- Normalization
- Localization
All an implementation needs is enough information to recognize Unicode extended grapheme clusters when scanning a string from left to right.
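The left-to-right scan can be sketched as a boundary decision between each pair of adjacent code points, plus a little state carried forward.

The sketch below is limited in two ways:

- It implements only a subset of the UAX #29 rules: GB3 to GB5, GB9, GB11, and GB12/GB13.
- It uses a tiny, incomplete, hand-written property table. The `Gcb` categories and the code point ranges in `categoryOf` are illustrative assumptions.

A real implementation would generate its tables from the Unicode Character Database. It would also handle the remaining rules, such as Hangul syllable sequences, prepend characters and spacing marks.

```dart
/// A reduced set of Grapheme_Cluster_Break values (assumption: real code
/// needs the full property, generated from Unicode data files).
enum Gcb { cr, lf, control, extend, zwj, regionalIndicator, pictographic, other }

/// Tiny, incomplete property lookup, for illustration only.
Gcb categoryOf(int r) {
  if (r == 0x0D) return Gcb.cr;
  if (r == 0x0A) return Gcb.lf;
  if (r < 0x20 || (r >= 0x7F && r < 0xA0)) return Gcb.control;
  if (r == 0x200D) return Gcb.zwj;
  if ((r >= 0x0300 && r <= 0x036F) || // combining diacritical marks
      (r >= 0xFE00 && r <= 0xFE0F) || // variation selectors
      (r >= 0x1F3FB && r <= 0x1F3FF)) { // emoji skin-tone modifiers
    return Gcb.extend;
  }
  if (r >= 0x1F1E6 && r <= 0x1F1FF) return Gcb.regionalIndicator;
  if (r >= 0x1F300 && r <= 0x1FAFF) return Gcb.pictographic; // partial
  return Gcb.other;
}

/// Whether there is a cluster boundary between [prev] and [cat].
bool _isBoundary(Gcb prev, Gcb cat, int riCount, bool zwjAfterPicto) {
  if (prev == Gcb.cr && cat == Gcb.lf) return false; // GB3
  if (prev == Gcb.cr || prev == Gcb.lf || prev == Gcb.control) return true; // GB4
  if (cat == Gcb.cr || cat == Gcb.lf || cat == Gcb.control) return true; // GB5
  if (cat == Gcb.extend || cat == Gcb.zwj) return false; // GB9
  if (prev == Gcb.zwj && cat == Gcb.pictographic && zwjAfterPicto) {
    return false; // GB11: ExtPict Extend* ZWJ x ExtPict
  }
  if (prev == Gcb.regionalIndicator &&
      cat == Gcb.regionalIndicator &&
      riCount.isOdd) {
    return false; // GB12/GB13: pair up regional indicators
  }
  return true; // GB999
}

/// Splits [s] into clusters in one left-to-right pass.
List<String> segment(String s) {
  final clusters = <String>[];
  final buffer = StringBuffer();
  Gcb? prev;
  var riCount = 0; // consecutive regional indicators so far
  var pictoRun = false; // text so far ends with ExtPict Extend*
  var zwjAfterPicto = false; // the last ZWJ followed ExtPict Extend*
  for (final rune in s.runes) {
    final cat = categoryOf(rune);
    if (prev != null && _isBoundary(prev, cat, riCount, zwjAfterPicto)) {
      clusters.add(buffer.toString());
      buffer.clear();
    }
    buffer.writeCharCode(rune);
    if (cat == Gcb.zwj) zwjAfterPicto = pictoRun;
    pictoRun = cat == Gcb.pictographic || (cat == Gcb.extend && pictoRun);
    riCount = cat == Gcb.regionalIndicator ? riCount + 1 : 0;
    prev = cat;
  }
  if (buffer.isNotEmpty) clusters.add(buffer.toString());
  return clusters;
}

void main() {
  print(segment('a\r\nb').length); // 3: CR LF stays together
  print(segment('\u{1F1FA}\u{1F1F8}\u{1F1EB}\u{1F1F7}').length); // 2 flags
  print(segment('\u{1F468}\u200D\u{1F469}\u200D\u{1F467}').length); // 1
}
```

Only the property of the current code point and a few flags from earlier in the string are consulted. No lookahead and no locale data are needed, which fits the stated scope (no normalization, no localization).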