You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As a minimal solution, we add one getter to the String class, get graphemeClusters => GraphemClusters(this);, returning an instance of a new class GrahemeClusters, which is an iterable of GraphemeCluster, which represents an extended grapheme cluster.
Unlike #49, this approach does not use integer indices, but instead relies on mutable index objects,
the GraphemeClusterIterator objects. This design is highly speculative. There is likely many things that can be done to make it better.
The classes are defined something like:
/// A single grapheme cluster, or an invalid code unit./// /// Represents a single grapheme cluster as a sequence of Unicode/// scalar values./// If the source contains things other than valid Unicode scalar/// values, each code unit that is not part of a valid scalar value/// gets represented as a single [GraphemeCluster] with [isValid] being/// false.abstractclassGraphemeCluster {
/// Whether the cluster represents a sequence of Unicode scalar values. /// /// If not, the [runes] will contain only one (invalid) code unit.boolget isValid;
/// The code points making up this grapheme cluster.Runesget runes;
/// Returns a string containing just this grapheme cluster.StringtoString();
/// Whether [other] is another grapheme cluster with the same runes. /// /// Returns `true` if [other] is a grapheme cluster, and it has the /// same value for [isValid] and the same sequence of [runes].booloperator==(Object other);
intget hashCode;
}
/// A sequence of [GraphemeCluster]s.////// Also provides operations to work at the [GraphemeClusters] level./// /// Represents all positions in a [GraphemeClusters] as iterable by/// a [GraphemeClusterIterator]. Operations like [indexOf] returns/// a [GraphemeClusterIterator] which can be used to find more/// instances of the pattern.abstractclassGraphemeClustersimplementsIterable<GraphemeCluster> {
// Iterate over all the grapheme clusters.GraphemeClusterIteratorget iterator;
/// Whether this is an empty sequence of grapheme clusters.boolget isEmpty;
/// Whether this is not an empty sequence of grapheme clusters.boolget isNotEmpty;
/// Returns an iterator pointing before the first occurence of [clusters]. /// /// The iterator has no [GraphemeClusterIterator.current]. /// If [clusters] does not occur in this, then the returned iterator is /// at the end of the grapheme cluster sequence.GraphemeClusterIteratorindexOf(GraphemeClusters clusters) =>
iterator..moveTo(clusters);
/// Returns an iterator pointing after the first occurence of [clusters]. /// /// The iterator has no [GraphemeClusterIterator.current]. /// If [clusters] does not occur in this, then the returned iterator is /// at the end of the grapheme cluster sequence. /// /// Equivalent to `gc.indexOf(clusters)..skipPrefix(clusters)`.GraphemeClusterIteratorindexAfter(GraphemeClusters clusters) =>
iterator..moveAfter(clusters);
/// Returns an iterator pointing to the last occurence of [clusters]. /// /// The iterator has no [GraphemeClusterIterator.current]. /// If [clusters] does not occur in this, then the returned iterator is /// at the end of the grapheme cluster sequence.GraphemeClusterIteratorlastIndexOf(GraphemeClusters clusters) =>
iterator..moveToLast(clusters);
/// Returns an iterator pointing after the last occurence of [clusters]. /// /// The iterator has no [GraphemeClusterIterator.current]. /// If [clusters] does not occur in this, then the returned iterator is /// at the end of the grapheme cluster sequence.GraphemeClusterIteratorlastIndexAfter(GraphemeClusters clusters) =>
iterator..moveToLast(clusters);
/// Whether this sequence of grapheme clusters contains [clusters].boolcontainsAll(GraphemeClusters clusters) => iterator.moveAfter(clusters);
/// Whether this sequence starts with [clusters].boolstartsWith(GraphemeClusters clusters) => iterator.startsWith(clusters);
/// Whether this sequence ends with [clusters];boolendsWith(GraphemeClusters clusters) {
var iterator =this.iterator;
if (!iterator.moveAfterLast(clusters)) returnfalse;
return!iterator.moveNext();
}
/// Replaces the first occurrence of [pattern] with [replacement]. /// /// Returns a new grapheme cluster sequence where the first occurences of [pattern] /// has been replaced by [replacement].GraphemeClustersreplaceFirst(GraphemeClusters pattern, GraphemeClusters replacement);
/// Replaces all occurrences of [pattern] with [replacement]. /// /// Returns a new grapheme cluster sequence where the occurences of [pattern] /// has been replaced by [replacement]. If occurences of [pattern] overlap, /// the first occurence is replaced, and further occurences are only searched /// for after the end of the replacement.GraphemeClustersreplaceAll(GraphemeClusters pattern, GraphemeClusters replacement);
/// Splits on [pattern] and emits the contents around the patterns. /// /// If [pattern] does not occur in this, the returned iterable contains only /// one element with the same grapheme clusters as this. /// Otherwise the emitted grapheme cluster sequences are those before the /// first occurence of [pattern], after the last occurence and between /// two consequtive occurences.Iterable<GraphemeClusters> split(GraphemeClusters pattern);
/// Returns a string containing the same runes as this. /// /// If this is backed by a string, that string will be returned again. /// For another implementation of [GraphemeCluster] that is not backed /// directly by a string, a new string is created which contains /// the same valid runes as the grapheme clusters of this, /// and with the same invalid code units *if possible*. /// If [ensureValid] is true, then invalid code units are replaced by /// U+FFFD. /// Otherwise, if an invalid code point cannot be represented by the /// string, then this call throws.StringtoString({bool ensureValid =false});
}
/// An extended [Iterator] over [GraphemeCluster]./// /// Allows operations on the remaining grapheme clusters, /// similar to the operations of [GraphemeClusters]./// /// A [GraphemeClusterIterator] represents either a single grapheme cluster,/// or a position between grapheme clusters. In the latter case, [current]/// is null. The extra operations that work on multiple grapheme clusters/// will typically end up with a position between two grapheme clusters.abstractclassGraphemeClusterIteratorimplementsIterator<GraphemeCluster> {
/// The end point of this iteration. /// /// Is initially an iterator at the end of the grapheme cluster sequence. /// /// Using [resetEnd] allows a user of this iterator to set an end /// point for iteration of this iterator.GraphemeClusterIteratorget end;
/// Sets the [end] position of this iterator. /// /// Calls [setEnd] with the current iterator, after remembering /// its current position. /// When [setEnd] returns, the current position of the iterator /// (the position before its current grapheme cluster or the current /// position between two grapheme clusters) becomes the new end position /// of this iterator, and the rembered current position is restored. /// /// This can be used to limit iteration to a range, e.g.: /// ```dart /// var gcs = "<foo>".graphemeClusters; /// var iter = gcs.moveAfter("<".graphemeClusters) /// ..resetEnd((i) => i.moveBeforeLast(">".graphemeClusters)); /// ``` /// This produces an iterator that will only iterate from after the first /// `<` to before the last `>` (the grapheme clusters of "foo"). /// /// If [setEnd] is omitted, the [end] is set to the end of the /// [GraphemeClusters] being iterated. /// /// Returns true if the end positions changed and false if it did not.boolresetEnd([voidFunction(GraphemeClusterIterator) setEnd]);
/// A [GraphemeClusters] containing the remainder of the clusters of this. /// /// If [current] is not null, it is included in the remainder, as are all /// values of [current] reachable by calling [moveNext] until it returns false.GraphemeClustersget graphemeClusters;
/// Iterate until [clusters] is a prefix of the remaining grapheme clusters. /// /// Repeatedly does a [moveNext] until either that returns false, in which case /// this call also returns false, or until the remaining clusters start /// with [clusters]. At that point, [current] will be null, and the next call /// to [moveNext] will select the first grapheme cluster.boolmoveTo(GraphemeClusters clusters, {int skip =0});
/// Iterate until [clusters] is a suffixed of the iterated grapheme clusters. /// /// Repeatedly does a [moveNext] until either that returns false, in which case /// this call also returns false, or until the already iterated clusters end /// with [clusters]. At that point, [current] will be null, and the next call /// to [moveNext] will select the next remaining grapheme cluster.boolmoveAfter(GraphemeClusters clusters, {int skip =0});
/// Iterate until after the last occurence of [clusters]. /// /// If [skip] is provided, only iterate until after the [skip]+1th /// last occurence of [clusters]. /// /// If there are not enoughboolmoveToLast(GraphemeClusters clusters, {int skip =0});
/// Find the last occurrence of [clusters] in this. /// /// If [skip] is provided, /// instead find the [skip]th last occurence of [clusters]. /// /// Then set the current position of this iterator to just /// after that occurence, with no [current] being selected. /// /// If there are not [skip] + 1 occurrences of [clusters] /// in the remaining grapheme clusters of this iterator, /// this iteration will end and return null, as if by /// repeatedly calling [moveNext].boolmoveAfterLast(GraphemeClusters clusters, {int skip =0});
/// Whether [cluster] is a prefix of the remaining grapheme clusters. /// /// The remaining grapheme clusters includes [current] if [current] is /// not null. Does not move the current position.boolstartsWith(GraphemeClusters clusters);
/// If this iterator [startsWith][clusters], move to after it. /// /// Returns true if [clusters] was skipped. After this, /// [current] is unset.boolskipPrefix(GraphemeClusters clusters);
/// Resets the position to before the [current] grapheme cluster. /// /// If [current] is null, then nothing happens. /// Otherwise sets [current] to null, /// and the next call to [moveNext] will select [current] again. /// /// If [skipCurrent] is true, the position is reset to *after* the [current] /// grapheme cluster. This sets [current] to null, and the next call /// to [moveNext] will select the next grapheme cluster after [current].voidreset({bool skipCurrent =false});
/// Creates a copy of this iterator. /// /// The copy is at the exact same position, but is otherwise unrelated to /// this iterator. The two iterators can be changed independently.GraphemeClusterIteratorclone();
// Make these return GraphemeClusterIterator too.GraphemeClusterIteratorskip(int count);
GraphemeClusterIteratortake(int count);
GraphemeClusterIteratorskipWhile(booltest(GraphemeClusters));
GraphemeClusterIteratortakeWhile(booltest(GraphemeClusters));
}
There are no Patterns on grapheme clusters. We can define a ClusterPattern if necessary, but RegExp won't implement it.
This design has no support for:
Normalization
Localization
All it needs to be implemented is enough information to recognize Unicode extended grapheme clusters when scanning a string from left to right.
The API is designed so that you never give an iterator to a function on GraphemeClusters, so you never need to check if it's an iterator for that object. Instead you always use the iterator itself to do the operation.
Since indices are no longer exposed as integers, you cannot do arithmetic operations on them. That includes comparisons, so there is no way to figure out whether two grapheme cluster iterators are iterating the same GraphemeClusters object, and if they are, which is further ahead.
The text was updated successfully, but these errors were encountered:
lrhn
added
the
feature
Proposed language feature that solves one or more problems
label
Oct 17, 2018
Strawman proposal for #34
As a minimal solution, we add one getter to the
String
class,get graphemeClusters => GraphemClusters(this);
, returning an instance of a new classGrahemeClusters
, which is an iterable ofGraphemeCluster
, which represents an extended grapheme cluster.Unlike #49, this approach does not use integer indices, but instead relies on mutable index objects,
the
GraphemeClusterIterator
objects. This design is highly speculative. There is likely many things that can be done to make it better.The classes are defined something like:
There are no
Pattern
s on grapheme clusters. We can define a ClusterPattern if necessary, butRegExp
won't implement it.This design has no support for:
All it needs to be implemented is enough information to recognize Unicode extended grapheme clusters when scanning a string from left to right.
The API is designed so that you never give an iterator to a function on
GraphemeClusters
, so you never need to check if it's an iterator for that object. Instead you always use the iterator itself to do the operation.Since indices are no longer exposed as integers, you cannot do arithmetic operations on them. That includes comparisons, so there is no way to figure out whether two grapheme cluster iterators are iterating the same
GraphemeClusters
object, and if they are, which is further ahead.The text was updated successfully, but these errors were encountered: