-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[stdlib] Implement native grapheme breaking for String #37864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci Please benchmark |
@swift-ci Please test |
This comment has been minimized.
This comment has been minimized.
@swift-ci Please benchmark |
This comment has been minimized.
This comment has been minimized.
753875e
to
17c277f
Compare
17c277f
to
c05e787
Compare
@swift-ci please benchmark |
This comment has been minimized.
This comment has been minimized.
c05e787
to
716d6f6
Compare
716d6f6
to
9004029
Compare
9004029
to
effbcb2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a quick preliminary review, so many of my suggestions are vague and might not actually make sense. But, hopefully it gives you something to chew on or encourages some refactoring or clarifying.
Looking good so far though!
effbcb2
to
aa8da8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick review. This is looking a lot cleaner, thanks for scrapping the iterator pattern!
Refactoring out state helps with the clarity quite a bit. It also shows that index
does seem like a different bit of state than the others, so we might pull that out and put it back on the walker or somewhere. Something to consider.
aa8da8f
to
c385938
Compare
@swift-ci please benchmark |
c385938
to
b3bb9ac
Compare
@swift-ci please benchmark |
This comment has been minimized.
This comment has been minimized.
@swift-ci please benchmark |
This comment has been minimized.
This comment has been minimized.
use an Iterator approach remove Iterator conformance
more comments addressed fix crlf bug
2d04fd3
to
58fb22c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it's hasty, but I wanted to get you some feedback quickly.
Parameterize nextBoundary and previousBoundary
58fb22c
to
85352c2
Compare
@swift-ci please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very close!
For landing, I think we can simplify a lot of the state that's passed around and mutated, especially when going backwards.
For future work, I'm interested in splitting this between forwards and backwards a little (since I think that will actually simplify the logic), but I don't want to hold up landing this for that. Those comments are written with "Future note:" at the front.
// | = We found our starting .extendedPictographic letting us | ||
// know that we are in an emoji sequence so our initial | ||
// break question is answered as NO. | ||
internal func checkIfInEmojiSequence( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future note: One thing that isn't clear to me is how this is different than the walking-backwards loop we're doing anyways inside of previousBoundary
.
// GB11 | ||
case (.zwj, .extendedPictographic): | ||
if state.isBackwards { | ||
checkIfInEmojiSequence(&state, index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future note: If we're going backwards, seems like the return value is "yes, unless you're in an emoji sequence".
// state variable to false after every decision of 'shouldBreak'. If we | ||
// happen to see a rhs .extend or .zwj, then it's a signal that we should | ||
// continue treating the current grapheme cluster as an emoji sequence. | ||
var enterEmojiSequence = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future note: Is there a way to simplify this logic? It seems to only be essential to forwards-state.
@swift-ci please test |
@swift-ci please benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a couple places for comments. I think we should try to merge this soon. There's more future work we could do, but this is great for now!
let cocoa = _object.cocoaObject | ||
// GB11 | ||
case (.zwj, .extendedPictographic): | ||
if isBackwards { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment describing what we're doing and why
// GB12 & GB13 | ||
case (.regionalIndicator, .regionalIndicator): | ||
if isBackwards { | ||
return countRIs(index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a comment, or a more descriptive name like checkPairedRIs.
Performance (x86_64): -O
Code size: -O
Performance (x86_64): -Osize
Code size: -Osize
Performance (x86_64): -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Build failed |
@swift-ci please test linux platform |
Are the |
I have no idea why those benchmarks are showing as a regression in this patch (and it seems they showed a 3x improvement with the normalization patch). Considering they don't even touch strings at all, I'm inclined to say they are not an issue, but I could be wrong. |
Here are some benchmarks that I ran locally with the Performance (arm64): -O
Code size: -OPerformance (arm64): -Osize
Code size: -OsizePerformance (arm64): -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
This is a draft and will see many updates, but wanted to put this up to get some benchmark readings.
Resolves: rdar://52194063, https://bugs.swift.org/browse/SR-9423