How to reverse strings that contain surrogate pairs in Dart? #38854

Closed
vincevargadev opened this issue Oct 13, 2019 · 5 comments
Labels
area-core-library (SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries), library-core, type-question (a question about expected behavior or functionality)

Comments

@vincevargadev

vincevargadev commented Oct 13, 2019

I was playing with algorithms in Dart, and because I was following TDD, I realized that my code has some limitations.

I was trying to reverse strings as part of an interview problem, but I couldn't get the surrogate pairs correctly reversed.

const simple = 'abc';
const emoji = '🍎🍏🐛';
const surrogate = '👮🏽‍♂️👩🏿‍💻';

String rev(String s) {
    return String.fromCharCodes(s.runes.toList().reversed);
}

void main() {
    print(simple);
    print(rev(simple));
    print(emoji);
    print(rev(emoji));
    print(surrogate);
    print(rev(surrogate));
}

The output:

abc
cba
🍎🍏🐛
🐛🍏🍎
👮🏽‍♂️👩🏿‍💻
💻‍🏿👩️♂‍🏽👮

You can see that the simple emojis are reversed correctly because I'm using runes rather than simply executing s.split('').toList().reversed.join(''), but the surrogate pairs are reversed incorrectly.

How can I reverse a string that might contain surrogate pairs using the Dart programming language?

@lrhn
Member

lrhn commented Oct 14, 2019

Using runes is correct for surrogate pairs, but not for your problem, which is grapheme clusters consisting of more than one code point.

A surrogate pair is just a single code point stored as two 16-bit integers. Reversing the code points of a string can be done by

String revCP(String s) => String.fromCharCodes(s.runes.toList().reversed);

That reverses the code points without reversing the integers of a single surrogate pair.
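To make the difference between code units and code points concrete, here is a minimal sketch using only dart:core. The helper names revCodeUnits and revCodePoints are hypothetical and chosen for illustration; they are not part of any library.

```dart
// Hypothetical helper: reverse by UTF-16 code unit.
// This splits surrogate pairs apart and leaves them in (low, high) order.
String revCodeUnits(String s) => String.fromCharCodes(s.codeUnits.reversed);

// Hypothetical helper: reverse by code point (rune).
// Each surrogate pair is decoded first, so it stays intact.
String revCodePoints(String s) =>
    String.fromCharCodes(s.runes.toList().reversed);

void main() {
  const apple = '\u{1F34E}'; // 🍎, a single code point outside the BMP
  print(apple.length); // 2 (UTF-16 code units)
  print(apple.runes.length); // 1 (code point)
  print(apple.codeUnits.map((u) => u.toRadixString(16)).toList());
  // [d83c, df4e] (high surrogate, low surrogate)

  const s = 'a\u{1F34E}';
  print(revCodePoints(s) == '\u{1F34E}a'); // true
  print(revCodeUnits(s) == '\u{1F34E}a'); // false: the pair is now malformed
}
```

Reversing by rune is therefore enough for strings like '🍎🍏🐛', where every visible symbol is exactly one code point.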

Emojis like the ones you have problems with consist of more than one code point grouped into a single grapheme cluster (many of those code points are also stored as surrogate pairs). Changing the order of these code points changes the meaning of the combined emoji.

To properly reverse such a string at the grapheme cluster level, you need to recognize the grapheme cluster boundaries. Dart does not currently expose that functionality.
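As a rough illustration of what recognizing boundaries involves, the sketch below groups code points into approximate clusters and reverses those. It is an assumption-laden simplification, not the Unicode segmentation algorithm (UAX #29): it only attaches combining diacritics, variation selectors, emoji skin-tone modifiers, and zero-width joiner sequences to the preceding code point. The names roughClusters and revRough are hypothetical.

```dart
const int _zwj = 0x200D; // zero-width joiner

// Simplified, illustrative rule set; NOT a full UAX #29 implementation.
bool _isExtender(int cp) =>
    (cp >= 0x0300 && cp <= 0x036F) || // combining diacritical marks
    (cp >= 0xFE00 && cp <= 0xFE0F) || // variation selectors
    (cp >= 0x1F3FB && cp <= 0x1F3FF); // emoji skin-tone modifiers

// Hypothetical helper: split a string into approximate grapheme clusters.
List<String> roughClusters(String s) {
  final clusters = <List<int>>[];
  var joinNext = false; // true right after a ZWJ
  for (final cp in s.runes) {
    if (clusters.isNotEmpty && (joinNext || cp == _zwj || _isExtender(cp))) {
      clusters.last.add(cp); // stays with the previous cluster
    } else {
      clusters.add([cp]); // starts a new cluster
    }
    joinNext = cp == _zwj;
  }
  return clusters.map((c) => String.fromCharCodes(c)).toList();
}

// Hypothetical helper: reverse the order of clusters, not of code points.
String revRough(String s) => roughClusters(s).reversed.join();

void main() {
  // 👮🏽‍♂️ followed by 👩🏿‍💻, written as escapes so the joiners are visible.
  const officer = '\u{1F46E}\u{1F3FD}\u{200D}\u{2642}\u{FE0F}';
  const coder = '\u{1F469}\u{1F3FF}\u{200D}\u{1F4BB}';
  print(roughClusters(officer + coder).length); // 2
  print(revRough(officer + coder) == coder + officer); // true
  print(revRough('abc')); // cba
}
```

Real text has many more cases (flags built from regional indicators, Hangul syllables, CR LF, and so on), so a dedicated grapheme-segmentation package is the safer choice outside of a demonstration.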

lrhn added the area-core-library, library-core, and type-question labels Oct 14, 2019
@vincevargadev
Author

Thank you @lrhn for your answer, I think it helped me understand the issue at hand much better.

For the record, I also asked on StackOverflow, and someone recommended the grapheme_splitter package.

@lrhn
Member

lrhn commented Oct 18, 2019

That package probably works. I'm also working on a package with slightly higher-level, but otherwise similar, functionality. I hope to release a beta soon-ish.

@mit-mit
Member

mit-mit commented Nov 14, 2019

Please see dart-lang/language#685

@mit-mit mit-mit closed this as completed Nov 14, 2019
@vincevargadev
Author

It's great to see this issue addressed in the Flutter Survey.

3 participants