Getting the first character of a string is way too verbose #24617

Closed
Hixie opened this issue Oct 17, 2015 · 5 comments
Labels
area-core-library: SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries.
library-core
type-enhancement: A request for a change that isn't a bug

Comments

@Hixie
Contributor

Hixie commented Oct 17, 2015

As far as I can tell, this is how you truncate a string to its first character:

s = new String.fromCharCode(s.runes.first)

That's far too verbose. Ideally it'd be something like s = s[0] or s = s.first.
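
For illustration, the expression above can at least be hidden behind a small helper. This is only a sketch: `firstCodePoint` is a hypothetical name, not a core library API, and returning the empty string for empty input is an assumption.

```dart
/// Returns the first Unicode code point of [s] as a string.
/// `firstCodePoint` is a hypothetical helper, not part of dart:core.
String firstCodePoint(String s) {
  // Assumption: an empty string truncates to itself.
  if (s.isEmpty) return s;
  // `runes` iterates code points, so a surrogate pair is kept whole.
  return String.fromCharCode(s.runes.first);
}
```

With this in scope, `s = firstCodePoint(s)` is the closest equivalent to the `s = s.first` spelling asked for here.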

Ideally, we'd drop the entire UTF-16 encoding thing and just have String be a pure-Unicode class.

@kevmoo kevmoo added area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-core labels Oct 18, 2015
@lrhn
Member

lrhn commented Oct 18, 2015

We don't generally handle Unicode characters/glyphs in the core library. The runes getter is the only real functionality we have.
You can also do (x.runes.iterator..moveNext()).currentAsString, but it's not shorter.
If s[0] gave you the first character, then indexing wouldn't be constant time (in UTF-16 strings). That's a bad trade-off. That's why we have runes to begin with: it's a separate object from the string that treats the contents of the string as code points, not as code units. It does not have constant time lookup for the same reason. Instead, its iterator is bi-directional and can provide strings as well as code unit lists.
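
A minimal sketch of the iterator-based approach, using `RuneIterator` from dart:core. The helper names `firstViaIterator` and `lastViaIterator` are hypothetical, and returning the empty string for empty input is an assumption.

```dart
/// First code point of [s] as a string, read through the rune iterator.
String firstViaIterator(String s) {
  final it = s.runes.iterator;
  // moveNext() returns false when there is no code point to read.
  return it.moveNext() ? it.currentAsString : '';
}

/// Last code point of [s] as a string. The iterator is bi-directional, so it
/// can start at the end of the string and step backwards.
String lastViaIterator(String s) {
  final it = RuneIterator.at(s, s.length);
  return it.movePrevious() ? it.currentAsString : '';
}
```

Both helpers walk at most one code point, so neither needs indexed access into the string.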

One point against Runes is that characters are often the wrong level of abstraction anyway. One should probably be using grapheme clusters (characters plus combining marks), because the first character of "e\u0301" is "e", but the first grapheme cluster is "é". We don't have support for grapheme clusters at all.
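
The "e\u0301" case can be reproduced with nothing but `runes`. In this sketch the names `decomposed`, `precomposed`, and `firstCodePoint` are hypothetical, chosen only for illustration.

```dart
// "é" written two ways.
const decomposed = 'e\u0301'; // 'e' followed by U+0301 COMBINING ACUTE ACCENT
const precomposed = '\u00E9'; // the single code point U+00E9

/// First code point of a non-empty [s] as a string (hypothetical helper).
String firstCodePoint(String s) => String.fromCharCode(s.runes.first);
```

`firstCodePoint(decomposed)` is the bare "e" with the accent dropped, while `firstCodePoint(precomposed)` is "é", even though both strings render the same.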

The problem with Unicode libraries is that they are basically very large tables, where something as innocuous-looking as string.graphemeClusters.first could easily include a largish table in your program.
(I haven't actually checked how much data is needed for pure combining-mark detection, but if you want full grapheme cluster separation, you need at least some Hangul tables - see http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).

I would personally love to have strings be sequences of code points, which is also what we started with, but it is very hard to compile efficiently to JavaScript and at the time, it was impossible to do efficiently in Dartium where all DOM strings were UTF-16.

@Hixie
Contributor Author

Hixie commented Oct 18, 2015

If s[0] doesn't give you the first character then IMHO it shouldn't give you anything. Having a core API return a surrogate is a terrible API affordance, because it looks very much like it does the right thing in testing, then as soon as someone uses an emoji, it breaks in a way that requires substantial rewriting of all the string manipulation code. That's a separate bug, though. Filed #24619 .
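
A small sketch of the failure mode described here. `indexZeroIsSurrogate` is a hypothetical helper name, and the bit mask relies on UTF-16 surrogates occupying the range 0xD800 to 0xDFFF.

```dart
/// True if the first UTF-16 code unit of [s] is a surrogate, which means
/// `s[0]` is only half of a character. Hypothetical helper, not in dart:core.
bool indexZeroIsSurrogate(String s) =>
    s.isNotEmpty && (s.codeUnitAt(0) & 0xF800) == 0xD800;
```

For ASCII input such as `'hello'` this is false, so `s[0]` looks correct in testing. For a string starting with an emoji such as `'\u{1F600}'` it is true, and `s[0]` is a lone high surrogate rather than the emoji.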

FWIW, in Flutter we already have the ICU tables (we need them for rendering). In general, for embedders like Flutter, the need to compile to JS doesn't exist, and our DOM-equivalent can be whatever we want it to be. It would be a shame to restrict our API to the lowest common denominator requirements of earlier embedders...

@zoechi
Contributor

zoechi commented Oct 18, 2015

A deviation would prevent sharing code between platforms.

@Hixie
Contributor Author

Hixie commented Oct 18, 2015

That is true. That's more an issue for #24619 than this bug, though.

@kevmoo kevmoo added type-enhancement A request for a change that isn't a bug and removed type-enhancement labels Mar 1, 2016
@Hixie
Contributor Author

Hixie commented Mar 7, 2017

Closing in favour of #28404

@Hixie Hixie closed this as completed Mar 7, 2017

4 participants