Getting the first character of a string is way too verbose #24617

Hixie · 2015-10-17T22:44:56Z

As far as I can tell, this is how you truncate a string to its first character:

s = new String.fromCharCode(s.runes.first)

That's far too verbose. Ideally it'd be something like s = s[0] or s = s.first.
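For illustration, the verbose expression above can be wrapped in a small helper. This is only a sketch: `firstCodePoint` is a hypothetical name, not part of dart:core, and returning an empty string for empty input is an assumption (calling `s.runes.first` directly on an empty string throws a `StateError`). The `new` keyword used in the thread is optional in Dart 2 and later, so it is omitted here.

```dart
// Hypothetical helper, not part of dart:core: returns the first code point
// of [s] as a string, or the empty string when [s] is empty.
String firstCodePoint(String s) {
  if (s.isEmpty) return '';
  // runes.first is the first Unicode code point, even when it is encoded
  // as a surrogate pair (two UTF-16 code units).
  return String.fromCharCode(s.runes.first);
}

void main() {
  print(firstCodePoint('hello')); // h
  print(firstCodePoint('\u{1F600} smile') == '\u{1F600}'); // true
  print(firstCodePoint('').isEmpty); // true
}
```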

Ideally, we'd drop the entire UTF-16 encoding thing and just have String be a pure-Unicode class.


lrhn · 2015-10-18T10:13:55Z

We don't generally handle Unicode characters/glyphs in the core library. The runes getter is the only real functionality we have.
You can also do (x.runes.iterator..moveNext()).currentAsString, but it's not shorter.
If s[0] gave you the first character, then indexing wouldn't be constant time (in UTF-16 strings). That's a bad trade-off. That's why we have runes to begin with: it's a separate object from the string that treats the contents of the string as code points, not as code units. It does not have constant-time lookup for the same reason; instead, its iterator is bidirectional and can provide strings as well as code unit lists.
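A minimal sketch of the code unit versus code point distinction, using only dart:core. The sample string is an assumption chosen for illustration: one code point outside the Basic Multilingual Plane (U+1F600) followed by `!`.

```dart
void main() {
  const s = '\u{1F600}!'; // U+1F600 followed by '!'

  // String.length counts UTF-16 code units; the emoji needs two of them.
  print(s.length); // 3
  // Runes iterates code points instead.
  print(s.runes.length); // 2

  // The iterator form mentioned in the comment above.
  final it = s.runes.iterator..moveNext();
  print(it.currentAsString == '\u{1F600}'); // true
  print(it.currentSize); // 2 (code units used by the current code point)

  // The iterator can also walk backwards, here starting from the end.
  final end = RuneIterator.at(s, s.length);
  end.movePrevious();
  print(end.currentAsString); // !
}
```

Reaching the n-th code point this way means stepping through everything before it, which is the linear-time cost the comment refers to.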

One point against Runes is that characters are often the wrong level of abstraction anyway. One should probably be using grapheme clusters (characters plus combining marks), because the first character of "e\u0301" is "e", but the first grapheme cluster is "é". We don't have support for grapheme clusters at all.
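The "e\u0301" case can be checked with dart:core alone. This sketch only shows the code point side; it does not attempt grapheme cluster segmentation, which (as the comment says) the core library does not provide.

```dart
void main() {
  const s = 'e\u0301'; // 'e' followed by U+0301 COMBINING ACUTE ACCENT

  // Two code units and two code points, although it renders as one glyph.
  print(s.length); // 2
  print(s.runes.length); // 2

  // Taking the first code point drops the accent.
  final first = String.fromCharCode(s.runes.first);
  print(first == 'e'); // true

  // String equality compares code units, so the decomposed form is not
  // equal to the precomposed U+00E9.
  print(s == '\u00E9'); // false
}
```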

The problem with Unicode libraries is that they are basically very large tables, where something innocuous-looking like string.graphemeClusters.first could easily include a largish table in your program.
(I haven't actually checked how much data is needed for pure combining-mark detection, but if you want full grapheme cluster separation, you need at least some Hangul tables - see http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).

I would personally love to have strings be sequences of code points, which is also what we started with, but it is very hard to compile efficiently to JavaScript and at the time, it was impossible to do efficiently in Dartium where all DOM strings were UTF-16.

Hixie · 2015-10-18T18:36:08Z

If s[0] doesn't give you the first character then IMHO it shouldn't give you anything. Having a core API return a surrogate is a terrible API affordance, because it looks very much like it does the right thing in testing, then as soon as someone uses an emoji, it breaks in a way that requires substantial rewriting of all the string manipulation code. That's a separate bug, though. Filed #24619.
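A small sketch of the failure mode described here. `initial` is a hypothetical helper invented for this example; it behaves as expected for ASCII input and returns a lone surrogate once the string starts with a code point outside the Basic Multilingual Plane.

```dart
// Hypothetical helper: looks correct when tested with ASCII names.
String initial(String name) => name[0];

void main() {
  print(initial('Ada')); // A

  // With a leading emoji, indexing returns only the high surrogate.
  final broken = initial('\u{1F600} Ada');
  print(broken.length); // 1
  print(broken.codeUnitAt(0).toRadixString(16)); // d83d
  print(broken == '\u{1F600}'); // false
}
```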

FWIW, in Flutter we already have the ICU tables (we need them for rendering). In general, for embedders like Flutter, the need to compile to JS doesn't exist, and our DOM-equivalent can be whatever we want it to be. It would be a shame to restrict our API to the lowest common denominator requirements of earlier embedders...

zoechi · 2015-10-18T19:56:58Z

A deviation would prevent sharing code between platforms.

Hixie · 2015-10-18T21:50:08Z

That is true. That's more an issue for #24619 than this bug, though.

Hixie · 2017-03-07T00:35:55Z

Closing in favour of #28404

kevmoo added the area-core-library and library-core labels Oct 18, 2015

lrhn added the Type-Enhancement label Oct 18, 2015

Hixie mentioned this issue Oct 20, 2015

Exposing surrogates in the string API leads to buggy code #24619

Closed

kevmoo added type-enhancement and removed type-enhancement labels Mar 1, 2016

Hixie closed this as completed Mar 7, 2017
