Skip to content

Latest commit

 

History

History
375 lines (198 loc) · 31.5 KB

File metadata and controls

375 lines (198 loc) · 31.5 KB

2023-08-03 ECMA-402 Meeting

Logistics

Attendees

  • Shane Carr - Google i18n (SFC), Co-Moderator
  • Ben Allen - Igalia (BAN)
  • Eemeli Aro - Mozilla (EAO)
  • Frank Yung-Fong Tang - Google i18n, V8 (FYT)
  • Myles C. Maxfield - Apple (MCM)
  • Richard Gibson - OpenJS Foundation (RGN)
  • Sean Burke - Mozilla (SBE)
  • Yusuke Suzuki - Apple (YSZ)
  • Zibi Braniecki - Mozilla (ZB)

Standing items

Status Updates

Editor's Update

RGN: Have been away most of the month. Some small things landed.

BAN: Recently merged: well-known intrinsics iterator, etc., were added to the table, adhering to 262 style, and a few normative PRs that are up. Some fairly small re-wording "increase x by y" instead of "set x to x + y". Removed some unnecessary double negatives.

MessageFormat Working Group

EAO: We've been aiming to have a spec released by this fall's ICU release; will probably slip, so we'll try for next spring. Working through all the remaining issues. It's an actively continuing discussion. I'm feeling optimistic about the rate at which we are progressing.

Proposal Status Changes

https://github.com/tc39/ecma402/wiki/Proposal-and-PR-Progress-Tracking

FYT: improvements to layout of table on proposal tracker. Added info related to recently landed changes in V8. Request for update from SM and JSC on recent PRs. Plan to archive older PRs.

SFC: Looking at this table, it seems like we’ve got pretty good test coverage on the recent PRs. Some MDN changes waiting. Test262 is step 1 – super important to have test coverage for all the normative PRs that merge

Pull Requests

Normative: Consistently remove u- extensions from values in Intl object [[Locale]] slots #817

#817

BAN: (introduces PR)

SFC: It does seem useful to have the extension keywords – I’m not sure if removing the tags from the locale is the right direction – we could go the other way. Then we can have the locale identifier and the options bag consistent with each other. I also worry that if we start removing this information it may not be completely web-compatible. Adding is easier than taking them back out, especially because the options for setting these are not that old.

ZB: I’m glad that someone noticed the inconsistency, but I’m concerned about making locale lossy. The way I try to think of locale in this setting is that the signal we’re getting from the client is that they’re reading it directly or having it provided elsewhere. If they provide it explicitly, there’s value in maintaining it. The reason why that’s the case is that there’s nothing set in stone about the hour cycle being h23 in English – it could change in the future. If we’re trying to decouple these values, it makes to sense to say. It seems like your PR is making it lossy. I would say that if it were listed in either the string or the options bag, then keep it. If not listed, of course not. Or even that if you set in the extensions and then unset it in the options.

FYT: I don’t think we can use undefined as a value. That’s equal to “we don’t have that option”

ZB: There’s a difference between a key where the value is undefined and an undefined key.

EAO: I agree that we should not make a key with an undefined value mean a different thing than the key not being set. That’s counter to much of what we’re trying to do with the JavaScript spec in general.

ZB: For this PR, gain more consistent and explicit behavior is great, I side with SFC that I don’t think removing it is the optimal solution. I see the locale field value, the value of that is not just human readability but actual ability to take it and carry it forward. If not, then why do we have this field there? If what you’re carrying forward is not a stringified version, we should have getLocale(. The only round-trip solution is to take the locale string and construct a new locale out of it.

SFC: We do have a way to, if you want to unset the hour cycle, that’s what we have the Intl.Locale APIs for. I think doing hourCycle: undefined has a difference from the absence of an hourCycle field. What I would expect the behaviour to be in these cases is that rows 1 and 2 are correct, 3 and 4 should both have h23, that way the tag hour cycle and the options hour cycle are the same.

EAO: on the third one I agree that it should be en-u-hc-h23, but the fourth one should just be en. This is a template for the fields that the user expected to be there, and then for the resulting locale to include or not include these fields in the result.

FYT: My question is what the motivations to make a change here? Are they strong enough to deserve a change?

SFC: This looks like the original issue, FYT, it looks like you filed it in 2019? But are the motivations still strong enough?

SFC: I would go on to say that the motivation that when we added the explicit options in the option bag we never considered this question of what to do with the locale string. I see this as an extension of that previous proposal that added the hourCycle fields that can be set in the options bag, filling in a gap for something that we didn’t recognized at the time as being an issue.

Four options:

Remove the extension keywords Add all the extension keywords Leave it the way it is Only have the extension keywords if they were already present in the locale.

I lean toward option 2, where we add all of the flags all the time, which means the options bag and locale string are consistent with each other. I’m also okay with leaving the weird status quo for this edge case, but I don’t necessarily think that the motivation that this is confusing for users is fairly strong, I don’t know if it rises to the level of making a normative change. The standard for that should be “this is a behaviour we should have had when we introduced the field a few years ago, but didn’t think to.”

FYT: When [littledan] said that we should only do that when we have all of the options defined, why is that and why do we need this criterion? In the collation there are certain collations that were defined in the locale but that is not settable in options bags. Would that complicate things? -co-search, that sort of thing. What will happen if the user of the locale has en-u-co-standard, which we can’t read from the options bag. Should we take it out??

SFC: I hadn’t consider this when I mentioned favoring option 2: if we’re in a situation where the locale keywords and the options keywords are not always the same set, then we can’t always have the invariant that the locale and the options bag keywords are consistent with each other. Maybe the status quo is fine? “If we read the field value from the locale, keep the locale value the same, but if we changed it in the options, you have to read it from the option bag because we may not be able to put it back in the locale tag for you. It’s still accessible in the resolvedOptions() if you need it.

BAN: proposes adding a note explaining the behaviour?

EAO: One approach is to consider that we have two different ways that users may be giving the options to us. Do we have a preference between them? The options bag approach that the Intl constructors have are telling the story that this is the way that things are supposed to go. Are we making a explicit preference for one or the other? Like, “you can do it this way, you can do it that way, but this way is better” – and in this case, it’s the options bag that’s better.

RGN: We’ve very clearly sending a preference, because any option that’s set in both the string and the bag, the bag wins.

EAO: By that logic, I would be happy with anything that’s in the bag getting removed from the string.

SFC: I still maintain my webcompat concerns. I’m fine with a solution that involves adding more clear documentation about the behaviour here. It’s a weird corner of the API, but it’s fine – there’s a lot of things in this API that are a lot weirder.

FYT: I propose no change – I don’t see a strong case to solve people’s problem and there’s a small risk for webcompat issue and extra work for the developers. With no strong reason to make change, there’s a strong reason to not make a change.

EAO: +1 to that.

RGN: That makes sense, and I really like BAN’s idea of capturing the reasoning in a note. If nothing else, it will be relevant if this should come up again. Decision for now to not change behavior, and also to document the behaviour and its motivation.

SFC: +1

Conclusion

Keep current behavior, but add documentation in the spec.

Locale Extensions?

BAN: Integrating feedback from MCM.

SFC: Hope to see a concrete presentation/update next month.

Normative: Fix order of rounding* option reads and resolvedOptions() #811

#811

FYC: [missed part]. The problem is that even after #768, the reading order for rounding are grouped together, but they’re not alphabetical. There’s no reason to move them together and not sort them – if we care about order, let’s care about order. But when I went to make this I realized we have a bug in #768 – the AO is used by Intl.PluralRules, but PluralRules doesn’t have the required internal slot. I don’t think it’s implementable as stands, because the AO will not be able to access a field that PluralRules doesn’t have.

There are two things going on here: one, the editorial change to make the PluralRules work, the second is ordering the result options and options reading of those three to be the correct order, same between PluralRules and NumberFormat, to have consistency. That’s this PR.

RGN: I have not looked at the bug you’re describing, I trust you on that. I have a preference for putting that in a separate PR. We know from hard work in the Temporal proposal that there is a preferred order: on reading, copy all the properties in iterator order, and then afterward it’s all unobservable so you don’t have to worry about order concerns. On output, alphabetical order or logical, if there’s a clear logical order. But copying everything up front and then keeping order unobservable after that is the best strategy.

FYT: I don’t see what you’re saying.

RGN: The problem in this PR is that we don’t follow the best practice of reading the options all at once. If we did that, like Temporal does, this problem would never come up. It’s a clear best practice for the specification, but it would be normative to adopt it. If normative changes are on the table I am for adopting that best practice – not fiddling with the order of specific things, but going all the way to reading all the options up front.

SFC: My feeling here is that I agree that long-term we should work toward the world that RGN describes, but also that it’s okay to land smaller incremental changes that get us in that direction. This is a fine small change on a relatively recently landed proposal, so it’s fine from my perspective. However, I agree that for any new proposal we should do what Temporal does. I don’t think it’s worth the disruption of an across-the-board change to the existing APIs, though. It’s fine to have small incremental approvals.

RGN: In this case it looks fine to me, but I would still like to see this PR split between the actual fix and the normative change.

FYT: Let’s say I want to split. I need to specify how the plural rules should output result options. What should I put on there? Currently, in order to make that change I need to change that table. If I only change the algorithm to make the editorial change, the resolvedOptions() output of the plural format will be inconsistent with the order of NumberFOrmat.

RGN: I’m willing to adopt this if it’s not possible to split in the way that I want. If I do find a way I’ll structure it such that we end up in the same place.

FYT: Originally my intention was to be a normative change, but I found out that there’s an editorial issue here. In a sense the editorial part is the attachment to this normative thing. If the way to fix it for the editorial part, no problem, but that’s not my motivation.

RGN: If I can split it, I will, and if not it’ll have whatever structure this one has in terms of commits. I think there’s still an open question of if we’re doing it or not – the rest of it is editorial management. But “accept or reject” is on the table for this group.

FYT: This PR, from the number formatting side, changes the reading order from roundingPriority, roundingIncrement, and roundingMode to roundingIncrement, roundingMode, rounding Priority.

RGN: I applaud your ambition and it works for me.

SFC: What’s the conclusion that we should record for this?

RGN: Assuming no objections, we adopt it and I assume the responsibility of separating the followup fix from the observable change.

SFC: Does that work for you?

FYT: Can we add this to the agenda for the plenary?

SFC: All of these normative changes have to go on the agenda.

FYT: But do we agree to go there?

RGN: I’m in favour.

SFC: +1. Is the PR to bring to plenary the one that’s already been written, or a new one?

RGN: It should be this one. I assume FYT has the allow maintainer edits. Any rewriting can take place in the context of this PR.

Conclusion

RGN and FYT will work together on changes to the PR, which we will then bring to plenary.

Normative: BestAvailableLocale AO now removes extensions before setting initial value of candidate locale. #807

#807

BAN: (introduces PR) BestAvailableLocale did not handle t and x extensions.

FYT: This is a normative change, but how are we going to test it?

BAN: That’s the hard question – ICU doesn’t use it.

FYT: This isn’t a general exposable thing that if you ask a question it’ll give it to us, it depends on what the available locale is. Whatever comes out is limited to what the system supports. In theory we could have some T extensions, but practically I don’t think anyone ships T extension data. This is a theoretical case, since one part of the sets that need to be matched are not including the whole thing, at least not what we’re currently aware about. Theoretically, yes, practically, I don’t believe it impacts anything, therefore it’s difficult to test.

RGN: I haven’t reviewed the spec text, but the description makes sense and I am in principle in support of this change. In terms of testability, if anyone wants to expose an interface by which the set of locales could be manipulated by command line flags, that opens up an avenue to test it in test262, but without that it’s not going to be testable. I’m alright with just having scrutiny on the text itself. Engine262 would also be a possibility, though I don’t think they have enough of the infrastructure to be at the point of testing this yet.

SFC: Another question: what is the motivation for including the -x- and -t- in locale lookup anyway?

RGN: If an implementation did, discarding that is a disadvantage.

SFC: Not sure about -t- extensions, it doesn’t necessarily make sense that engines would want to do negotiation with the T and U extensions, though possibly the X extensions.

FYT: I am supporting this PR, but I think this is non-observable. This should be an editorial PR.

RGN: I’m certain it’s observable, but none of the current implementations do.

FYT: So not observable.

RGN: But that’s not what it means – it means that any conforming implementation will behave the same after the change.

FYT: How do you know that they’re the same or different?

RGN: You never know for sure. The ECMA262 mathematical operations are similar – implementations do not explore the full space of conforming behaviour, but that doesn’t mean those changes don’t count as observable.

FYT: Are they normative?

RGN: yes. Or rather, anything normative is observable. “Observable as used” is something that relates to any conceivable conforming implementation.

SFC: It’s exposed indirectly in a lot of places – but if it’s not directly exposed to anyone, it’s not an observable change.

FYT: This depends on a set of things that is not controllable. My understanding of observable could be wrong, but if we can’t observe it I don’t know why we’d mark it as normative.

RGN: It’s just bookkeeping. Imagine that there’s an implementation out there that we don’t know about, that is conforming, this spec change could push them out of conformance, therefore it’s normative.

FYT: How?

SFC: Let’s treat this as normative just to be safe.

BAN: The problem is that the current state of the spec is buggy – if a hypothetical implementation tries to use -t- extensions, they’ll have invalid lookups.

RGN: Strongly agree with that; with the current spec text being buggy, I'm in support of a change. I need to review the details of this one.

FYT: Looking at the PR, line 90 strips the Unicode locale extensions, and now you're removing it? Are you going to add it back?

BAN: In practice, everything using this AO strips out the Unicode extensions anyway.

SFC: Let’s bring the detailed discussion of the spec text offline, it seems like we’re aligned on the spirit of it. Let’s bring it back up for discussion next month to bring it to the next plenary. If there are substantive updates, we should bring it to the group, otherwise we can say we’ll approve this PR to take to plenary aside from the feedback on the exact wording.

FYT: I have some concerns. I would like to make it explicit what the use case for this PR is. Is it to support any additional extension other than U extensions

SFC: We should list the motivation, but as I see it is that if implementation use these extensions we should allow implementations of the specification to use them correctly as part of the negotiation algorithm. This is a bug.

FYT: I have a problem here. Here it is: if we look at the big picture, the larger context of this AO’s use, before you call this thing we pass in certain relevant keys for the operation. Everything related to the relevant extension keys are dealt with, everything that’s not is ignored or stripped out. So, if we consider that, anything that not in the U extension is also not a relevant key, and should be treated the same way as any other U extension key not listed in the relevant keys. This isn’t consistent with the larger context of why we have [[RelevantKeys]] – we should strip out everything not in [[RelevantKeys]] and not a U extension.

RGN: I disagree. There’s lots of implementation and locale-specific behaviour that’s escape hatches, and X extensions would be relevant for that. They can’t deviate based on relevant extension keys, but relevant extension keys on their own don’t specify behaviour.

FYT: Let me make an example: in U extensions there’s u-rg. Regional override specifies supposed behaviour, but currently it’s not in relevant keywords, so we ignore it. We strip it out, even if it’s in there. By the same meaning, all these other extensions should go.

RGN: T is specified in UTS 35 in a way that might preclude that. With an X extension you could, it’s private use and it’s wide open. Anything that the ECMA402 specification leaves up to an implementation, an implementation can factor in whatever they want. This is a justification for X extensions existing in the first place, and requiring an ECMA402 implementation to ignore them wouldn’t be appropriate.

SFC: I agree with what RGN says here. The subdivision subtag is something that could hypothetically be used for negotiation, or an X extension supporting some feature of the locale that’s not yet supported in unicode locale identifiers, those are the types of things that private use subtags are for. They should be allowed to be used as part of negotiation – as an Intl spec it’s our responsibility to allow that.

FYT: My point is that currently we in the bigger picture say that if we pass in a locale, language country or region and script are used for negotiation, and anything in U extension are not used for negotiation unless they are specifically stated in relevant keywords. The T and X keywords are forbidden unless they’re allowed to [[RelevantKey]]. Keeping this in mind, how could we allow them in this context? They should be treated by the relevant key, because we have an inclusive list for negotiation for U extensions. We should also have an inclusive framework for those things. I’m not opposing the change, I do oppose that we treat -t- and -x- extensions as higher priority than [[RelevantExtensionKeys]].

RGN: If you see (say) an nu- subtag that specifies a numbering system, that’s something we expect ECMA402 to pay attention to. That is not true and it cannot be true of an -x- extension, because -x- is private use. To have an implementation pay attention to any -x- tag that might appear, that’s fine, but we can’t have them add something that.

FYT: If it’s private use, it should be private use.

RGN: Yes, but that requires that the implementation be allowed to pay attention to it. If the specification requires the implementation ignore it, it is no longer available for private use

BAN: My first thought is that there's a lot of wording ambiguity in the spec. There's not a term for referring to -t- or -x- extensions, but -u- extensions has its own term. Preferences, in order: Keep the extensions, require implementations to ignore them, and then finally the current situation where we allow implementations to consider them but provide a buggy algorithm for doing so.

RGN: I agree with that order of preferences.

SFC: I have a strong preferences for allowing implementations to use them and not require that they ignore them. But if we can’t reach consensus on that then I don’t know how I’d rank between two and three on those options that BAN listed out. I don’t like requiring implementations to ignore private use extensions, that’s not the direction we should take.

SFC: T has a very specific semantics that’s not used for negotiation generally, but could be. If you wanted your locale to be Zawgyi or Chinese Pinyin or Hinglish, all of those use T extensions.

BAN: Sometimes T extensions are use in Japanese to mark that text translated from other languages be rendered in katakana.

FYT: It’s a pretty big change to say that unicode extensions are exclusive and others inclusive.

RGN: Right now it’s not strictly limited, because the algorithm is intending to handle them, but is doing it wrong.

FYT: I understand that the current algorithm has bugs, I’m saying that the [[RelevantKey]] thing we’re not taking anything in U extensions unless they’re listed here.

EAO: I understand perfectly well why we would want to have different practices for U vs T and X. U is a scope where later we might want to define things, with X we’re saying we’re never going to step in this space.

FYT: I agree with what you said: if we take that as RGN suggests, we’re taking the stance that we’ll never touch T. I don’t want to close that door.

SFC: I don’t necessarily understand that allowing the extensions to be used in negotiation closes the door for us to use them in other ways in the future.

EAO: The way I see it, if we allow an implementation to establish a meaning for a T extension and we establish a different meaning for that T extension, this is problematic. This is what I understand RGN to have said a while ago on this.

SFC: I think if were were to use the T extension it would be for an Intl.Transliterator context, which is where the T extension could have meaning. This doesn’t mean that the T extension can’t be used for negotiation elsewhere. If an engine is providing Hinglish or Chinese Pinyin data, that data should be available to Intl.DateTimeFormat even if Intl.Transliterator used it for something different. And this is what’s going on with this Unicode locale extensions, we’re ignoring them for the purpose of negotiation. We’re spending like an hour now discussing a thing that we can’t tell is observable. This gives engines flexibility in the future – I don’t think any decision we make now blocks any other doors in the future. SFC preferences:

Allow use of T and X Leave it the way it is Remove them

BAN: In any case, we should document this.

SFC: How about both documenting it and also fixing the algorithm to allow their use.

FYT: What does “fixing” mean?

SFC: Let’s look at the original conversation. It looks like the team advised

FYT: My point is that the resolution for this one was to take that out.

SFC: My read was that he was saying what his team had decided to do, but that he wanted clarification from us.

FYT: Answering them is the solution.

RGN: An answer can take the form of “we have made the spec something unambiguously clear.”

FYT: Spec text is not usually the best answer to a question.

RGN: Would you consent to extensions to V8 to make this testable?

FYT: No. You’re saying the reply is a spec change, I’m saying without unit tests

SFC: what are the points of contention? The only one I can identify that’s clear is this concern about whether making a change here closes the door to consume T and X extensions in a different way in the future, FYT was saying yes it does, I say it doesn’t. So if we can agree that this doesn’t close the door – if we were to agree that making this change would definitely close the door on using T and X in the future, is that a reason to adopt or not adopt this behaviour?

RGN: For me I say reservedly yes, but there’s the question of what constitutes a change. Making it clear that implementations are allowed to factor in T and X subsequences constitutes a change, given that they’re mentioned in the algorithms. If we don’t know where we are right now, we do not agree on what is allowed and prohibited, it’s more difficult to assess the proposed new spec texts because we’ll have different interpretations of its consequences.

SFC: What is allowed right now is that the spec allows and requires the resolution of these extensions to get into invalid forms – requires they be used, and requires that they be used in a buggy way.

RGN: Does anyone disagree? If I understand FYT correctly, I hear you saying that you do disagree– that implementations aren’t allowed to factor them in right now. Looking at the current state of the spec, SFC has asserted and I agree that implementations are required to pay attention to all subtags, including X and T sequences.

FYT: I don’t get it, why are they require to pay attention?

RGN: There’s nothing to strip them out right now. It’s in a loop that strips off the ending subtags from the end. So if the input includes an -x- subtag sequence, then the algorithm requires the implementation to check them against available locales.

FYT: So you’re saying if we pass an X extension, or a T extension

RGN: As currently specified the algorithm requires paying attention to both T and X.

FYT: And we are going to take them off one by one and match whatever is in available locales. And these only behave differently if we have those partially stripped-off locales.

RGN: availableLocales could contain en, en-us, en-gb. Do you agree that the current algorithm text requires implementations to pay attention to an -x- subtag sequence?

FYT: Yes

RGN: Therefore it would be a change if we required implementations to ignore that subtag sequence.

FYT: Correct.

RGN: Popping up the stack, it sounds like we have agreement on the current state of the specification. Implementations are required to pay attention, and implementations are allowed to use them. In light of the current state of this matter, we agree that it would be a change to require them to ignore it, and the question is that if we required them to ignore it would that close the door on future use?

SFC: I believe that what i previously stated is that if we should do, and what we currently do in a buggy way, is allow these extensions to be used (in a buggy way) as part of negotiation.

RGN: Is there consensus that continuing to allow them to use it does not close any doors?

SFC: Current situation is that the X and T extensions are used in a buggy way. I think the state that we’d like it to get to is one where implementations are allowed to use an implementation to find an algorithm as to how to match X and T extensions. This strip last subtag algorithm isn’t the best way for this.

RGN: Yes, I think this strip last tag algorithm makes sense within the language identifier, but not the broader locale identifier.

BAN: Would it be reasonable to not dictate the algorithm? Since only specific implementations that we don’t know about are even potentially using it, what if we say the algorithm for using them should be implementation defined?

SFC: I hate this algorithm – I kind of want to make it all implementation-defined. Change the algorithm to say that this function will return an element of the availableLocales list, which entry it returns is implementation-defined or it can return undefined.

RGN: If we did that, would there be any difference between LookupMatcher and BestFitMatcher?

SFC: That’s a good question. LookupMatcher is designed to be an “algorithmic algorithm,” not implementation-dependent.

EAO: I’m not sure what the spec says about this, but the place to find out what it does is the MDN page on it. MDN says “this is the matching algorithm from BCP 47, best fit should be at least as good as that.”

SFC: This is used by LookupMatcher

FYT: It’s used by both.

RGN: Is anyone familiar with the algorithm referenced by LookupMatcher. It looks like our hands are tied – we’re going to follow the RFC 4647 3.4 algorithm for LookupMatcher. It doesn’t leave a lot open – the implementations are open on BestFitMatcher, but for LookupMatcher they have to do just this. This technically does factor in the X and T sequences, but not necessarily in a useful way. For that reason, it’s unlikely to come up in implementations, even the hypothetical ones we’re discussing. But because it is so simple, it would be possible for us to write tests.

EAO: Because of the specific example given here, for the BestFitMatcher it ties our hands to need to look at the X extensions as well and care about what’s in there, in order to be at least as good as LookupMatcher.

RGN: I can agree with that, but BestFit is already implementation defined, at most we could “encourage” that implementors not ignore X extensions. We wouldn’t require an implementation to ignore them, but we won’t get precise about how they affect results.

FYT: We can have tlang, this can have en-latn-gb, you can cut off gb, you can cut off latn, it’s still valid.

RGN: Because availableLocales has content that is canonicalized, at any given step if it matches it matches against something that’s already canonicalized. The output is guaranteed to be canonicalized if not empty. Even in the current form.

RGN: I think that there is no need to change the current algorithm.

FYT: Anything that’s not canonicalized cannot be on the left-hand side.

Conclusion

The spec follows RFC 4647 already. We should continue using that algorithm. Improve docs to be clear that it is not possible to return a non-canonical locale ID, even though they could exist transiently as part of the AO.

SFC: Do we agree on this conclusion?

BAN: +1

RGN: +1

FYT: +1

EAO: +1