RFC: Grammatical Case Translations in Lingui #2517
Replies: 3 comments 2 replies
Additional context: https://github.com/unicode-org/inflection - Unicode's approach to solving language inflection programmatically (at an early stage, with a limited number of languages)
I read it quickly and can't say I understood the problem 100%, but it reminded me of a feature that has been at the back of my mind for quite some time and that could potentially solve the problem described here as well. Given:

export const TAKE_A_BREAK_TITLES: Record<
AddTakeABreakRequest['type'],
MessageDescriptor
> = {
H24: msg`1 day`,
H48: msg`2 days`,
D3: msg`3 days`,
D7: msg`7 days`,
D14: msg`14 days`,
D30: msg`30 days`,
D90: msg`90 days`,
TEST_5_MIN: msg`5 minutes (Test)`,
TEST_1_MIN: msg`1 minute (Test)`,
}
const jsx = <Trans>
You understand you are about to set a Take a Break for{' '}
{ph({
period: t(TAKE_A_BREAK_TITLES[type]),
})}
. Once set, you will be logged out.
</Trans>

Will generate

Sometimes, the content of a placeholder (

One of the options could be to just inline the whole map into the host message, like so:

const jsx = <Trans>
You understand you are about to set a Take a Break for
<Select value={type} H24="1 day" H48="2 days" />. Once set, you will be logged out.
</Trans>

So the produced message would be:

This gives the translator full flexibility to change both the outer and the placeholder content. However, it is very inconvenient if you need to reuse this map in more than one place: you have to copy-paste it, update every copy whenever it changes, and the copies may still go out of sync. So my idea was to implement static analysis that would inline expressions where possible.

// pseudocode
const TAKE_A_BREAK_TITLES = <Select value={type} H24="1 day" H48="2 days" />;
const jsx = <Trans>
You understand you are about to set a Take a Break for
{TAKE_A_BREAK_TITLES}. Once set, you will be logged out.
</Trans>

The macro and extractor would automatically resolve the reference. The problem with that approach is that the transformer, which usually operates on a single file, needs to read and analyze other files as well to resolve references from separate modules. That could be a massive performance bottleneck, although with modern Rust-based tooling it might be less noticeable. I think the feature I described might partially or even fully solve the problem from this discussion. WDYT?
I re-read the RFC, and I think I understand it now. I like the idea. The following question comes to mind: How would a translator, looking at this message only

I think another macro should be implemented to consume terms:

const creeper = term("creeper") // produce a DeclensionTerm type
const sword = term("sword")
// DeclensionTerm is not directly assignable to string, so you could not just interpolate it directly, you need to use a `declension` macro
t`${player} hit a ${declension(creeper)} with a ${declension(sword)}`

This would generate:

Now the translator sees that this term has declension forms and can choose one.
Let translators render nouns in the correct grammatical form for their language, without asking developers to know or care about those forms. A source message references a noun once; each locale decides how to decline it.
Problem
In English a noun keeps the same shape wherever it appears - "I see a sword", "I hit with a sword". In many other languages the noun itself changes depending on its role in the sentence.
In Estonian, "sword" is mõõk as a subject, mõõka as a direct object, mõõgaga when used as an instrument - fourteen forms in total. Finnish has fifteen. Polish, Czech, or Ukrainian have six or seven. German has four, plus article changes. Translators working into these languages need a way to pick the right form for each spot in a sentence.

Lingui doesn't give them one. When a developer writes a message that interpolates `player`, `enemy`, and `item`, the extracted message is {player} hit a {enemy} with a {item}, and the translator receives `enemy` and `item` as opaque placeholders - already in one fixed form, with no way to ask for "enemy as a direct object" or "item as an instrument". `select` doesn't help either: it lets the developer branch on a value, but for declension only the translator knows which form a noun needs in their sentence.

In practice, users either precompute every case variant at the call site (which blows up combinatorially), write a different source sentence for every noun/case pair, or accept phrasings that read awkwardly in the target language. This is a long-standing gap for anyone translating into Slavic, Baltic, or Finno-Ugric languages.
Proposal
1. The `term` macro

A term is a noun/phrase that needs multiple grammatical forms:

Compiles to a `MessageDescriptor` using ICU `select` on a `_case` placeholder:

The babel plugin reuses the existing descriptor/`select` machinery, so extraction, id hashing, and comments work the same as for `msg`. Users never type `_case` or `select` by hand.

2. Usage
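A rough sketch of sections 1 and 2, assuming a hypothetical runtime stand-in for the `term` macro. The real macro would run at build time in the babel plugin; `termDescriptor`, the `TermDescriptor` shape, and the `term.<noun>` id scheme are illustrative only and not Lingui API. It shows the ICU `select` message a term could compile to, and that the call site passes terms as plain values without naming a case.

```typescript
// Hypothetical shape; Lingui's real MessageDescriptor has more fields.
interface TermDescriptor {
  id: string;
  message: string;
}

// Stand-in for the proposed `term` macro: wraps the source noun in an
// ICU `select` on `_case`, with only the mandatory `other` branch.
function termDescriptor(noun: string): TermDescriptor {
  return {
    id: `term.${noun}`, // illustrative id; the RFC says ids are hashed as for `msg`
    message: `{_case, select, other {${noun}}}`,
  };
}

const creeper = termDescriptor("creeper");
const sword = termDescriptor("sword");

// The developer passes terms as ordinary values and never names a case.
const values = { player: "Steve", creeper, sword };

console.log(creeper.message); // {_case, select, other {creeper}}
console.log(Object.keys(values).join(", ")); // player, creeper, sword
```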
Extracted source:
{player} hit a {creeper} with a {sword}.

3. Custom formatter registry
`I18n` gains a small registry; `interpolate()` falls back to it when a token's type isn't a built-in:

Unknown types still gracefully produce the raw value, preserving current behavior.
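A minimal sketch of the registry and fallback described above. It is a self-contained toy, not Lingui's `I18n`: the class `MiniI18n`, the `Formatter` signature, and the `upper` formatter are assumptions for illustration, and only the name `registerFormatter` comes from this RFC. This `interpolate` has no built-in types and handles just flat `{name}` and `{name, type, arg}` tokens.

```typescript
type Values = Record<string, unknown>;
// Assumed signature: raw value, the token's third argument, all values.
type Formatter = (value: unknown, arg: string | undefined, values: Values) => string;

class MiniI18n {
  private formatters = new Map<string, Formatter>();

  registerFormatter(type: string, fn: Formatter): void {
    this.formatters.set(type, fn);
  }

  interpolate(message: string, values: Values): string {
    // Matches flat tokens only; nested ICU syntax is out of scope for this sketch.
    return message.replace(/\{([^{}]+)\}/g, (_match, body: string) => {
      const [name, type, arg] = body.split(",").map((part) => part.trim());
      const value = values[name];
      if (type === undefined) return String(value);
      const formatter = this.formatters.get(type);
      // Unknown type: produce the raw value, as the RFC specifies.
      return formatter ? formatter(value, arg, values) : String(value);
    });
  }
}

const i18n = new MiniI18n();
i18n.registerFormatter("upper", (value) => String(value).toUpperCase());

console.log(i18n.interpolate("Hi {name, upper}!", { name: "Steve" })); // Hi STEVE!
console.log(i18n.interpolate("Hi {name, mystery}!", { name: "Steve" })); // Hi Steve!
```

Passing the full `values` object to each formatter is a design guess: a `declension` formatter needs only the term and the case name, but other formatters might want sibling values.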
4. Built-in `declension` formatter

Registered by `I18n` itself, in the constructor. It resolves a term descriptor with `_case` bound to the requested case name:

5. Translator view
One message per term, plus `{placeholder, declension, caseName}` in sentences. Each locale declares whichever cases it needs.

Term `creeper`:

- English: {_case, select, other {creeper}}
- Estonian: {_case, select, nom {kreeper} gen {kreeperi} par {kreeperit} com {kreeperiga} other {kreeper}}
- German: {_case, select, nom {ein Creeper} acc {einen Creeper} dat {einem Creeper} gen {eines Creepers} other {Creeper}}

Sentences:

- English: {player} hit a {enemy, declension, other} with a {item, declension, other}
- Estonian: {player} lõi {enemy, declension, par} {item, declension, com}
- German: {player} hat {enemy, declension, acc} mit {item, declension, dat} getroffen
- Finnish: {player} löi {enemy, declension, par} {item, declension, ade}

Resolving Estonian with { player: "Steve", creeper, sword } yields: Steve lõi kreeperit mõõgaga.

Why this shape
- ICU `select` - standard syntax parsed by Crowdin, Phrase, Lokalise, etc.
- One new method (`registerFormatter`), one new macro (`term`); existing messages unchanged.

Caveats
- `select` repurposed for cases. Case keys (`gen`, `par`, …) are a convention, unvalidated by ICU or the parser. A typo in `{enemy, declension, pra}` silently falls through to `other`. Likely mitigation: `lingui compile` lint for unknown keys per locale.
- `{x, declension, case}` triggers a nested `i18n._()`. Negligible individually, non-zero in loops.
- Unlike `plural` (fixed CLDR categories), case keys have no authoritative list. Translators need to know their own language's keys; we'd likely publish a recommended per-locale set in docs.

Open questions
Hello, {userName}!?
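To make the flow in sections 4 and 5 concrete, here is a self-contained sketch that resolves the Estonian example. Everything in it is an assumption for illustration: `selectCase`, `resolve`, and the `Term` shape are not Lingui code, and the values are keyed `enemy` and `item` to match the placeholders in the translated sentences. The `creeper` forms and the sentence come from this RFC; the `sword` case keys are my mapping of the three forms quoted in the Problem section. It also reproduces the typo caveat, where an unknown key falls through to `other`.

```typescript
type Term = { message: string };
type Values = Record<string, string | Term>;

// Parse the branches of a flat `{_case, select, key {text} ...}` message
// and pick the branch for `caseName`, falling back to `other`.
function selectCase(termMessage: string, caseName: string): string {
  const branches: Record<string, string> = Object.create(null);
  const branchPattern = /(\w+)\s*\{([^{}]*)\}/g;
  let match: RegExpExecArray | null;
  while ((match = branchPattern.exec(termMessage)) !== null) {
    branches[match[1]] = match[2];
  }
  return branches[caseName] ?? branches["other"] ?? "";
}

// Resolve `{name}` and `{name, declension, caseName}` tokens in a sentence.
function resolve(sentence: string, values: Values): string {
  return sentence.replace(/\{([^{}]+)\}/g, (_match, body: string) => {
    const [name, type, arg] = body.split(",").map((part) => part.trim());
    const value = values[name];
    if (type === "declension" && typeof value !== "string") {
      return selectCase(value.message, arg);
    }
    // Plain placeholder: strings pass through, terms use their `other` form.
    return typeof value === "string" ? value : selectCase(value.message, "other");
  });
}

// Estonian catalog entries for the two terms.
const creeper: Term = {
  message:
    "{_case, select, nom {kreeper} gen {kreeperi} par {kreeperit} com {kreeperiga} other {kreeper}}",
};
const sword: Term = {
  message: "{_case, select, nom {mõõk} par {mõõka} com {mõõgaga} other {mõõk}}",
};
const values: Values = { player: "Steve", enemy: creeper, item: sword };

console.log(resolve("{player} lõi {enemy, declension, par} {item, declension, com}", values));
// Steve lõi kreeperit mõõgaga

// Typo `pra` instead of `par`: silently falls through to `other`.
console.log(resolve("{player} lõi {enemy, declension, pra} {item, declension, com}", values));
// Steve lõi kreeper mõõgaga
```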