Internationalization (i18n) of human-readable TD fields #161

Closed
mkovatsc opened this issue Jul 4, 2018 · 23 comments
Labels
Needs review Issue was fixed, but is still open for post-merge reviews

Comments

@mkovatsc
Contributor

mkovatsc commented Jul 4, 2018

From the discussion at the Bundang F2F Meeting:

We should:

  • Allow for a single language string with content negotiation (Accept-Language)
  • Allow for a multi-language container similar to title* of Web Linking

To consider:

  • TD serializers require the TD model to support multiple languages to serialize a specific language on demand

Way forward:

@benfrancis
Member

See also: w3c/wot#373

I would recommend using the same approach as the W3C Web App Manifest specification, which is to have a Thing Description use a single language at a time, with that language specified by a language member.

A specific language can then be requested using content negotiation with an Accept-Language header (as you suggest above) or geotargeting.

Including potentially hundreds of different languages in a single Thing Description could make the Thing Description very large, which is particularly problematic for embedded applications.
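
A minimal sketch of the negotiation step described above, assuming a server that holds one Thing Description per language. The helper names `parse_accept_language` and `pick_language` are hypothetical, and the parser only handles the common `tag;q=value` form, not the full HTTP header grammar.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without a q parameter has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # assumption: treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def pick_language(header, available, default):
    """Pick the language of the TD to serve; `available` holds lowercase tags."""
    for tag in parse_accept_language(header):
        if tag == "*":
            return default
        # Try the full tag first, then its primary subtag ("de-at" -> "de").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return candidate
    return default
```

A request with `Accept-Language: de-AT,de;q=0.9,en;q=0.5` against a server offering `en`, `ja`, and `de` would be served the German TD; a request for an unsupported language falls back to the default.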

@ToruKawaguchi

I would prefer keeping the multi-language container approach as an option, so that implementing a negotiation mechanism can be omitted when only a few languages are needed (e.g. JP and EN).

@takuki
Contributor

takuki commented Aug 22, 2018

The approach mentioned by @benfrancis does not seem to prohibit keeping content for multiple languages together in storage. A TD with multi-language content can be stored on the server and, using the approach described, filtered before being sent to the client.

@sebastiankb
Contributor

Please let's consider a TD sample using the @language map variant from JSON-LD. In the TD, we would most likely apply multi-language support mainly to the terms description and label. I assume the context definition can also be hidden in the TD context file.

{
    "@context": ["http://www.w3.org/ns/td",
                 {"description": { "@id": "td:description", "@container": "@language" } },
                 {"label": { "@id": "td:label", "@container": "@language" } }],
    "@type" : "Thing",
    "id": "urn:dev:wot:com:example:servient:lamp",
    "name": "MyLampThing",
    "description" : {
        "en" : "MyLampThing uses JSON-LD 1.1 serialization",
        "ja" : "MyLampThingはJSON-LD 1.1シリアル化を使用します",
        "de" : "MyLampThing verwendet JSON-LD 1.1 Serialisierung"
    },
    "properties": {
        "status": {
            "description" : {
                "en": "Shows the current status of the lamp (On/Off/Error).",
                "ja" : "ランプの現在の状態(オン/オフ/エラー)を表示します。",
                "de" : "Zeigt den aktuellen Lampenstatus an (An/Aus/Fehler)"
            },
            "label" : {
                "en": "Lamp status",
                "ja" : "ランプの状態",
                "de" : "Lampenstatus"
            },
            "writable": false,
            "observable": false,
            "type": "string",
            "forms": [{
                "href": "coaps://mylamp.example.com/status",
                "mediaType": "application/json"
            }]
        }
    }   
}

Please comment on this approach.
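
One way to relate this sample to the single-language approach discussed earlier is to collapse the language maps before the TD is sent to a client. The following is only a sketch: the function name `localize` is hypothetical, the list of terms mirrors the sample above, and dropping a term that has no entry for the requested language is an assumption, not something the thread decides.

```python
# Terms assumed to carry language maps, as in the sample above.
LANGUAGE_MAP_TERMS = ("description", "label")


def localize(node, language):
    """Return a copy of a TD fragment with its language maps collapsed to one string."""
    if isinstance(node, list):
        return [localize(item, language) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == "@context":
            # The context defines the terms themselves; copy it untouched.
            result[key] = value
        elif key in LANGUAGE_MAP_TERMS and isinstance(value, dict):
            # Assumption: a term without an entry for `language` is omitted.
            if language in value:
                result[key] = value[language]
        else:
            result[key] = localize(value, language)
    return result
```

A limitation of this sketch is that a property that happens to be named `description` or `label` would be mistaken for a language map; a real serializer would need to track where it is in the TD structure.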

@danielpeintner
Contributor

What I often see in i18n solutions is a way to specify a default language. For example, what happens if the current language is neither "en", "ja", nor "de"?
Do we have a way to indicate a default value, or is it up to the client which language is picked?

Moreover, if I take a look at the JSON-LD reference you shared there are different formats also.

  "occupation": "忍者",
  "occupation_en": "Ninja",
  "occupation_cs": "Nindža",

Nested translations compared to translations on the same level.

The latter approach also answers my first question/comment given that a tag without a language postfix such as "occupation": "忍者" would be the default.

I assume both variants are possible?

@takuki
Contributor

takuki commented Sep 18, 2018

There is an example in the JSON-LD 1.1 draft specification. It appears that a special key "@none" can be used to specify a default representation in a language map. Here is an excerpt.

"label": {
    "en": "The Queen",
    "de": [ "Die Königin", "Ihre Majestät" ],
    "@none": "The Queen"
}
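
Following that reading of "@none", a client-side lookup with a default could look like the sketch below. The function name `lookup` is hypothetical, and this only illustrates how a consumer might use the map; it does not describe how a JSON-LD processor treats the key.

```python
def lookup(language_map, language):
    """Return the list of strings for `language`, falling back to the "@none" entry."""
    value = language_map.get(language)
    if value is None:
        value = language_map.get("@none")
    if value is None:
        return []
    # An entry may be a single string or a list of strings, as in the excerpt.
    return value if isinstance(value, list) else [value]


label = {
    "en": "The Queen",
    "de": ["Die Königin", "Ihre Majestät"],
    "@none": "The Queen",
}
print(lookup(label, "de"))  # ['Die Königin', 'Ihre Majestät']
print(lookup(label, "fr"))  # ['The Queen']
```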

@sebastiankb added the "Needs discussion" label Oct 19, 2018
@sebastiankb
Contributor

sebastiankb commented Nov 16, 2018

There needs to be a review by the i18n group. We will ask once we have the CR.

@sebastiankb
Contributor

Outcome of today's TD meeting:

  • follow the comment from Taki
  • retype the description and title terms from string to string or object
  • check whether name should also support object; this needs a clearer definition of name (@mkovatsc to set up a strawman)

@sebastiankb added the "Work In Progress" label Nov 16, 2018
@mkovatsc
Contributor Author

JSON Schema says in https://tools.ietf.org/html/draft-handrews-json-schema-validation-01#section-10.1 that title and description "MUST be a string". This is not a technical constraint, but it becomes a usability issue when there are two conflicting definitions of these terms (TD vs JSON Schema).

@handrews, is your assumption that JSON Schemas will always use HTTP language negotiation (Accept-/Content-Language) or a proactive language negotiation, or have you not thought deeper about internationalization yet? :)

While we could go down the path of language negotiation, simplifying TDs, @ToruKawaguchi advocated the use case of allowing (a small number of) different languages directly in one TD document, which significantly reduces overhead (e.g., otherwise a new TD has to be fetched when the UI changes its display language). The language-tag indexed object is a simple enough mechanism for this.

@handrews Is there a possibility that you might loosen the constraint from Sec. 10.1?

An alternative could be similar to Web Linking, where there are two attributes (title and title*), with the second one allowing multi-language.

@sebastiankb added the "Needs discussion" label and removed the "Needs discussion" and "Work In Progress" labels Nov 22, 2018
@sebastiankb
Contributor

I put this on tomorrow's agenda.

@sebastiankb
Contributor

We should also consider that title and description are not used for validation. So it may be OK to allow an object type for both terms.

@handrews

@mkovatsc we have definitely thought about I18N/L10N, but not thoroughly. We definitely are not relying exclusively on HTTP content negotiation as not all JSON Schemas (or even all JSON Hyper-Schemas) are accessed over HTTP.

However, we generally lean more towards a solution that is orthogonal to individual fields, which is why title* does not exist even in Hyper-Schema, which is otherwise explicitly an RFC 8288 link serialization. So HTTP content negotiation would be a solution.

This topic gets mentioned at json-schema-org/json-schema-spec#114, but not really discussed, and I'm having trouble finding where else this has been covered. The mention of a UI schema vocabulary in that issue is talking about using a schema to hint as to how to display the content under different languages, for example within a web page that has detected the language already through the browser's locale. That idea hasn't yet been developed further.

I think the other idea that has been floated is some sort of external I18N/L10N framework that would replace tokens as a pre-processing step, as is often done with web framework templates.
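
To illustrate what such a pre-processing step might look like, here is a rough sketch that uses Python's `string.Template` as a stand-in for a real I18N/L10N framework. The token names, the catalog contents, and the `render` helper are all hypothetical and are not part of JSON Schema or the TD specification.

```python
from string import Template


def render(node, catalog):
    """Replace ${token} placeholders in every string of a JSON-like structure."""
    if isinstance(node, str):
        # safe_substitute leaves tokens that are missing from the catalog in place.
        return Template(node).safe_substitute(catalog)
    if isinstance(node, list):
        return [render(item, catalog) for item in node]
    if isinstance(node, dict):
        return {key: render(value, catalog) for key, value in node.items()}
    return node


# Hypothetical per-language catalogs maintained outside the document.
catalogs = {
    "en": {"status_title": "Lamp status"},
    "de": {"status_title": "Lampenstatus"},
}
template = {"properties": {"status": {"title": "${status_title}", "type": "string"}}}
print(render(template, catalogs["de"]))
```

With this approach title stays a plain string in the document that is finally served, which keeps it compatible with the "MUST be a string" constraint.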

@handrews

@sebastiankb you'd fail meta-schema validation, though, if anyone tried to apply that.

With the formalization of vocabularies and extensions in draft-08 (the $vocabulary PR is finally up and under discussion!), it would be better to create new keywords like titleObject, or localizedTitles or something than to re-define existing keywords in an incompatible way.

Part of the other work in draft-08 involves formalizing how to collect and use annotation values like title, so while you would not break instance validation, any tools that try to use title in a standard way based on its annotation collection rules would break.

@mkovatsc
Contributor Author

The consensus appears to be to keep title and description as string and make them follow any possible language negotiation (e.g., HTTP Accept-Language header) and to use separate terms for the multi-language containers.

As JSON-LD also talks about a "programmatically easy way to navigate the data structures for the language-specific data", I would prefer the language map ({ "en": "...", "ja": "..." }) over the language-specific terms ("title_ja": "...").

title and description would hold the default-language text (similar to JSON-LD's @none entry in the language map), which also follows negotiation.

Candidates for the language map fields/members, on which we still have to decide:

  • titleObject and descriptionObject
  • titleMap and descriptionMap
  • titles and descriptions
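
As a sketch of how a consumer might combine the plain string with its language map, assuming the `titles` and `descriptions` candidate names from the list above (the helper `resolve` is hypothetical):

```python
def resolve(thing, term, language):
    """Return the text for `term` ("title" or "description") in `language`.

    Falls back to the plain string member when the language map is absent
    or has no entry for the requested language.
    """
    translations = thing.get(term + "s", {})
    return translations.get(language, thing.get(term))


lamp = {
    "title": "Lamp status",
    "titles": {"en": "Lamp status", "ja": "ランプの状態", "de": "Lampenstatus"},
}
print(resolve(lamp, "title", "de"))  # Lampenstatus
print(resolve(lamp, "title", "fr"))  # Lamp status
```

Keeping title as a plain string means tools that only understand the JSON Schema definition continue to work, while language-aware clients can consult the map.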

@ToruKawaguchi

I like titles and descriptions

@sebastiankb
Contributor

OK, I will integrate this approach into the TD spec.

@sebastiankb added the "Work In Progress" label and removed the "Needs discussion" label Dec 10, 2018
@sebastiankb
Contributor

As a reference for the language codes, I will link to ISO 639-2: https://www.loc.gov/standards/iso639-2/php/code_list.php

@sebastiankb
Contributor

Well, I just realized that ISO 639-1 is more common. JSON-LD also seems to rely on it.

@sebastiankb
Contributor

Commits a25e3f9 and ad339bb introduce multi-language support in the TD.

Examples can be seen in Example 5, 6, 7, and 8.

Please review.

@sebastiankb added the "Needs review" label and removed the "Work In Progress" label Dec 10, 2018
@takuki
Contributor

takuki commented Dec 13, 2018

The changes look very nice!

@sebastiankb added the "Work In Progress" label and removed the "Needs review" label Dec 14, 2018
@mlagally
Contributor

Based on the discussion in the TD call on 14.12., we should consider using the language tags defined by
https://tools.ietf.org/html/rfc5646

@takuki
Contributor

takuki commented Dec 17, 2018

The W3C Working Group Note "Authoring HTML: Language declarations" suggests referencing BCP 47, which in turn references RFC 5646.
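
For keys of a language map, a consumer may want at least a basic shape check. The sketch below covers only a simplified subset of RFC 5646 (language, optional script, optional region) and applies the casing conventions recommended there; it is not the full grammar, and the name `normalize_tag` is hypothetical. Two-letter ISO 639-1 codes such as "ja" pass unchanged, so the earlier examples remain valid.

```python
import re

# Simplified subset of RFC 5646: language[-Script][-REGION].
# Variants, extensions, private-use and grandfathered tags are NOT covered.
_SIMPLE_TAG = re.compile(
    r"([A-Za-z]{2,3})"               # primary language subtag
    r"(?:-([A-Za-z]{4}))?"           # optional script subtag
    r"(?:-([A-Za-z]{2}|[0-9]{3}))?"  # optional region subtag
)


def normalize_tag(tag):
    """Return the tag in conventional casing, or None if it is outside the subset."""
    match = _SIMPLE_TAG.fullmatch(tag)
    if match is None:
        return None
    language, script, region = match.groups()
    parts = [language.lower()]
    if script:
        parts.append(script.title())
    if region:
        parts.append(region.upper())
    return "-".join(parts)


print(normalize_tag("JA"))          # ja
print(normalize_tag("zh-hant-tw"))  # zh-Hant-TW
```

A `None` result only means the tag falls outside this subset, not that it is invalid under RFC 5646.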

@sebastiankb
Contributor

Thanks for the update and integration.
I think this issue is solved and needs a final review.

@sebastiankb added the "Needs review" label and removed the "Work In Progress" label Dec 18, 2018
8 participants