Make processing of embedded HTML normative #57

gkellogg · 2018-08-23T20:35:41Z

Currently, Embedding JSON-LD in HTML Documents is entirely informative. We've discussed making this normative, requiring JSON-LD processors to be able to identify and extract JSON-LD from a script tag with type application/ld+json within the HTML document.

Given multiple such script tags, which one is used?
Should we define a parameterized content-type to allow the version to be specified (e.g., application/ld+json;version=1.1)
Does the current document base affect the base for JSON-LD processing?
- location of HTML document
- html>head>base@href
- xml:base of closest ancestor element
Does the document language affect the default language for JSON-LD processing?
- HTTP header: Content-Language
- @lang, @xml:lang
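
A minimal sketch of the extraction step under discussion, using only Python's standard library. The class and function names (`JsonLdScriptExtractor`, `extract_json_ld`) are hypothetical, and treating a parameterized type such as `application/ld+json;version=1.1` as a match is an assumption, not settled behavior:

```python
import json
from html.parser import HTMLParser


class JsonLdScriptExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # raw text of each matching script, in document order
        self._buffer = None  # text chunks while inside a matching script

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Assumption: media type parameters (e.g. ;version=1.1) are ignored here.
            media_type = (dict(attrs).get("type") or "").split(";")[0].strip().lower()
            if media_type == "application/ld+json":
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def extract_json_ld(html):
    """Return the parsed JSON of each JSON-LD script element, in document order."""
    parser = JsonLdScriptExtractor()
    parser.feed(html)
    parser.close()
    return [json.loads(text) for text in parser.blocks]
```

This only locates and parses the script contents; which of the results a processor should use is exactly the first question above.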

iherman · 2018-08-24T04:58:27Z

I believe we must do that. Embedded JSON-LD is, currently, the only way schema.org data is used, and there are a number of applications (e.g., Web Publications) where this is the sensible way to go.

My take on the questions:

Given multiple such script tags, which one is used?

I see two options:

1. Take the first script element in tree order
2. Take all of them and merge the resulting graphs in the RDF sense

Both approaches provide a clear specification; I am more in favor of No. 1.

Should we define a parameterized content-type to allow the version to be specified (e.g., application/ld+json;version=1.1)

If we define that for HTTP, then I guess it is necessary to follow that, yes.

Does the current document base affect the base for JSON-LD processing?

location of HTML document

html>head>base@href

xml:base of closest ancestor element

I do not think we should go there. Per the HTML spec, the <script> element's DOM Node has a baseURI property, whose exact specification is in the hands of the HTML spec. We ought just to take that one.
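
A rough sketch of what taking that value could amount to outside a browser, assuming the base is the document URL unless a `<base href>` is present, in which case the href is resolved against the document URL. The function names are hypothetical, and a real processor would defer to the HTML specification's definition of the document base URL:

```python
from urllib.parse import urljoin


def document_base_url(document_url, base_href=None):
    """Approximate the base URL of an HTML document.

    Assumption: base_href is the href of the first <base> element, if any.
    """
    if base_href is None:
        return document_url
    return urljoin(document_url, base_href)


def resolve_iri(relative_iri, document_url, base_href=None):
    """Resolve a relative IRI found in embedded JSON-LD against the document base."""
    return urljoin(document_base_url(document_url, base_href), relative_iri)
```

For example, `resolve_iri("#me", "http://example.org/people/alice.html")` resolves against the document location, while passing `base_href="/data/"` moves the base to `http://example.org/data/`.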

Does the document language affect the default language for JSON-LD processing?

HTTP header- Content-Language

@lang, @xml:lang

Yes. I believe if, for example, somebody uses <script ... lang="fr"> (which is a perfectly valid HTML statement), we ought to use that. So again, whatever is valid for the script element as a node in HTML should be valid for the content of script.
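
A minimal sketch of that lookup, assuming the nearest `lang` attribute on the script element or one of its ancestors wins and the `Content-Language` header serves only as a fallback. The function name and the `lang_chain` representation are hypothetical:

```python
def effective_language(lang_chain, content_language=None):
    """Pick the language that would apply to a script element.

    lang_chain lists the lang attribute values from the script element up to
    the root element, nearest first, with None where the attribute is absent.
    Assumption: an empty lang="" means "unknown" and stops the lookup.
    """
    for lang in lang_chain:
        if lang is not None:
            return lang or None
    # No lang attribute anywhere: fall back to the HTTP header, if given.
    return content_language
```

With `<script ... lang="fr">` inside `<html lang="en">`, the chain is `["fr", "en"]` and the result is `"fr"`.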

ajs6f · 2018-08-24T15:17:59Z

@iherman I have to disagree a bit about which approach to take for multiple <script/>s. As a consumer of JSON-LD, I would find it surprising that I could "read" all these assertions in the document (every <script/>), but that only some would be read by machinery (those in the first wrt document order).

But I can also imagine situations in which (e.g. via CMS action) many <script/> elements wind up in a document with no real provenance, but I can clearly identify the one or few of interest to me.

So if we do make processing JSON-LD in HTML normative, do we need to offer a mechanism by which one or more (up to all) <script/>s can be selected from a document at processing time?
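
One possible selection mechanism is targeting a script element by its `id` through the URL fragment. A sketch under that assumption, with a hypothetical `select_scripts` helper and the scripts given as `(id, data)` pairs:

```python
from urllib.parse import urldefrag


def select_scripts(scripts, url):
    """Select JSON-LD script contents for a given URL.

    scripts is a list of (id_or_None, parsed_json) pairs in document order.
    Assumption: a fragment selects the script whose id matches it; without a
    fragment, every script is returned.
    """
    fragment = urldefrag(url).fragment
    if not fragment:
        return [data for _, data in scripts]
    return [data for script_id, data in scripts if script_id == fragment]
```

A consumer could then request `http://example.org/page.html#pub` to read only the script with `id="pub"`, or the bare URL to read them all.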

gkellogg · 2018-08-24T15:55:55Z

I agree that if an HTML document has multiple script elements, they should all be considered and merged into a common dataset. My own RDFa processor looks for any script element with a type attribute associated with an RDF reader, along with Microdata and RDF/XML, and extracts triples from all.

The issue about choosing among script tags was surfaced for the use case where the context references an HTML document with embedded JSON-LD script(s). In this case, which one would be used as the context, or would they all be used?

ajs6f · 2018-08-24T16:09:46Z

In this case, which one would be used as the context, or would they all be used?

Just off the top of my head, I would be a bit worried about a merge in that situation because at least one of those <script/>s might contain a context meant for use with metadata for the page itself (e.g. publishing info, etc.). Perhaps we can offer a syntactic form that prioritizes sources within some larger context?

iherman · 2018-08-24T16:25:48Z

@ajs6f I do sympathize with accepting several scripts, but I am not sure we have a clear story on how we would merge several JSON-LD snippets into one; hence my original proposal of keeping it to one. Would they be like several top-level JSON-LD objects in an array? Is the JSON content simply concatenated as strings? What would the user expect?

I am fine accepting several scripts if we have a clear story on this.
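
The two readings can be compared with a small sketch. The helper names and sample data are hypothetical; it assumes each script is parsed on its own, so that every object keeps its own `@context`:

```python
import json


def combine_as_array(texts):
    """Parse each script separately and return one top-level array."""
    combined = []
    for text in texts:
        value = json.loads(text)
        # A script may itself hold an array of top-level objects.
        combined.extend(value if isinstance(value, list) else [value])
    return combined


def concatenation_is_valid_json(texts):
    """Check whether naive string concatenation still parses as JSON."""
    try:
        json.loads("".join(texts))
        return True
    except json.JSONDecodeError:
        return False


first = '{"@id": "http://example.org/a", "http://schema.org/name": "A"}'
second = '{"@id": "http://example.org/b", "http://schema.org/name": "B"}'
```

Here `combine_as_array([first, second])` yields a two-element array, while concatenating the two strings is not valid JSON at all, which suggests that plain concatenation cannot be the merge story.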

iherman · 2018-08-24T16:27:25Z

@gkellogg

I agree that if an HTML document has multiple script elements, they should all be considered and merged into a common dataset. My own RDFa processor looks for any script element with a type attribute associated with an RDF reader, along with Microdata and RDF/XML, and extracts triples from all.

I guess what you do is to merge these as RDF Graphs. This is also what I do in my RDFa+microdata processor. We can of course do that for several scripts, too, but I am a bit concerned whether this is something working with our user audience...

ajs6f · 2018-08-24T16:28:40Z

@iherman You make a good point. For instance documents, we can indeed go to RDF merge, but contexts... have to think about that! 🤔
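
For instance documents, an RDF merge can be sketched as a union of triples in which blank nodes from different scripts are kept apart. This is a toy model, assuming triples are plain string tuples, blank nodes start with `_:`, and literals are written with surrounding quotes; `rdf_merge` is a hypothetical name:

```python
def _rename(term, index):
    # Blank node labels are scoped to their source graph, so make them unique.
    if term.startswith("_:"):
        return f"_:g{index}_{term[2:]}"
    return term


def rdf_merge(graphs):
    """Merge graphs given as iterables of (subject, predicate, object) strings."""
    merged = set()
    for index, graph in enumerate(graphs):
        for s, p, o in graph:
            merged.add((_rename(s, index), p, _rename(o, index)))
    return merged
```

Triples that use only IRIs collapse when they are repeated across scripts, whereas two scripts that both use `_:b0` end up describing two distinct nodes. Nothing comparable exists for contexts, which is the open point.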

gkellogg · 2018-08-24T17:11:11Z

but I am a bit concerned whether this is something working with our user audience...

I've actually fielded Linter issues because of the automatic creation of many (hundreds of) JSON-LD scripts in a document; I needed to encourage them to consolidate, but yes, it can happen for SEO.

BigBlueHat · 2018-09-05T13:24:50Z

I believe we must do that. Embedded JSON-LD is, currently, the only way schema.org data is used, and there are a number of applications (e.g., Web Publications) where this is the sensible way to go.

@iherman for the record, JSON-LD is the recommended way, but Google (at least) supports RDFa and Microdata for Schema.org extraction: https://developers.google.com/search/docs/guides/intro-structured-data#structured-data-format Additionally, Bing only recently (this past quarter) added JSON-LD support, but prior to that processed both RDFa and Microdata (afaik). Lastly, Open Graph Protocol is popular with sites targeting "social embedding" on Facebook, LinkedIn, etc (it's even in use on this page).

Consequently, I'd love to explore a WG Note (or some such) that helps resolve some of the vagueness around mixing these things together (which happens often).

BigBlueHat · 2018-09-05T19:47:14Z

The issue about choosing among script tags was surfaced for the use case where the context references an HTML document with embedded JSON-LD script(s). In this case, which one would be used as the context, or would they all be used?

Ignoring (for now) the inherent risks of depending on embedded JSON-LD for storing (and extracting) a context expression from within HTML, we could "upgrade" the https://www.w3.org/ns/json-ld#context string from being only a link relationship (as currently defined) to also being usable as a profile (or other) media type parameter.

<script type="application/ld+json;profile=https://www.w3.org/ns/json-ld#context">
{"@context": {}}
</script>
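
A sketch of how a processor might read such a type attribute, assuming the `profile` parameter works as proposed here (it is not an established convention). The helper names are hypothetical, and the parser is deliberately naive: it splits on `;` and would mishandle a quoted value containing one:

```python
CONTEXT_PROFILE = "https://www.w3.org/ns/json-ld#context"


def parse_media_type(value):
    """Split a media type string into (essence, parameters)."""
    essence, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        name, _, param_value = raw.partition("=")
        params[name.strip().lower()] = param_value.strip().strip('"')
    return essence.lower(), params


def is_context_script(type_attribute):
    """True if the script is JSON-LD flagged with the (proposed) context profile."""
    essence, params = parse_media_type(type_attribute)
    return essence == "application/ld+json" and params.get("profile") == CONTEXT_PROFILE
```

The same parsing would also expose a `version` parameter such as `application/ld+json;version=1.1`.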

That could have interesting potential use in future markup-based graph expressions also: one can imagine an RDFa 2.0 which could lean on JSON-LD based contexts so that any expressed graph content maps to the same names throughout the document. But now I'm probably daydreaming. 😁

gkellogg added spec:enhancement test:needs tests labels Aug 23, 2018

azaroth42 mentioned this issue Aug 24, 2018

Normativeness of the embedded form of JSON-LD #22

Closed

BigBlueHat mentioned this issue Sep 5, 2018

What is 'base' for an embedded json-ld? #23

Closed

gkellogg added a commit that referenced this issue Sep 23, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

df4157e

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

gkellogg mentioned this issue Sep 23, 2018

JSON-LD in HTML #68

Merged

gkellogg added a commit that referenced this issue Sep 24, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

1ac0921

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

gkellogg added a commit that referenced this issue Sep 24, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

36807ca

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

gkellogg added a commit that referenced this issue Sep 25, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

743d40d

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

gkellogg added a commit that referenced this issue Sep 25, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

087b6ef

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

gkellogg added a commit that referenced this issue Sep 26, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

64fe47a

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

gkellogg added a commit that referenced this issue Oct 3, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

10b56e9

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

gkellogg added a commit that referenced this issue Oct 17, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

7df8e61

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

gkellogg added a commit that referenced this issue Nov 5, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

5160552

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

gkellogg mentioned this issue Nov 12, 2018

Json ld in html w3c/json-ld-api#50

Merged

gkellogg closed this as completed in #68 Nov 16, 2018

gkellogg added a commit that referenced this issue Nov 16, 2018

Update the JSON-LD in HTML section to be normative, describe dataset …

4920d83

…extraction, how to deal with multiple script elements and script element targeting using fragments. Fixes #23 and fixes #57.

azaroth42 added the satisfied Requirement Satisfied label Nov 18, 2019
