Json ld in html #50

gkellogg · 2018-11-12T01:14:52Z

Spec text for running API methods with HTML input. Adds extractAllScripts option and adds contentType accessor to RemoteDocument.
HTML tests for expand, compact, flatten and toRdf.

For w3c/json-ld-syntax#57.

Preview | Diff

…ripts` option and adds `contentType` accessor to `RemoteDocument`.

…n extracting examples.

BigBlueHat

Does this require conformant JSON-LD 1.1 processors to handle the embedded-in-HTML scenario?

It seems feasible that three could be (now) three "grades"/"levels" of JSON-LD processors:

a minimum one which only handles JSON objects (so all context references are "cached" or internally aliases)--i.e. no "requests" plumbing and no HTML parsing
one that comes with protocol "kit" to dereference comment URL's for contexts (http, https, etc)--i.e. includes "requests" plumbing, but no HTML parsing
lastly, one that does HTML parsing--though...here again, the "requests" plumbing could actually be seen as optional (if context files are cached/aliased to local objects)

Hrm...so maybe that's 4?

JSON-LD
JSON-LD + URLs
JSON-LD + HTML
JSON-LD + HTML + URLs

BigBlueHat · 2018-11-12T19:59:49Z

tests/expand/h014-in.html

+      "@context": {
+        "foo": {"@id": "http://example.com/foo", "@container": "@list"}
+      },
+      "foo": [{"@value": "<!-- -->"}]


The use I can imagine here is a "risky" value expressing a CMS "widget'/Web Component-like thing. Such as:

 <script>console.log('awesome!');</script> <div class="cms-widget"> <h1>Awesome Widget!</h1> </div>

This is one of the reasons I hope the HTML comment intermingling/escaping is unnecessary. If it is necessary, we'll end up with loads of various "escapings" (or not) of the same JSON-LD...which will need normalizing before parsing. 😢

Certainly, HTML embedded in HTML contained in an HTML script would be one such use. Also, various HTML validators complain when script elements contain things that it doesn't expect, and placing comments around the entire JSON-LD block is one solution for this.

Intermingling without escapes is not supported, to be intermingled, they must be escaped. This could be using <\!-- --\> or . Easiest thing is to comment out the whole block, and deal specifically with any embedded XML comments rather than entity encode the contents; either would work, though.

BigBlueHat · 2018-11-12T20:00:24Z

tests/expand/h015-in.html

+<html>
+  <head>
+    <script type="application/ld+json">
+    <!--


The other scenario here being:  or some such.

That would be invalid per our text, as after unescaping, "this is for SEO" would not be valid JSON.

Right. Nothing we can change...just didn't know if we should have a test that looked like that-kind-of-broken. 😄

expand/h017-in.html looks for invalid JSON-LD, but not inside a comment. Probably easiest to modify this to the following:

<html> <head> <script type="application/ld+json">  </script> </head> </html>

BigBlueHat · 2018-11-12T20:05:40Z

index.html

@@ -5225,34 +5345,38 @@ <h3>RemoteDocument</h3>
          USVString contextUrl = null;
          USVString documentUrl;
          any       document;
+          USVString contentType;


Maybe we should plan for more headers here? Especially given our discussions with the DXWG in w3c/dxwg#261 and face-to-face at TPAC.

If there necessary for JSON-LD processor conformance, then we should include more headers. It's not clear to me that Accept-Profile or Accept-Schema would be used inside a JSON-LD processor.

I did consider breaking out other media-type parameters in a separate field, so that code can just look for, e.g. contentType === "application/ld+json" rather than something like contentType.startsWith("application/ld+json"). We want to go there when we consider profiles more fully.

gkellogg · 2018-11-12T23:17:29Z

Does this require conformant JSON-LD 1.1 processors to handle the embedded-in-HTML scenario?

It seems to me that we discussed creating optional subsets of the spec, and other than for Framing, rejected the idea as being poor for interoperability. Of course, we can discuss it again. As I recall, the barrier for implementations to include HTML parsing seemed low enough that all implementations should be expected to do this. (@azaroth42 may remember more, I think he's the one that made that point).

If we do want to go there, the tests should be removed from the expand/compact/flatten/toRdf manifests and a new html-manifest should be created, so that it doesn't look like scattered failures.

one that comes with protocol "kit" to dereference comment URL's for contexts (http, https, etc)--i.e. includes "requests" plumbing, but no HTML parsing

I recall that in discussing URL options for documentation, the ability to use JSON-LD embedded in HTML seemed like the best way to do such documentation.

BigBlueHat · 2018-11-13T21:36:02Z

@gkellogg as we consider other deployments like Web of Things (i.e. JSON-LD in a light bulb), we might want to consider revisiting these growing processing requirements being made on all processors.

Related to this, we should weigh whether Content-Type: text/html responses are to be treated immediately as a valid JSON-LD "encoding" or if this is a special scenario from which JSON-LD may be extracted (using defined steps per this PR) and then fed to a "JSON-LD processor."

Seems modulerizing it a bit (see my list above) could actually widen the use of JSON-LD and narrow the scope of "interop." Maybe. 😃

gkellogg · 2018-11-13T22:07:32Z

might want to consider revisiting these growing processing requirements being made on all processors.

Sounds like a worthy topic of a teleconference, but I think we did already consider it.

gkellogg · 2018-11-16T18:54:10Z

This issue was discussed in a meeting.

Resolution #2: merge open HTML related PRs #93 and #68 and #50 after adding “At Risk” (or similar terminology) to present that things are not finalized

View the transcript

What is ‘base’ for embedded json-ld?

Ivan Herman: link: w3c/json-ld-syntax#23

Ivan Herman: Link of the PR: w3c/json-ld-syntax#93

Benjamin Young: w3c/json-ld-syntax#68

Gregg Kellogg: #50

Benjamin Young: we discussed that one at tpac
… we sent it in for TAG review, and they basically widened the scope

Gregg Kellogg: there are 2 open PRs
… 1) basic support for json-ld in html
… 2) PR-93 adds text to specifically add text to add html as base
… in the API spec, it’s PR-50

Adam Soroka: quick question, what are we expected to do with their comments?
… shall we respond?

Ivan Herman: what they propose is interesting but beyond our charter
… this would elevate json-ld
… but yeah beyond our charter
… I would say this is something the CG has to pick up
… and we can cross the bridge at some time, but if this is realistic from a manpower perspective I dont know

Benjamin Young: w3c/json-ld-syntax#68 (comment)

Ivan Herman: regarding the PR-93, there is some stuff about having XML

Benjamin Young: the thing I just linked shows how script tags affect html parsing
… it’s syntactically correct json-ld

Gregg Kellogg: what I did in the PR-68 I call out specifics on how to handle those blocks if the media type is application/json
… I think I’ve taken in the specifics on how content of script tags has to be handled and adjusted in for our needs
… we asked specific questions to TAG, got an answer but they kinda got a bit over enthusiastic
… out of this needs to come something that improves web platform

Benjamin Young: the HTML comments stuff as really bothered me since I’ve read it
… but it seems to primarily affect only HTML parsing
… question is how much of this we need to have in the spec
… json-ld in script tags vs “raw” json-ld
… both have totally different escaping rules and what not.. and none of that has something to do with html base

Ivan Herman: for the comment storing, the whole section is a normative thing
… I have the impression this is an HTML problem
… which we should certainly mention, but maybe not as part of a normative section

Benjamin Young: HTML5 spec level text about parsing <script> tags https://www.w3.org/TR/html5/semantics-scripting.html#restrictions-for-contents-of-script-elements

Ivan Herman: we should officially answer to the TAG and will officially add to the standard what they said about base
… and that we try to get the CG involved

Adam Soroka: +1

Gregg Kellogg: comments in html and escaping.. it depends on the encoding
… we don’t need to give guidance on how to handle this

Ivan Herman: https://pr-preview.s3.amazonaws.com/w3c/json-ld-syntax/pull/68.html#ex-103-embedding-json-ld-in-html-with-comments

Ivan Herman: it has to be valid json-ld
… that’s invalid json

Gregg Kellogg: that’s something you see quite often

Benjamin Young: ivan: because https://www.w3.org/TR/html5/semantics-scripting.html#restrictions-for-contents-of-script-elements

Gregg Kellogg: comments are often used just to make sure there are no other issues embedded in the script elements that would cause any issues

Benjamin Young: I did quite some digging on that issue
… the DOM parsing is only concerned with what’s inside the tags
… it’s just treated as raw string
… if there’s something inside the json-ld an html parser would choke on
… the json-ld would need to be treated in such a way such that an html parser wouldn’t choke on it

Pierre-Antoine Champin: one crazy idea by looking at the json-ld embedded in html comments: you could add a js comment in front of the html comment, making it valid javascript
… it could become technically correct

Benjamin Young: sadly it wouldn’t
… it would continue being parsed
… we could make it our own parsing space
… the question is how far the parser gets before it finds the ending script tag

Gregg Kellogg: the json-ld would not be allowed to contain anything that could be interpreted as html and/or html comments
… not really feasible and also not helping our mission
… I’ve outlined multiple approaches to tackle this
… there are only a few cases where json-ld would contain things that resemble comments

Harold Solbrig: why is this an json-ld issue but not a javascript issue?

Gregg Kellogg: [explains why it isn’t]

Gregg Kellogg: it did some test cases for this, exploring corner cases we know of
… I don’t know how to move forward unless addressing at least some of the stuff from the PR

Gregg Kellogg: it describes script tags and data blocks are a subset

Benjamin Young: what’s breaking it, is the potential of one to too early close the script tag
… so this would need somehow being taken care of
… the risk is, the json-ld could contain content that jacks up the html it’s contained in

Ivan Herman: is it so horrible to say, if I put json-ld in a script tag I’m supposed to escape anything that html would need to have escaped
… thus a json-ld parser would have to do the unescaping
… but you are in HTML regardless.. so

Gregg Kellogg: for someone who’s actually looking at the source, those entities become rather annoying

Ivan Herman: realistically, I don’t know how often this would happen

Benjamin Young: the escaping issue is very similar of putting json-ld inside a text env.

Ivan Herman: I think it’s perfectly reasonable to accept both PRs, close the issue
… and open a new issue on the specific problem

Gregg Kellogg: it’s a editor’s draft not a working draft

Ivan Herman: we would open a issue right away

Benjamin Young: I would only +1 this, if we add a big red AT RISK disclaimer

Ivan Herman: a lot of very important things are pending for now
… I think it’s an edge case

Adam Soroka: I don’t think we should use a phrase like “AT RISK” but more something along the lines of “will be part of the final spec but might undergo some changes”

Ivan Herman: we cannot commit ourselves to having always consistent editor’s drafts

Benjamin Young: I’m not sure we have reached consensus on all the things contained

Gregg Kellogg: I cannot work on other open issues

Pierre-Antoine Champin: what about a parameter on the media type hinting at having to do unescaping? (like application/ld+json;escaped=html)

Proposed resolution: merge open HTML related PRs #93 and #68 and #50 after adding “At Risk” (or similar terminology) to present that things are not finalized (Benjamin Young)

Benjamin Young: +1

Adam Soroka: +1

Ivan Herman: what does “that” mean?

Gregg Kellogg: +1

Benjamin Young: I don’t want to have stuff merged without reaching consensus
… but I could live with having it marked as being “at risk” or similar

Ivan Herman: putting things that are already done “at risk” would be going backwards
… opening a new issue that highlights things that are still being discussed would be ok
… having the feeling that ~90% are done

Adam Soroka: I have to generally agree with ivan

Ivan Herman: +1

Adam Soroka: it seems for me very unlikely that we would stop talking about it

David Newbury: +1

Harold Solbrig: +1

Benjamin Young: I’m fine with merging those

Simon Steyskal: +1

Resolution #2: merge open HTML related PRs #93 and #68 and #50 after adding “At Risk” (or similar terminology) to present that things are not finalized

Pierre-Antoine Champin: +1

gkellogg added 4 commits November 7, 2018 12:10

Update extract-examples from common.

4ce9628

Semantic line breaks in WebIDL section.

d5bf896

Spec text for running API methods with HTML input. Adds `extractAllSc…

ddee066

…ripts` option and adds `contentType` accessor to `RemoteDocument`.

HTML tests for expand, compact, flatten and toRdf.

46302eb

gkellogg requested review from davidlehn and BigBlueHat November 12, 2018 01:15

gkellogg added 2 commits November 11, 2018 17:28

Update replacement for comments that should remain in the output.

7c42ee7

More work at undoing comments that should be viewable in the HTML whe…

b2d384d

…n extracting examples.

BigBlueHat reviewed Nov 12, 2018

View reviewed changes

Test invalid JSON-LD inside a comment (updates h017 expansion test).

6a6171a

iherman mentioned this pull request Nov 16, 2018

What is 'base' for an embedded json-ld? w3c/json-ld-syntax#23

Closed

gkellogg merged commit 1dde17d into master Nov 16, 2018

gkellogg deleted the json-ld-in-html branch November 16, 2018 18:54

This was referenced Nov 16, 2018

JSON-LD in HTML w3c/json-ld-syntax#68

Merged

Html base w3c/json-ld-syntax#93

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Json ld in html #50

Json ld in html #50

gkellogg commented Nov 12, 2018 •

edited by pr-preview bot

Loading

BigBlueHat left a comment

BigBlueHat Nov 12, 2018

gkellogg Nov 12, 2018

BigBlueHat Nov 12, 2018

gkellogg Nov 12, 2018

BigBlueHat Nov 13, 2018

gkellogg Nov 13, 2018

BigBlueHat Nov 12, 2018

gkellogg Nov 12, 2018

gkellogg commented Nov 12, 2018

BigBlueHat commented Nov 13, 2018

gkellogg commented Nov 13, 2018

gkellogg commented Nov 16, 2018

Json ld in html #50

Json ld in html #50

Conversation

gkellogg commented Nov 12, 2018 • edited by pr-preview bot Loading

BigBlueHat left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gkellogg commented Nov 12, 2018

BigBlueHat commented Nov 13, 2018

gkellogg commented Nov 13, 2018

gkellogg commented Nov 16, 2018

gkellogg commented Nov 12, 2018 •

edited by pr-preview bot

Loading