Skip to content

Json ld in html #50

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 16, 2018
Merged

Json ld in html #50

merged 7 commits into from
Nov 16, 2018

Conversation

gkellogg
Copy link
Member

@gkellogg gkellogg commented Nov 12, 2018

  • Spec text for running API methods with HTML input. Adds extractAllScripts option and adds contentType accessor to RemoteDocument.
  • HTML tests for expand, compact, flatten and toRdf.

For w3c/json-ld-syntax#57.


Preview | Diff

Copy link
Member

@BigBlueHat BigBlueHat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this require conformant JSON-LD 1.1 processors to handle the embedded-in-HTML scenario?

It seems feasible that three could be (now) three "grades"/"levels" of JSON-LD processors:

  • a minimum one which only handles JSON objects (so all context references are "cached" or internally aliases)--i.e. no "requests" plumbing and no HTML parsing
  • one that comes with protocol "kit" to dereference comment URL's for contexts (http, https, etc)--i.e. includes "requests" plumbing, but no HTML parsing
  • lastly, one that does HTML parsing--though...here again, the "requests" plumbing could actually be seen as optional (if context files are cached/aliased to local objects)

Hrm...so maybe that's 4?

  • JSON-LD
  • JSON-LD + URLs
  • JSON-LD + HTML
  • JSON-LD + HTML + URLs

"@context": {
"foo": {"@id": "http://example.com/foo", "@container": "@list"}
},
"foo": [{"@value": "<!-- -->"}]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use I can imagine here is a "risky" value expressing a CMS "widget'/Web Component-like thing. Such as:

<!-- cool CMS widget -->
<script>console.log('awesome!');</script>
<div class="cms-widget">
  <h1>Awesome Widget!</h1>
</div>

This is one of the reasons I hope the HTML comment intermingling/escaping is unnecessary. If it is necessary, we'll end up with loads of various "escapings" (or not) of the same JSON-LD...which will need normalizing before parsing. 😢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly, HTML embedded in HTML contained in an HTML script would be one such use. Also, various HTML validators complain when script elements contain things that it doesn't expect, and placing comments around the entire JSON-LD block is one solution for this.

Intermingling without escapes is not supported, to be intermingled, they must be escaped. This could be using <\!-- --\> or &lt;!-- --&gt;. Easiest thing is to comment out the whole block, and deal specifically with any embedded XML comments rather than entity encode the contents; either would work, though.

<html>
<head>
<script type="application/ld+json">
<!--
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other scenario here being: <!-- this is for SEO --> or some such.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be invalid per our text, as after unescaping, "this is for SEO" would not be valid JSON.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Nothing we can change...just didn't know if we should have a test that looked like that-kind-of-broken. 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expand/h017-in.html looks for invalid JSON-LD, but not inside a comment. Probably easiest to modify this to the following:

<html>
  <head>
    <script type="application/ld+json">
    <!-- foo -->
    </script>
  </head>
</html>

@@ -5225,34 +5345,38 @@ <h3>RemoteDocument</h3>
USVString contextUrl = null;
USVString documentUrl;
any document;
USVString contentType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should plan for more headers here? Especially given our discussions with the DXWG in w3c/dxwg#261 and face-to-face at TPAC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there necessary for JSON-LD processor conformance, then we should include more headers. It's not clear to me that Accept-Profile or Accept-Schema would be used inside a JSON-LD processor.

I did consider breaking out other media-type parameters in a separate field, so that code can just look for, e.g. contentType === "application/ld+json" rather than something like contentType.startsWith("application/ld+json"). We want to go there when we consider profiles more fully.

@gkellogg
Copy link
Member Author

Does this require conformant JSON-LD 1.1 processors to handle the embedded-in-HTML scenario?

It seems to me that we discussed creating optional subsets of the spec, and other than for Framing, rejected the idea as being poor for interoperability. Of course, we can discuss it again. As I recall, the barrier for implementations to include HTML parsing seemed low enough that all implementations should be expected to do this. (@azaroth42 may remember more, I think he's the one that made that point).

If we do want to go there, the tests should be removed from the expand/compact/flatten/toRdf manifests and a new html-manifest should be created, so that it doesn't look like scattered failures.

one that comes with protocol "kit" to dereference comment URL's for contexts (http, https, etc)--i.e. includes "requests" plumbing, but no HTML parsing

I recall that in discussing URL options for documentation, the ability to use JSON-LD embedded in HTML seemed like the best way to do such documentation.

@BigBlueHat
Copy link
Member

@gkellogg as we consider other deployments like Web of Things (i.e. JSON-LD in a light bulb), we might want to consider revisiting these growing processing requirements being made on all processors.

Related to this, we should weigh whether Content-Type: text/html responses are to be treated immediately as a valid JSON-LD "encoding" or if this is a special scenario from which JSON-LD may be extracted (using defined steps per this PR) and then fed to a "JSON-LD processor."

Seems modulerizing it a bit (see my list above) could actually widen the use of JSON-LD and narrow the scope of "interop." Maybe. 😃

@gkellogg
Copy link
Member Author

might want to consider revisiting these growing processing requirements being made on all processors.

Sounds like a worthy topic of a teleconference, but I think we did already consider it.

@gkellogg
Copy link
Member Author

This issue was discussed in a meeting.

  • Resolution #2: merge open HTML related PRs #93 and #68 and #50 after adding “At Risk” (or similar terminology) to present that things are not finalized
View the transcript What is ‘base’ for embedded json-ld?

Ivan Herman: link: w3c/json-ld-syntax#23

Ivan Herman: Link of the PR: w3c/json-ld-syntax#93

Benjamin Young: w3c/json-ld-syntax#68

Gregg Kellogg: #50

Benjamin Young: we discussed that one at tpac
… we sent it in for TAG review, and they basically widened the scope

Gregg Kellogg: there are 2 open PRs
… 1) basic support for json-ld in html
… 2) PR-93 adds text to specifically add text to add html as base
… in the API spec, it’s PR-50

Adam Soroka: quick question, what are we expected to do with their comments?
… shall we respond?

Ivan Herman: what they propose is interesting but beyond our charter
… this would elevate json-ld
… but yeah beyond our charter
… I would say this is something the CG has to pick up
… and we can cross the bridge at some time, but if this is realistic from a manpower perspective I dont know

Benjamin Young: w3c/json-ld-syntax#68 (comment)

Ivan Herman: regarding the PR-93, there is some stuff about having XML

Benjamin Young: the thing I just linked shows how script tags affect html parsing
… it’s syntactically correct json-ld

Gregg Kellogg: what I did in the PR-68 I call out specifics on how to handle those blocks if the media type is application/json
… I think I’ve taken in the specifics on how content of script tags has to be handled and adjusted in for our needs
… we asked specific questions to TAG, got an answer but they kinda got a bit over enthusiastic
… out of this needs to come something that improves web platform

Benjamin Young: the HTML comments stuff as really bothered me since I’ve read it
… but it seems to primarily affect only HTML parsing
… question is how much of this we need to have in the spec
… json-ld in script tags vs “raw” json-ld
… both have totally different escaping rules and what not.. and none of that has something to do with html base

Ivan Herman: for the comment storing, the whole section is a normative thing
… I have the impression this is an HTML problem
… which we should certainly mention, but maybe not as part of a normative section

Benjamin Young: HTML5 spec level text about parsing <script> tags https://www.w3.org/TR/html5/semantics-scripting.html#restrictions-for-contents-of-script-elements

Ivan Herman: we should officially answer to the TAG and will officially add to the standard what they said about base
… and that we try to get the CG involved

Adam Soroka: +1

Gregg Kellogg: comments in html and escaping.. it depends on the encoding
… we don’t need to give guidance on how to handle this

Ivan Herman: https://pr-preview.s3.amazonaws.com/w3c/json-ld-syntax/pull/68.html#ex-103-embedding-json-ld-in-html-with-comments

Ivan Herman: it has to be valid json-ld
… that’s invalid json

Gregg Kellogg: that’s something you see quite often

Benjamin Young: ivan: because https://www.w3.org/TR/html5/semantics-scripting.html#restrictions-for-contents-of-script-elements

Gregg Kellogg: comments are often used just to make sure there are no other issues embedded in the script elements that would cause any issues

Benjamin Young: I did quite some digging on that issue
… the DOM parsing is only concerned with what’s inside the tags
… it’s just treated as raw string
… if there’s something inside the json-ld an html parser would choke on
… the json-ld would need to be treated in such a way such that an html parser wouldn’t choke on it

Pierre-Antoine Champin: one crazy idea by looking at the json-ld embedded in html comments: you could add a js comment in front of the html comment, making it valid javascript
… it could become technically correct

Benjamin Young: sadly it wouldn’t
… it would continue being parsed
… we could make it our own parsing space
… the question is how far the parser gets before it finds the ending script tag

Gregg Kellogg: the json-ld would not be allowed to contain anything that could be interpreted as html and/or html comments
… not really feasible and also not helping our mission
… I’ve outlined multiple approaches to tackle this
… there are only a few cases where json-ld would contain things that resemble comments

Harold Solbrig: why is this an json-ld issue but not a javascript issue?

Gregg Kellogg: [explains why it isn’t]

Gregg Kellogg: it did some test cases for this, exploring corner cases we know of
… I don’t know how to move forward unless addressing at least some of the stuff from the PR

Gregg Kellogg: it describes script tags and data blocks are a subset

Benjamin Young: what’s breaking it, is the potential of one to too early close the script tag
… so this would need somehow being taken care of
… the risk is, the json-ld could contain content that jacks up the html it’s contained in

Ivan Herman: is it so horrible to say, if I put json-ld in a script tag I’m supposed to escape anything that html would need to have escaped
… thus a json-ld parser would have to do the unescaping
… but you are in HTML regardless.. so

Gregg Kellogg: for someone who’s actually looking at the source, those entities become rather annoying

Ivan Herman: realistically, I don’t know how often this would happen

Benjamin Young: the escaping issue is very similar of putting json-ld inside a text env.

Ivan Herman: I think it’s perfectly reasonable to accept both PRs, close the issue
… and open a new issue on the specific problem

Gregg Kellogg: it’s a editor’s draft not a working draft

Ivan Herman: we would open a issue right away

Benjamin Young: I would only +1 this, if we add a big red AT RISK disclaimer

Ivan Herman: a lot of very important things are pending for now
… I think it’s an edge case

Adam Soroka: I don’t think we should use a phrase like “AT RISK” but more something along the lines of “will be part of the final spec but might undergo some changes”

Ivan Herman: we cannot commit ourselves to having always consistent editor’s drafts

Benjamin Young: I’m not sure we have reached consensus on all the things contained

Gregg Kellogg: I cannot work on other open issues

Pierre-Antoine Champin: what about a parameter on the media type hinting at having to do unescaping? (like application/ld+json;escaped=html)

Proposed resolution: merge open HTML related PRs #93 and #68 and #50 after adding “At Risk” (or similar terminology) to present that things are not finalized (Benjamin Young)

Benjamin Young: +1

Adam Soroka: +1

Ivan Herman: what does “that” mean?

Gregg Kellogg: +1

Benjamin Young: I don’t want to have stuff merged without reaching consensus
… but I could live with having it marked as being “at risk” or similar

Ivan Herman: putting things that are already done “at risk” would be going backwards
… opening a new issue that highlights things that are still being discussed would be ok
… having the feeling that ~90% are done

Adam Soroka: I have to generally agree with ivan

Ivan Herman: +1

Adam Soroka: it seems for me very unlikely that we would stop talking about it

David Newbury: +1

Harold Solbrig: +1

Benjamin Young: I’m fine with merging those

Simon Steyskal: +1

Resolution #2: merge open HTML related PRs #93 and #68 and #50 after adding “At Risk” (or similar terminology) to present that things are not finalized

Pierre-Antoine Champin: +1

@gkellogg gkellogg merged commit 1dde17d into master Nov 16, 2018
@gkellogg gkellogg deleted the json-ld-in-html branch November 16, 2018 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants