Json ld in html #50
…ripts` option and adds `contentType` accessor to `RemoteDocument`.
Does this require conformant JSON-LD 1.1 processors to handle the embedded-in-HTML scenario?
It seems feasible that there could (now) be three "grades"/"levels" of JSON-LD processors:
- a minimal one which only handles JSON objects (so all context references are "cached" or internally aliased), i.e. no "requests" plumbing and no HTML parsing
- one that comes with a protocol "kit" to dereference common URLs for contexts (http, https, etc.), i.e. includes "requests" plumbing, but no HTML parsing
- lastly, one that does HTML parsing, though here again the "requests" plumbing could actually be seen as optional (if context files are cached/aliased to local objects)
Hrm...so maybe that's 4?
- JSON-LD
- JSON-LD + URLs
- JSON-LD + HTML
- JSON-LD + HTML + URLs
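To make the first grade concrete, here is a minimal sketch of a loader with no "requests" plumbing, where every context is served from a local table. The names `cachedContexts` and `cachedLoader` and the example context URL are hypothetical and not part of any JSON-LD library; the returned object only mirrors the `RemoteDocument` field names that appear in this PR.

```javascript
// Hypothetical table of pre-fetched contexts, keyed by URL (assumption for illustration).
const cachedContexts = new Map([
  ["http://example.com/context.jsonld",
   { "@context": { "foo": "http://example.com/foo" } }],
]);

// A loader for a "JSON-LD only" processor: it never touches the network.
// The returned shape mirrors the RemoteDocument fields discussed in this PR.
function cachedLoader(url) {
  if (!cachedContexts.has(url)) {
    throw new Error(`context not cached: ${url}`);
  }
  return {
    contextUrl: null,
    documentUrl: url,
    document: cachedContexts.get(url),
    contentType: "application/ld+json",
  };
}
```

A processor of the "JSON-LD + URLs" grade would presumably swap this function for one that performs real HTTP requests, leaving the rest of the processing unchanged.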
"@context": {
  "foo": {"@id": "http://example.com/foo", "@container": "@list"}
},
"foo": [{"@value": "<!-- -->"}]
The use I can imagine here is a "risky" value expressing a CMS "widget"/Web Component-like thing. Such as:
<!-- cool CMS widget -->
<script>console.log('awesome!');</script>
<div class="cms-widget">
<h1>Awesome Widget!</h1>
</div>
This is one of the reasons I hope the HTML comment intermingling/escaping is unnecessary. If it is necessary, we'll end up with loads of various "escapings" (or not) of the same JSON-LD...which will need normalizing before parsing. 😢
Certainly, HTML embedded in HTML contained in an HTML script would be one such use. Also, various HTML validators complain when script elements contain things that they don't expect, and placing comments around the entire JSON-LD block is one solution for this.
Intermingling without escapes is not supported; to be intermingled, comments must be escaped. This could be done using <\!-- --\> or &lt;!-- --&gt;. The easiest thing is to comment out the whole block and deal specifically with any embedded XML comments rather than entity-encode the contents; either would work, though.
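A rough sketch of the "comment out the whole block" approach, assuming the wrapper is a single leading `<!--` and trailing `-->` and that embedded comment markers were written as `<\!--` and `--\>`. The function name `unwrapCommentedJsonLd` is hypothetical, and this is not the normative algorithm from the spec text.

```javascript
// Hypothetical helper: unwrap a JSON-LD script body that was wrapped in one HTML comment.
function unwrapCommentedJsonLd(scriptText) {
  let text = scriptText.trim();
  // Remove one enclosing HTML comment, if present.
  if (text.startsWith("<!--") && text.endsWith("-->")) {
    text = text.slice(4, -3);
  }
  // Restore comment markers that were backslash-escaped (assumed convention).
  // This must happen before parsing, because "\!" is not a valid JSON escape.
  text = text.replace(/<\\!--/g, "<!--").replace(/--\\>/g, "-->");
  return JSON.parse(text);
}
```

Under these assumptions, a body such as `<!-- {"foo": "<\!-- --\>"} -->` would parse to an object whose `foo` value is the literal string `<!-- -->`.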
<html>
<head>
<script type="application/ld+json">
<!--
The other scenario here being: <!-- this is for SEO --> or some such.
That would be invalid per our text, as after unescaping, "this is for SEO" would not be valid JSON.
Right. Nothing we can change...just didn't know if we should have a test that looked like that-kind-of-broken. 😄
expand/h017-in.html looks for invalid JSON-LD, but not inside a comment. Probably easiest to modify this to the following:
<html>
<head>
<script type="application/ld+json">
<!-- foo -->
</script>
</head>
</html>
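To see why such a test should be reported as invalid, the sketch below pulls the script content out with a simplistic regular expression (an assumption for brevity; a real processor would use an HTML parser), drops the enclosing comment, and tries to parse what is left. `firstJsonLdScript` and `isValidJsonBlock` are hypothetical helpers, not names from the spec or the test suite.

```javascript
// Hypothetical helper: return the text of the first JSON-LD script element, or null.
// A regular expression is used only to keep the sketch short.
function firstJsonLdScript(html) {
  const match = html.match(
    /<script\s+type="application\/ld\+json"\s*>([\s\S]*?)<\/script>/i
  );
  return match ? match[1] : null;
}

// Hypothetical helper: true if the body is valid JSON once an enclosing comment is removed.
function isValidJsonBlock(scriptText) {
  let text = scriptText.trim();
  if (text.startsWith("<!--") && text.endsWith("-->")) {
    text = text.slice(4, -3);
  }
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

const html = `<html>
  <head>
    <script type="application/ld+json">
      <!-- foo -->
    </script>
  </head>
</html>`;

const scriptBody = firstJsonLdScript(html);
const valid = isValidJsonBlock(scriptBody);
```

Here the text left after removing the comment markers is ` foo `, which is not JSON, so `valid` ends up `false`.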
@@ -5225,34 +5345,38 @@ <h3>RemoteDocument</h3>
USVString contextUrl = null;
USVString documentUrl;
any document;
USVString contentType;
Maybe we should plan for more headers here? Especially given our discussions with the DXWG in w3c/dxwg#261 and face-to-face at TPAC.
If they're necessary for JSON-LD processor conformance, then we should include more headers. It's not clear to me that Accept-Profile or Accept-Schema would be used inside a JSON-LD processor.
I did consider breaking out other media-type parameters into a separate field, so that code can just look for, e.g., contentType === "application/ld+json" rather than something like contentType.startsWith("application/ld+json"). We want to go there when we consider profiles more fully.
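As an illustration of what "breaking out" the parameters could look like, here is a small sketch that splits a Content-Type value into a bare media type and a parameter map. The function name `parseContentType` and the returned field names are hypothetical; nothing in this PR defines them, and the parsing is deliberately naive.

```javascript
// Hypothetical helper: split a Content-Type header value into media type and parameters.
// Naive by design: it does not handle ";" inside quoted parameter values.
function parseContentType(header) {
  const [type, ...rest] = header.split(";");
  const parameters = {};
  for (const part of rest) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed segments
    const name = part.slice(0, eq).trim().toLowerCase();
    // Strip optional surrounding double quotes from the value.
    const value = part.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    parameters[name] = value;
  }
  return { mediaType: type.trim().toLowerCase(), parameters };
}

const parsed = parseContentType(
  'application/ld+json;profile="http://www.w3.org/ns/json-ld#expanded"'
);
const isJsonLd = parsed.mediaType === "application/ld+json";
```

With the parameters separated out, the strict equality check works even when a `profile` parameter is present, which is the convenience the comment above describes.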
It seems to me that we discussed creating optional subsets of the spec, and other than for Framing, rejected the idea as being poor for interoperability. Of course, we can discuss it again. As I recall, the barrier for implementations to include HTML parsing seemed low enough that all implementations should be expected to do this. (@azaroth42 may remember more, I think he's the one that made that point). If we do want to go there, the tests should be removed from the expand/compact/flatten/toRdf manifests and a new html-manifest should be created, so that it doesn't look like scattered failures.
I recall that in discussing URL options for documentation, the ability to use JSON-LD embedded in HTML seemed like the best way to do such documentation.
@gkellogg as we consider other deployments like Web of Things (i.e. JSON-LD in a light bulb), we might want to consider revisiting these growing processing requirements being placed on all processors. Related to this, we should weigh whether modularizing it a bit (see my list above) could actually widen the use of JSON-LD and narrow the scope of "interop." Maybe. 😃
Sounds like a worthy topic of a teleconference, but I think we did already consider it.
This issue was discussed in a meeting.
What is ‘base’ for embedded json-ld?
Benjamin Young: we discussed that one at tpac
Gregg Kellogg: there are 2 open PRs
Adam Soroka: quick question, what are we expected to do with their comments?
Ivan Herman: what they propose is interesting but beyond our charter
Ivan Herman: regarding the PR-93, there is some stuff about having XML
Benjamin Young: the thing I just linked shows how script tags affect html parsing
Gregg Kellogg: what I did in the PR-68 I call out specifics on how to handle those blocks if the media type is application/json
Benjamin Young: the HTML comments stuff has really bothered me since I’ve read it
Ivan Herman: for the comment storing, the whole section is a normative thing
Ivan Herman: we should officially answer to the TAG and will officially add to the standard what they said about base
Gregg Kellogg: comments in html and escaping.. it depends on the encoding
Ivan Herman: it has to be valid json-ld
Gregg Kellogg: that’s something you see quite often
Gregg Kellogg: comments are often used just to make sure there are no other issues embedded in the script elements that would cause any issues
Benjamin Young: I did quite some digging on that issue
Pierre-Antoine Champin: one crazy idea by looking at the json-ld embedded in html comments: you could add a js comment in front of the html comment, making it valid javascript
Benjamin Young: sadly it wouldn’t
Gregg Kellogg: the json-ld would not be allowed to contain anything that could be interpreted as html and/or html comments
Harold Solbrig: why is this a json-ld issue but not a javascript issue?
Gregg Kellogg: [explains why it isn’t]
Gregg Kellogg: it did some test cases for this, exploring corner cases we know of
Gregg Kellogg: it describes script tags and data blocks are a subset
Benjamin Young: what’s breaking it, is the potential of one to too early close the script tag
Ivan Herman: is it so horrible to say, if I put json-ld in a script tag I’m supposed to escape anything that html would need to have escaped
Gregg Kellogg: for someone who’s actually looking at the source, those entities become rather annoying
Ivan Herman: realistically, I don’t know how often this would happen
Benjamin Young: the escaping issue is very similar to putting json-ld inside a text env.
Ivan Herman: I think it’s perfectly reasonable to accept both PRs, close the issue
Gregg Kellogg: it’s an editor’s draft not a working draft
Ivan Herman: we would open an issue right away
Benjamin Young: I would only +1 this, if we add a big red AT RISK disclaimer
Ivan Herman: a lot of very important things are pending for now
Adam Soroka: I don’t think we should use a phrase like “AT RISK” but more something along the lines of “will be part of the final spec but might undergo some changes”
Ivan Herman: we cannot commit ourselves to having always consistent editor’s drafts
Benjamin Young: I’m not sure we have reached consensus on all the things contained
Gregg Kellogg: I cannot work on other open issues
Pierre-Antoine Champin: what about a parameter on the media type hinting at having to do unescaping? (like
Ivan Herman: what does “that” mean?
Benjamin Young: I don’t want to have stuff merged without reaching consensus
Ivan Herman: putting things that are already done “at risk” would be going backwards
Adam Soroka: I have to generally agree with ivan
Adam Soroka: it seems for me very unlikely that we would stop talking about it
Benjamin Young: I’m fine with merging those
Adds `extractAllScripts` option and adds `contentType` accessor to `RemoteDocument`. For w3c/json-ld-syntax#57.