JSON-LD Context processing in HTML Documents #172
Comments
I agree that the original naming may give the wrong impression (that other processors are somehow incomplete), and discourage some people from adopting JSON-LD. "HTML Processor" is a little misleading, but a better name could indeed be found.
The argument was raised that JSON-LD Contexts are bona fide JSON-LD documents, and so it would be difficult to argue that a That being said, we could address your concern by replacing the Note, at the beginning of section 7, by a Warning, stating "not available in a Pure JSON-LD Processor" rather than "available in a Full Processor". And possibly hinting that content-negotiation is a more "portable" solution?... |
I don't think we should merely "possibly hint" at this; my preference would be to make it a requirement that you MUST make your |
I feel more strongly about this than @dlongley does... don't open up the Pandora's box of reading JSON-LD Contexts from HTML. Remove the feature. The only argument that I can see for it is that it's a "neat feature" in the academic completeness sense... but JSON-LD was never meant to be an academically complete mechanism... it was supposed to help developers publish JSON-LD, but not become so complex that it blows your foot off when you try to use it. Having this feature means that developers will inevitably publish their JSON-LD Context as HTML only, which will cause a split in the ecosystem between "We expect you to publish via HTML" and "We expect you to publish not via HTML". |
Isn't the only feature that the "full" processor has over the JSON-only one the fact that it parses stuff from HTML? |
Yes, but "HTML Processor" makes it sound like it can only process HTML... |
This was not added because it's a "neat feature", but as a response to concerns raised in #43. If JSON had a built-in commenting feature, it would likely not be necessary. Because of this, and the need to normatively describe the in-the-wild JSON-LD in HTML scenarios, we provided a mechanism to do this. Once you describe JSON-LD in HTML, then allowing that for contexts and frames is a logical progression, particularly when the extraction is described in the document loader, which is the standard way to fetch all remote content. The fact that it came up in w3c/vc-data-model#585 just goes to show a general need to be able to document contexts, and containing the context in the documenting HTML is likely a better way to keep them from diverging than using different resource formats. I agree with @dlongley that we should better describe the potential for splitting the ecosystem by recommending (SHOULD) that publishers provide an |
With chair hat on...
Could you point out where in the charter it says that we can only introduce features described in input documents to the WG? Because that would also preclude features like And with chair hat off ... I agree with @gkellogg that if we say that a context is JSON-LD, and that JSON-LD can be expressed in a script element of an HTML page, then the implication is that a context can be expressed in a script element of an HTML page. If I recall correctly, @danbri has brought up his issue as a frustration of web developers. The possible routes forward seem to be:
I agree with @pchampin that "extended" is better than "full", along with a big warning about contexts in HTML being complicated in the spec. |
JSON-LD in HTML exists and even informatively--when viewed from the HTML-perspective: https://html.spec.whatwg.org/#the-script-element:attr-script-type-4 In the current spec-space, it's already possible to extract JSON-LD from HTML and use it as JSON(-LD)--because that's how data blocks work with any embedded format (CSV, YAML, etc.). We have gone beyond simply echoing that fact in the syntax document and instead baked additional processing steps into the API. Shifting things into the documentLoader space does help from an architectural layering concern, but this "context in HTML" usage raises a whole host of architectural and community concerns. It effectively moves us from the current world of extracting-then-using the embedded JSON-LD into one where HTML becomes a valid representation of JSON-LD itself. We need to work to re-narrow our focus at this stage, and go back to the "simplest thing that could possibly work." |
This issue was discussed in a meeting.
Contexts in HTML
Ivan Herman: #172
Rob Sanderson: Summary: in the spec we say that (normatively) json-ld can be included in script el. There is now a requirement on <base>. It was noted that contexts are also jsonld. Hence, it is permissible to have contexts embedded in script tags inside html. This means that processors need to be able to process that.
… We all agree that this is an extension to normal proc mode. Either we need to say that contexts have a special role, contexts are not jsonld, or we need to accept that contexts can be embedded in html and processors should have to be able to say that they support processing them.
Manu Sporny: Some context wrt VC. Purely json-based processors find information using context. Someone said it would be nice to have human-readable context. Argument in favor of this feature.
… Person said, It would be nice to see jsonld in html. But I don’t want the burden of jsonld processor supporting html.
… We all agree that JSON-LD in HTML is a huge use case (e.g. schema.org)
… I think pulling in contexts from html is controversial
… 2 questions
… 1: does jsonld context in html greatly increase jsonld usage?
… I think the answer is no
… There are other ways to solve issues people would have to want html for contexts.
… 2: is this going to create interop issues?
… Is this going to cause ecosystem to change by other processors to start failing?
… I think this is going to create issues.
… Some people are going to start publishing contexts as html only.
… Even if we say you should not do this.
… The damage this feature could create is far greater than possible benefits.
… I have more reasons, but this is the biggest argument.
… We should wait until there is more demand for this feature. We could do it in the future if really needed.
Benjamin Young: “This means that processors need to be able to process that.”
Benjamin Young: +1 to everything Manu said.
… This is the part of what Rob said at the start, that jsonld in html being normative somehow begets this notion that we have to …
… jsonld in html has always been normative thanks to the data block in script tag
… we just described it better
… comes from HTML5 spec.
… Using single URL to specify context and its documentation is interesting. (Conneg can be used)
… Overhead of making this possible is too big for processors.
… This is a nuclear weapon to kill a small bird.
… There are less damaging ways to solve the problems discussed.
Dave Longley: +1 to manu and bigbluehat
Manu Sporny: +1 to bigbluehat !
Benjamin Young: We need to be more careful than we have before, before introducing new things like these.
Rob Sanderson: ref - https://www.w3.org/TR/2018/WD-json-ld11-20181214/#embedding-json-ld-in-html-documents
Rob Sanderson: in 1.1, we made it our problem, so we have to solve it.
… I want to channel danbri. Search engines want to include info in their knowledge graph that they find on the web as jsonld.
… schema.org, or at least the engines, currently assume do not process contexts at all.
… If you have a template in your website, with multiple schema.org definitions, you could put into your CMS as a data script block to push this into every single page.
… search engines would be able to see these blocks
… by having google’s clusters waiting to process jsonld in page. Publishers would be required to not embed into page.
… why not have it as include contexts object?: when multiple people responsible for editing context. Also, if there are template-driven CMSs being used, you want to stripe jsonld over different templates being used. This would cause data blocks being used multiple times.
Dave Longley: Many of these things can be solved by saying that serving should happen with application/ld+json
… I think there are many cases not being considered wrt complexity
… many use cases not covered on template-based html pages
… Such as dynamic pages when generated client-side with javascript
… We shouldn’t get into that space.
… We should say that context MUST be served with proper content type
Manu Sporny: I could not follow schema.org use case. Danbri should write this up. We should do a deep analysis on this use case, to see what could address his concern.
… There are a bunch of assumptions in that use case
… e.g. people could create their own non-schema.org contexts. This would add a huge amount of complexity.
… it would be good to have dan involved.
Rob Sanderson: +1 to dlongley and manu
Manu Sporny: Also, it feels like this is migrating away from BPs.
… We are learning a lot from security around publishing contexts.
… We had discussions on the type of attacks, if you could publish contexts as html.
… So there are security concerns around this feature
… Concern around complexity, interoperability, …
… A long list of reasons for saying that this is not spec-ready.
… So we have to get use-case right. And see if it can be solved with current feature-set. Only if needed, we should look further into this html issue.
Ivan Herman: Manu said many of the things that I wanted to say. We need danbri to raise his voice.
… We have to rely on documented requirements
Rob Sanderson: I agree
Benjamin Young: I think what you described, if it’s on danbri’s previous desire to have this in jsonld, then this is a request. Dan has expressed multiple times that search engines want to understand page contents. This is different to giving identifier that serves contexts in html.
… We are going to end up with RDF dataset that is compiled of multiple contexts.
Gregg Kellogg: no, doesn’t work that way
Benjamin Young: Generating a graph is not about coupling jsonld context identifier algorithm.
Gregg Kellogg: I don’t think it is practical for many CMSs to do content negotiation (like github pages)
… we have to re-characterize what jsonld in html is.
… I agree that these things start to increase complexity and raise the barrier.
… We need to reconsider what processing jsonld in html means.
Manu Sporny: +1 to re-characterize how processors process JSON-LD in HTML.
Rob Sanderson: We are not going to solve this today.
… I will reach out to danbri to see if he wants to engage.
Gregg Kellogg: He may be at WebConf |
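Dave Longley's suggestion in the minutes, that a context be served with a proper content type, could be enforced inside a document loader along the following lines. This is only a minimal sketch: `is_acceptable_context_type` is a hypothetical helper name, and accepting `application/json` and other `+json` media types alongside `application/ld+json` is an assumption rather than something the discussion settled.

```python
def is_acceptable_context_type(content_type_header: str) -> bool:
    """Return True if a Content-Type header names a JSON-based media type.

    Hypothetical helper: the set of accepted types is an assumption.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type in ("application/ld+json", "application/json"):
        return True
    # Structured-syntax suffix, e.g. "application/activity+json".
    return media_type.endswith("+json")
```

A loader using this check would refuse a `text/html` response when dereferencing a context instead of trying to parse it.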
From @danbri, posted with permission, after discussion with @gkellogg:
|
👍 Related to this, that best practise note should also talk about caching of contexts.
One possible solution for this would be to allow a link header to be added to HTML documents that points towards contexts. |
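As a rough illustration of the Link header idea, the sketch below scans a `Link` response header for a target that points at a JSON-LD resource. The function name `find_jsonld_alternate`, the choice of `rel="alternate"` with `type="application/ld+json"`, and the simplified header parsing (no commas or semicolons inside quoted values) are all assumptions made for the example.

```python
from typing import Optional
import re
from urllib.parse import urljoin


def find_jsonld_alternate(link_header: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of the first JSON-LD alternate link, if any."""
    # Each link looks like: <target>; param="value"; param="value"
    for match in re.finditer(r"<([^>]*)>([^<]*)", link_header):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                # Remove the comma that separates links, then the quotes.
                value = value.strip().strip(",").strip().strip('"')
                params[key.strip().lower()] = value
        rels = params.get("rel", "").lower().split()
        is_jsonld = params.get("type", "").lower() == "application/ld+json"
        if "alternate" in rels and is_jsonld:
            return urljoin(base_url, target)
    return None
```

With such a header on the HTML response, a processor needs only an HTTP client to reach the context; it never has to parse the HTML body.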
I think that this would be a good thing to do. Provide guidance on aggressively caching the schema.org context (or packaging it with software implementations). |
Then state that the new schema.org context will be served from: "https://schema.org/v1" -- make that the context, say that "https://schema.org/" is an alias for "https://schema.org/v1" and note that you will turn off content negotiation for "https://schema.org/" at the beginning of 2020. |
Why? Seems like extra complexity... just say that the new schema.org context file is at: https://schema.org/v1 and be done with it. The schema.org context is so large that implementations will ship with it or aggressively cache it. Speaking from our implementation experience, at one point a bug caused us to go out to the web and fetch schema.org for every digital signature we did and our dev environment suffered horribly - massive performance hit. We now ship with static copies of schema.org... we never go out to the network to get the massive context (and that is the way it should be). The only issue, of course, is there is no versioning for schema.org... but we haven't had an issue w/ that yet. We may have an issue when people start digitally signing schema.org content and expecting those signatures to stay valid for 3-5 years while schema.org shifts underneath them. |
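The "ship with static copies" approach described above can be sketched as a small caching loader. This is an illustrative sketch only: `CachingContextLoader`, its `fetch` callback, and the `preloaded` mapping are hypothetical names, and a real implementation would also need to decide on versioning and cache expiry.

```python
import json
from typing import Callable, Dict, Optional


class CachingContextLoader:
    """Serve contexts from bundled copies first, then from an in-memory cache."""

    def __init__(self, fetch: Callable[[str], str],
                 preloaded: Optional[Dict[str, dict]] = None):
        # `fetch` is any function that returns the response body for a URL.
        self._fetch = fetch
        # Bundled contexts never trigger a network request.
        self._cache: Dict[str, dict] = dict(preloaded or {})

    def load(self, url: str) -> dict:
        if url not in self._cache:
            # Only the first request for an unknown URL reaches the network.
            self._cache[url] = json.loads(self._fetch(url))
        return self._cache[url]
```

Bundling a copy of a large context in `preloaded` avoids the per-document network fetch that caused the performance problem mentioned in the comment.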
Yes, correct, so we don't need the JSON-LD Context processing in HTML documents feature. No one is asking for that feature. |
I don't understand this statement. There are a number of us that are attempting to make JSON-LD work w/ pure JSON environments in a more harmonious way and have made great strides towards that with the help of JSON-LD 1.1's |
There sort of is...but it could be better. For instance, all the versions are in a directory on GitHub: The 3.7 context file (for instance) lives at https://github.com/schemaorg/schemaorg/blob/104238766458b465e6a60cc7d049f887c542563a/data/releases/3.7/schemaorgcontext.jsonld That's versioned--via git sha's--but not tagged in git (which would help) nor made available as "the 3.7 context file" from the release history page. All of that would help certainly. |
@azaroth42 it would be helpful (if possible) to see more of that thread, or to make this an actual conversation/call (again, if possible). Without it, it's not clear we're all talking about the same thing(s). |
@BigBlueHat this was from hallway conversations at the Web Conference, so no thread to refer to. @danbri should clarify his position, but IIRC, they could turn off content-negotiation for http(s)://schema.org and return a stub context in a script tag which references the actual JSON-LD version of the context, which could help their usage. So, for example, the schema.org web page might look something like the following:
<!DOCTYPE html>
<html lang="en">
<head>
<!-- Generated from headtags.tpl -->
<meta charset="utf-8" >
<link rel="shortcut icon" type="image/png" href="docs/favicon.ico"/>
<link rel="stylesheet" type="text/css" href="docs/schemaorg.css" />
<link rel="stylesheet" type="text/css" href="docs/prettify.css" />
...
<script type="application/ld+json">{"@context": "https://schema.org/docs/jsonldcontext.jsonld"}</script>
...
</head>
</html>
Presently, content-negotiation does a redirect to https://schema.org/docs/jsonldcontext.jsonld, so this would simplify their hosting infrastructure. |
Right, but it would vastly increase the amount of work a JSON-LD processor must do. Given this as a data document:
{"@context": "https://schema.org/",
"@type": "Person",
"name": "me"}
The processor (without a cached context it says is valid for https://schema.org/) would need to...
1. GET the default (HTML) response from https://schema.org/
2. Parse that looking for data blocks (i.e. <script type="application/ld+json">)
   1. with the added requirement that one of them says it's a context file?
3. Extract that JSON-LD data block
4. Parse it.
5. If valid, GET the @context value(s).
6. Parse those to create a single active context for the data document.
The processing requirements go from "use an HTTP(S) client" to "use an HTTP(S) client and HTML parser (which possibly supports JavaScript)". |
There is a massive amount of JSON-LD embedded within HTML. Tools without the capability to extract it are ignoring one of the biggest applications of JSON-LD. So perhaps the burden is not quite so huge? |
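To give a sense of what steps 2 to 4 of the list above involve (finding, extracting, and parsing the data block), here is a minimal sketch using only Python's standard-library HTML parser. The names `JsonLdBlockParser` and `extract_jsonld_blocks` are hypothetical, and the sketch assumes static HTML: it does not execute JavaScript, so data blocks injected client-side would be missed.

```python
import json
from html.parser import HTMLParser
from typing import List


class JsonLdBlockParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self._in_block = False
        self._chunks: List[str] = []
        self.blocks: List[object] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            type_attr = dict(attrs).get("type") or ""
            # Ignore media type parameters when comparing.
            if type_attr.split(";")[0].strip().lower() == "application/ld+json":
                self._in_block = True
                self._chunks = []

    def handle_data(self, data):
        if self._in_block:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_block:
            # Raises json.JSONDecodeError if the block is not valid JSON.
            self.blocks.append(json.loads("".join(self._chunks)))
            self._in_block = False


def extract_jsonld_blocks(html_text: str) -> List[object]:
    parser = JsonLdBlockParser()
    parser.feed(html_text)
    parser.close()
    return parser.blocks
```

Even in this small form, the processor now depends on an HTML parser in addition to an HTTP client, which is the added burden the comment describes.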
@danbri certainly if you're already in that space doing that thing, you're all set. 😃 But if you're in a "pure" JSON-LD environment (database, IoT, etc), you'd very much want to avoid having higher processing requirements. |
Tools really need to cache contexts, anyway, so this might serve as an added incentive to do so. |
This issue was discussed in a meeting.
Benjamin Young: See Syntax issue #172
Benjamin Young: This issue is very related. Originally, extracting JSON-LD from HTML. This can now be done with a simple link header.
… schema.org for example does not want to use conneg, so this is good for this. Proposed closing based on the last PRs.
Gregg Kellogg: The behavior is slightly modified if you request context. Document loader will not add text/html from request. The API is not affected too much.
… If you will deal with HTML, like schema.org, then you can achieve a compatibility level with processing JSON-LD in HTML, instead of doing it mid-processing.
Dave Longley: Everything is untangled, and is cleaner now.
Ivan Herman: Users should be warned that they don’t define context as part of an HTML file.
Gregg Kellogg: We don’t have text saying that it can’t be done. We just removed the text saying that it can be done.
Ivan Herman: Because it can be done in theory?
Gregg Kellogg: Syntax doesn’t say anything about it. API doc explicitly excludes HTML.
Proposed resolution: close #172 as addressed by #204 (Benjamin Young)
Rob Sanderson: +1
Dave Longley: +1
Benjamin Young: +1
Ruben Taelman: +1
Ivan Herman: +1
Gregg Kellogg: +1
Resolution #3: close #172 as addressed by #204 |
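The behavior Gregg Kellogg describes in the minutes, where the document loader does not ask for `text/html` when it is fetching a context, might look roughly like this. The function name `build_accept_header`, its two flags, and the specific q-values are assumptions chosen for illustration, not values taken from the discussion.

```python
def build_accept_header(for_context: bool, supports_html: bool) -> str:
    """Build an Accept header for a document loader request.

    Hypothetical sketch: the q-values here are illustrative assumptions.
    """
    parts = ["application/ld+json", "application/json;q=0.9"]
    # HTML is only requested for data documents, never for contexts.
    if supports_html and not for_context:
        parts.append("text/html;q=0.8")
    return ", ".join(parts)
```

Under this design, an HTML-capable processor can still load JSON-LD embedded in pages, while context retrieval stays a JSON-only operation.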
From this issue in the Verifiable Claims Working Group with regard to the new "full Processor" conformance class: w3c/vc-data-model#585
@gkellogg wrote:
@msporny wrote:
@gkellogg wrote:
I agree that processing JSON-LD content in HTML is a primary use case and the WG should support it.
I disagree that people are publishing JSON-LD Contexts in HTML, that came out of nowhere. I can see what the WG is trying to do, but this issue is an example of my concern: w3c/vc-data-model#585
You have someone suggesting that we pull in a JSON-LD Context file via an HTML document without understanding the technical burden in doing so. They don't understand that publishing a JSON-LD Context as an HTML document will now require full processors.
I also note that expressing JSON-LD Contexts in HTML was not contemplated in any of the input documents to the JSON-LD WG and as such, the group is skirting very close to being in violation of their charter by adding this feature:
https://www.w3.org/2018/03/jsonld-wg-charter.html
https://github.com/json-ld/json-ld.org/wiki/Changes-in-Community-Group-Drafts-Targeted-for-1.1
https://json-ld.org/presentations/JSON-LD-Update-TPAC-2017/assets/player/KeynoteDHTMLPlayer.html
There are two major issues with this new set of features:
Making the following changes to the specification would be an improvement: