
Does HTML's <base> effect @context IRI resolution? #134

Closed
BigBlueHat opened this issue Feb 20, 2019 · 6 comments

Comments

@BigBlueHat
Member

Our current draft explains how HTML's <base> tag relates to the @base value in JSON-LD (see Section 7.1).

However, we haven't yet discussed how <base> might affect @context IRI resolution. For example:

<base href="http://cdn.example.com/">
<script type="application/ld+json">
{
  "@context": "context-file.jsonld",
  "@id": "demo-page",
  "@type": "WebPage",
  "name": "Demo"
}
</script>

Currently we only specify how

We do not yet explain how context-file.jsonld is resolved.

If the processing were done "in page" by jsonld.js, then the base URL would affect the resolution of context-file.jsonld. However, it's currently not stated how a processor written in Python, PHP, Go, etc., would "absolutize" that context URL when handling the same HTML.

Options and scenarios include:

  • using the <base> tag
  • using the document URL (if still known)
  • using a filesystem path (if processing from disk, with no URLs?)
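
To make the non-browser case concrete, here is a minimal sketch of what a static (curl-style) processor could do, assuming Python's standard html.parser and urllib.parse.urljoin. It takes the first <base> element that has an href (as HTML does), resolves that href against the document URL, and falls back to the document URL when there is no <base>. The document URL https://www.example.org/pages/demo.html and the helper names are hypothetical; no scripts are executed, so this reflects the HTML as served, not the live DOM.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseAndJsonLdExtractor(HTMLParser):
    """Records the first <base href> and the raw text of each JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.jsonld_blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # HTML uses only the first <base> element that has an href attribute.
        if tag == "base" and self.base_href is None and attrs.get("href") is not None:
            self.base_href = attrs["href"]
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buffer))
            self._in_jsonld = False


def context_iris(html_text, document_url):
    """Absolute IRIs of string-valued @context entries, as seen in the static HTML."""
    parser = BaseAndJsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    # The <base href> may itself be relative; resolve it against the document URL.
    if parser.base_href is not None:
        base = urljoin(document_url, parser.base_href)
    else:
        base = document_url
    iris = []
    for block in parser.jsonld_blocks:
        data = json.loads(block)
        ctx = data.get("@context") if isinstance(data, dict) else None
        if isinstance(ctx, str):
            iris.append(urljoin(base, ctx))
    return iris


if __name__ == "__main__":
    page = """<base href="http://cdn.example.com/">
<script type="application/ld+json">
{"@context": "context-file.jsonld", "@id": "demo-page", "@type": "WebPage", "name": "Demo"}
</script>"""
    doc_url = "https://www.example.org/pages/demo.html"  # hypothetical document URL
    print(context_iris(page, doc_url))  # ['http://cdn.example.com/context-file.jsonld']
```

Without the <base> element, the same call would yield https://www.example.org/pages/context-file.jsonld, which is exactly the ambiguity the options above describe.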
@gkellogg
Member

@BigBlueHat said:

We do not yet explain how context-file.jsonld is resolved.

The API document is fairly clear on this. Step 3.2.1 of the Context Processing Algorithm (section 4.1.2) says:

Set context to the result of resolving value against the base IRI which is established as specified in section 5.1 Establishing a Base URI of [RFC3986]. Only the basic algorithm in section 5.2 of [RFC3986] is used; neither Syntax-Based Normalization nor Scheme-Based Normalization are performed. Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of [RFC3987].

Here, the base IRI is that of the active context, which was either established using @base, or the document location, or an API option.

The only reasonable way to interpret the use of an HTML base would be the same way, which is what RFC 3986 (section 5.1, Establishing a Base URI) dictates:

  • <relative-reference>
  • 5.1.1 Base URI embedded in the content (@base in a context)
  • 5.1.2 Base URI of the encapsulating entity (<base> in this case)
  • 5.1.3 URI used to retrieve the entity
  • 5.1.4 Default Base URI (application-dependent)
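
The layering above can be sketched as a precedence chain. This is a hypothetical helper, not the JSON-LD API algorithm (the API specification has its own rules for which base applies to @context IRIs); it assumes the mapping given in this comment (@base as the embedded base, <base> as the encapsulating entity's base) and uses urllib.parse.urljoin, which follows RFC 3986 resolution in Python 3.5 and later. A relative inner layer is resolved against the layers that enclose it.

```python
from urllib.parse import urldefrag, urljoin


def establish_base(embedded=None, encapsulating=None, retrieval_uri=None, default=None):
    """Pick a base URI by the RFC 3986 section 5.1 layering (innermost present layer wins).

    embedded       -- base embedded in the content (per the comment: @base in a context)
    encapsulating  -- base of the encapsulating entity (per the comment: HTML <base href>)
    retrieval_uri  -- URI used to retrieve the document
    default        -- application-dependent default base
    """
    base = None
    # Walk from the outermost layer inward so a relative inner layer
    # is resolved against the layers that enclose it.
    for layer in (default, retrieval_uri, encapsulating, embedded):
        if layer:
            base = urljoin(base, layer) if base else layer
    if base is None:
        raise ValueError("no base URI available; relative references cannot be resolved")
    # RFC 3986 section 5.1: a base URI is used without its fragment.
    return urldefrag(base).url


if __name__ == "__main__":
    doc_url = "https://www.example.org/pages/demo.html"  # hypothetical retrieval URI
    print(establish_base(retrieval_uri=doc_url))
    # https://www.example.org/pages/demo.html
    base = establish_base(encapsulating="http://cdn.example.com/", retrieval_uri=doc_url)
    print(urljoin(base, "context-file.jsonld"))
    # http://cdn.example.com/context-file.jsonld
    print(establish_base(embedded="v2/", encapsulating="http://cdn.example.com/", retrieval_uri=doc_url))
    # http://cdn.example.com/v2/
```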

@BigBlueHat
Member Author

@gkellogg makes sense. I'd love to see us point that out in the syntax document in some way.

I think that, given both the <base> tag and the presence of the <script> content itself can be dynamic, we may also want to state more explicitly that the entire meaning of your JSON-LD content could change depending on the "state" of the DOM at the moment of processing (i.e., processing a document as fetched with curl is potentially drastically different from processing it after "DOM ready", or even later).

Some of that is likely best practice / primer stuff (/cc @ajs6f), but it'd be great to avoid subjectivity here if possible...
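
A toy illustration of that point (not the JSON-LD expansion algorithm): the same script block, paired with two different <base> states, loads two different contexts, so the term name ends up meaning two different things. The context URLs, their contents, and the expand_term helper are all hypothetical, and the "after DOM ready" state assumes some script has rewritten <base href>.

```python
from urllib.parse import urljoin

# Hypothetical context documents hosted at two different locations.
HYPOTHETICAL_CONTEXTS = {
    "http://cdn.example.com/context-file.jsonld": {"name": "https://schema.org/name"},
    "http://other.example.net/context-file.jsonld": {"name": "http://xmlns.com/foaf/0.1/name"},
}

JSONLD = {"@context": "context-file.jsonld", "@id": "demo-page", "name": "Demo"}


def expand_term(doc, term, base):
    """Toy lookup: resolve the @context IRI against base, then map a single term."""
    context_iri = urljoin(base, doc["@context"])
    context = HYPOTHETICAL_CONTEXTS[context_iri]  # stand-in for fetching the remote context
    return context_iri, context[term]


if __name__ == "__main__":
    # Base as served (what curl sees) vs. after a script rewrites <base href>.
    for label, base in [("as served", "http://cdn.example.com/"),
                        ("after DOM ready", "http://other.example.net/")]:
        print(label, expand_term(JSONLD, "name", base))
```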

@BigBlueHat
Member Author

Seems @ajs6f filed an issue related to this in the API repo: w3c/json-ld-api#53

We should discuss these in tandem (or succession at least).

@iherman
Member

iherman commented Feb 22, 2019

This issue was discussed in a meeting.

  • RESOLVED: Recommend using absolute URIs or @base within JSON-LD if relative URI resolution is important, and add warnings to the spec for ramifications of using a potentially dynamic DOM for resolution or discovery of JSON-LD blocks (staying in #134)
Transcript: 3.2. [syntax] Does HTML’s <base> effect @context IRI resolution?
Simon Steyskal: #134
Rob Sanderson: which we will talk about with TAG next week
Rob Sanderson: [explains example in issue]
Benjamin Young: the nuance here is related to the potential dynamic nature
… the URI spec already outlines that base would also be resolved
… as it’s HTML
Gregg Kellogg: I think we do need to find someone who’s more familiar with HTML
… esp. wrt. dynamic changes
… when I was going through this, I seem to recall that in 1.0 we discussed how to deal with a remote context which references another remote context: what should that context be relative to?
Ivan Herman: the interpretation of the json-ld content must be done on load
… before anything else is done
Pierre-Antoine Champin: I’m not sure that we can guarantee that nothing is done before on load
Rob Sanderson: wanted to highlight that this is likely to be a security issue
… should flag it as such
… e.g. if you could change a verifiable claim
Rob Sanderson: not sure how we could enforce the on load stuff
… or test it
Ivan Herman: I’m surprised that this wasn’t an issue anywhere else
Benjamin Young: https://w3c.github.io/json-ld-syntax/#example-120-using-the-document-base-url-to-establish-the-default-base-iri
Benjamin Young: https://html.spec.whatwg.org/multipage/infrastructure.html#dynamic-changes-to-base-urls
Benjamin Young: it actually was, see the links I posted
… embedding the json-ld might be done with JS
… search engine bots will wait until the page stops moving
… but if I curl the page, I’ll take whatever is in the document
… if both things are in play, I might not be able to tell what data is actually shared then
… pinning down when JSON-LD processing shall be done is the actual question here
Gregg Kellogg: many applications use HTML as a syntax rather than a processing model
… what if someone does depend on dynamic state changes
… where different timings make things undecidable
… if you look at pre-respec times for example
Benjamin Young: https://github.com/w3c/respec/wiki/doJsonLd
Dave Longley: we should probably assume that most pages add json-ld after it has loaded
… we should provide clear guidance
Rob Sanderson: does the signature take also all expanded information into account?
Dave Longley: yes, the signature requires expanding and converting to canonicalized RDF
… if you don’t do this you don’t pass
Ivan Herman: RDFa has a very similar problem, potentially
… it doesn’t say a word about when processing has to be done
… although it has the same issues
… we should provide appropriate warnings
Dave Longley: +1 to what ivan said
Ivan Herman: I don’t think we can do anything more than that
Benjamin Young: maybe not relying on the base at all?
Gregg Kellogg: no.. that would go against the RFC
Dave Longley: -1 to ignoring etc — i don’t think it will fly with anyone (+1 to Gregg)… could be wrong of course.
Ivan Herman: +1 to gregg
Tim Cole: I agree with gkellogg that we have to use base
Gregg Kellogg: +1 to timCole Images don’t reload when base changes.
Ivan Herman: +1 to timCole
Dave Longley: i.e. sounds like an idea is to “lock in base” on read
Benjamin Young: part of the reason they don’t change images is because they don’t expect things to change
Tim Cole: yes maybe we should adopt a similar approach
Dave Longley: I suspect we might want to see some use cases to understand expectations with Web Components and things of that nature to make the “right decision” here.
Rob Sanderson: what would we anticipate the big browsers would do if the json-ld changes
Ivan Herman: +1 to bigbluehat
Benjamin Young: we should be careful not to be prescriptive about when and how to run processing
… but as said, provide guidance/info on what would happen
Dave Longley: +1 … as long as people can understand what will happen when they do processing and can control when to do that processing (there is choice), i think we’re ok.
Rob Sanderson: +1
Benjamin Young: you have a lot of options
… some of it is best practices
… if you’re targeting curl then put it directly in the document
… (for example)
Pierre-Antoine Champin: +1 to ivan, @base keeps you safe IMO
Dave Longley: +1 … use absolute URIs or @base to get “stable” resolution
Rob Sanderson: [providing possible proposals]
Rob Sanderson: “if you don’t do this, these are the possible ramifications”
… we should explain the different scenarios and what would happen
Proposed resolution: Recommend using absolute URIs or @base within JSON-LD if relative URI resolution is important, and add warnings to the spec for ramifications of using a potentially dynamic DOM for resolution or discovery of JSON-LD blocks (Rob Sanderson)
Gregg Kellogg: +1
Ivan Herman: +1
Rob Sanderson: +1
Benjamin Young: +1
David Newbury: +1
Simon Steyskal: +1
Dave Longley: +1
Pierre-Antoine Champin: +1
Tim Cole: +1
David I. Lehn: +1
Rob Sanderson: this should be the resolution for issue 134
Resolution #3: Recommend using absolute URIs or @base within JSON-LD if relative URI resolution is important, and add warnings to the spec for ramifications of using a potentially dynamic DOM for resolution or discovery of JSON-LD blocks (staying in #134)
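
Following the resolution, one way a publisher could remove the dependency on the DOM is to make the @context reference absolute, and optionally set @base, before embedding the block. This is a minimal sketch assuming the intended base is known when the page is generated; pin_context and its add_base flag are hypothetical. Only string @context entries are rewritten; inline context objects are left untouched, and placing the inline @base ahead of the remote context reference is a choice made for this sketch.

```python
import json
from urllib.parse import urljoin


def pin_context(doc, intended_base, add_base=True):
    """Return a copy of a JSON-LD object whose meaning no longer depends on the page's base.

    String @context entries (remote context references) are made absolute against
    intended_base. With add_base=True, an inline {"@base": intended_base} context is
    added so relative @id values such as "demo-page" also resolve the same way
    wherever the block is embedded.
    """
    pinned = dict(doc)  # shallow copy; the input object is not modified
    ctx = doc.get("@context")
    entries = ctx if isinstance(ctx, list) else ([] if ctx is None else [ctx])
    # Build a new list: absolutize strings, keep inline context objects as they are.
    entries = [urljoin(intended_base, c) if isinstance(c, str) else c for c in entries]
    if add_base:
        # Placed ahead of the remote context reference(s); ordering chosen for this sketch.
        entries.insert(0, {"@base": intended_base})
    if entries:
        pinned["@context"] = entries[0] if len(entries) == 1 else entries
    return pinned


if __name__ == "__main__":
    doc = {"@context": "context-file.jsonld", "@id": "demo-page", "@type": "WebPage", "name": "Demo"}
    print(json.dumps(pin_context(doc, "http://cdn.example.com/"), indent=2))
```

With add_base=False, the result keeps a single string @context, http://cdn.example.com/context-file.jsonld, which covers the "absolute URIs" half of the resolution on its own.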

@iherman closed this as completed Feb 22, 2019
@azaroth42
Contributor

Not sure this should be closed. I think there's still editorial work to do to add the warnings? If that has been done, then feel free to re-close :)

@BigBlueHat
Member Author

gkellogg added a commit to w3c/json-ld-api that referenced this issue Aug 21, 2019
This was resolved in w3c/json-ld-syntax#134 (comment) which is reflected in the syntax document.
gkellogg added a commit to w3c/json-ld-api that referenced this issue Aug 21, 2019
Remove atrisk div on the use of HTML base. This was resolved in w3c/json-ld-syntax#134 (comment) which is reflected in the syntax document.
gkellogg added a commit to w3c/json-ld-api that referenced this issue Aug 26, 2019
Remove atrisk div on the use of HTML base. This was resolved in w3c/json-ld-syntax#134 (comment) which is reflected in the syntax document.
@pchampin pchampin moved this to Errata in JSON-LD Management May 15, 2024
Projects
Archived in project

5 participants
@BigBlueHat @gkellogg @iherman @azaroth42 and others