Problematic/confusing algorithm text re: the handling of URL @context values #265

Closed
kasei opened this issue Dec 20, 2019 · 19 comments
@kasei
Contributor

kasei commented Dec 20, 2019

Context Processing step 5.2.1 says:

5.2.1) Initialize context to the result of resolving context against the base IRI of the document containing the local context which is established as specified in section 5.1 Establishing a Base URI of [RFC3986]. Only the basic algorithm in section 5.2 of [RFC3986] is used; neither Syntax-Based Normalization nor Scheme-Based Normalization are performed. Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of [RFC3987].

This seems rather verbose, as "the document containing the local context" will already have a base IRI. However, there doesn't seem to be any mention of maintaining provenance of base IRIs of local contexts in this algorithm. I believe this is something that could be passed as an optional input variable (defaulting to the base IRI of the active context), but it would require cooperation at the call-sites of the context processing algorithm in the expansion algorithm and create term definition.

Steps 5.2.3 through 5.2.5 say:

5.2.3) If context was previously dereferenced, then the processor MUST NOT do a further dereference, and context is set to the previously established internal representation.
5.2.4) Otherwise, dereference context using the LoadDocumentCallback, passing context for url, and http://www.w3.org/ns/json-ld#context for profile and for requestProfile.
5.2.5) If context cannot be dereferenced, or cannot be transformed into the internal representation, a loading remote context failed error has been detected and processing is aborted. If the dereferenced document has no top-level map with an @context entry, an invalid remote context has been detected and processing is aborted; otherwise, set context to the value of that entry.

These steps are confusing for a few reasons:

  • 5.2.3 talks about a stored version of the "established internal representation" of the dereferenced context IRI, but this seems to be hidden state rather than being passed as an input variable
  • There doesn't seem to be a reason to store the entire representation of dereferenced context documents instead of just the @context entry
  • The error checking portion of 5.2.5 (which is the bulk of the text) seems to only be applicable if the "otherwise" branch of 5.2.4 was taken. Perhaps it should be 5.2.4.1, leaving the setting of context in 5.2.4 (with or without extracting the @context entry, depending on the previous bullet point)
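The caching behavior these steps describe can be sketched like this (a minimal illustration only, not spec text; `load_document` stands in for the LoadDocumentCallback, and the exception names are my own):

```python
class LoadingRemoteContextFailed(Exception):
    """The context could not be dereferenced or parsed (step 5.2.5)."""

class InvalidRemoteContext(Exception):
    """The dereferenced document has no top-level @context entry (step 5.2.5)."""

def fetch_context(url, cache, load_document):
    """Dereference a remote context at most once, per steps 5.2.3-5.2.5.

    `cache` maps URL -> previously established internal representation;
    `load_document` is a stand-in for the LoadDocumentCallback.
    """
    # 5.2.3: a previously dereferenced context MUST NOT be fetched again.
    if url in cache:
        return cache[url]
    # 5.2.4: otherwise dereference, requesting the json-ld#context profile.
    try:
        document = load_document(url, profile="http://www.w3.org/ns/json-ld#context")
    except Exception as exc:
        # 5.2.5: dereference failure aborts processing.
        raise LoadingRemoteContextFailed(url) from exc
    # 5.2.5: the document must be a map with a top-level @context entry.
    if not isinstance(document, dict) or "@context" not in document:
        raise InvalidRemoteContext(url)
    # Retain only the @context entry, per the bullet above, rather than
    # the entire dereferenced document.
    cache[url] = document["@context"]
    return cache[url]
```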
@gkellogg
Member

Context Processing step 5.2.1 says:

5.2.1) Initialize context to the result of resolving context against the base IRI of the document containing the local context which is established as specified in section 5.1 Establishing a Base URI of [RFC3986]. Only the basic algorithm in section 5.2 of [RFC3986] is used; neither Syntax-Based Normalization nor Scheme-Based Normalization are performed. Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of [RFC3987].

This seems rather verbose, as "the document containing the local context" will already have a base IRI. ...

The wording, while verbose, is consistent with how we have described IRI resolution since 1.0. What's changed is to make it clear that it is the base IRI of the document, rather than any @base, that is used to do the resolution.

... However, there doesn't seem to be any mention of maintaining provenance of base IRIs of local contexts in this algorithm. I believe this is something that could be passed as an optional input variable (defaulting to the base IRI of the active context), but it would require cooperation at the call-sites of the context processing algorithm in the expansion algorithm and create term definition.

I don't think that it's necessary to burden the algorithm with more options for describing the location of the document being processed, as it's available given that it was dereferenced. Implementations may, of course, use any internal mechanisms they need to remember this.

Steps 5.2.3 through 5.2.5 say:

5.2.3) If context was previously dereferenced, then the processor MUST NOT do a further dereference, and context is set to the previously established internal representation.
5.2.4) Otherwise, dereference context using the LoadDocumentCallback, passing context for url, and http://www.w3.org/ns/json-ld#context for profile and for requestProfile.
5.2.5) If context cannot be dereferenced, or cannot be transformed into the internal representation, a loading remote context failed error has been detected and processing is aborted. If the dereferenced document has no top-level map with an @context entry, an invalid remote context has been detected and processing is aborted; otherwise, set context to the value of that entry.

These steps are confusing for a few reasons:

  • 5.2.3 talks about a stored version of the "established internal representation" of the dereferenced context IRI, but this seems to be hidden state rather than being passed as an input variable

It is implicit that implementations maintain such a memory, without being overly explicit on how they do it.

  • There doesn't seem to be a reason to store the entire representation of dereferenced context documents instead of just the @context entry

Sounds like a good optimization would be to only retain the @context part, do we need to be explicit about this?

  • The error checking portion of 5.2.5 (which is the bulk of the text) seems to only be applicable if the "otherwise" branch of 5.2.4 was taken. Perhaps it should be 5.2.4.1, leaving the setting of context in 5.2.4 (with or without extracting the @context entry, depending on the previous bullet point)

It could have been 5.2.4.1, but it doesn't change the control flow.

IMO, the text is clear as written and implementable without being overly prescriptive.

@kasei
Contributor Author

kasei commented Dec 20, 2019

I don't think that it's necessary to burden the algorithm with more options for describing the location of the document being processed, as it's available given that it was dereferenced. Implementations may, of course, use any internal mechanisms they need to remember this.

But the URL in question isn't the one that was just dereferenced, it's the one that may have been dereferenced to pass the local contexts into the context processing algorithm. That may end up being a dereferenced URL, but not in all cases I think, and may pass through several layers of indirection between the dereference and its eventual use in context processing.

Or have I misunderstood how it's being used? I thought I had it sorted as my current understanding led me to an implementation that does pass the test...

@kasei
Contributor Author

kasei commented Dec 20, 2019

Sounds like a good optimization would be to only retain the @context part, do we need to be explicit about this?

It's not always clear in the text where data is going to be used (either elsewhere in the same algorithm or in other algorithms). This is especially true for implicit state data like this (and somewhat related to the discussion in #260, where at least you can visualize repeated uses of explicit variables in the EDs). Without explicit text, I'm not sure that, as an implementor, I'd be confident based on just the text in 5.2 that I could store only the @context value and throw away the rest of the document.

@kasei
Contributor Author

kasei commented Dec 20, 2019

It could have been 5.2.4.1, but it doesn't change the control flow.

My issue with 5.2.5 is that what seems like the primary action of the step (setting context) is hidden behind a large amount of unrelated error-handling text that is only relevant if the control flow happened to pass through 5.2.4. I think it would be clearer to have context be set in its own stage, and keep error handling related to 5.2.4 code only within 5.2.4.

I can live with this as it is if there's resistance to changing it, but I reported an issue because the text as written tripped me up and only walking through tests and stepping through the algorithms with @azaroth42 got me to a point of understanding (a benefit independent implementors who come along after REC probably won't have!).

@dlongley
Contributor

@kasei,

Could you perhaps recommend some specific text that could go into an informative note such that it wouldn't cause a normative change to the spec but would offer the clarifications you're seeking? For example:

"Note: In order to prevent being overly prescriptive, this algorithm makes no statement about how internal state is kept for dereferenced context documents and broadly applies error handling checks. Provided that the output of the algorithm is unchanged, implementations may employ optimizations such as retaining only the @context part of the dereferenced document or restructuring steps to avoid error handling code that is superfluous under certain conditions."

@kasei
Contributor Author

kasei commented Dec 24, 2019

@dlongley

I think your "retaining only the @context part " language might be reasonable.

I do not think the "broadly [applying] error handling checks" language is appropriate. The thing that needs to be checked in 5.2.5 does not even happen in some invocations of the algorithm. The previous invocation which passed through 5.2.4 carried out the error checking, but the current invocation which passes through 5.2.3 does not dereference or transform anything into an internal representation. I really do think this error checking language should be in 5.2.4, but given how recent updates to my other issues have been handled, I don't see why this would be considered a normative change.

Regarding the IRI resolution using an out-of-band base IRI, however, I do think this needs more bookkeeping steps (as new input variables to Context Processing and Create Term Definition, and in the related invocations in Context Processing steps 5.2.6 and 5.13, and in Create Term Definition 23.3). I would have thought that these new variables/steps would represent a normative change, but since it seems the consensus is that changes that reinforce the expected output of the test suite are not normative, perhaps this is merely editorial.

@gkellogg
Member

@kasei I added a note, based on @dlongley's suggested text, and a couple of others to provide some non-prescriptive guidance to implementers. It's true that this could have been handled by adding parameters to the algorithm, but that would have had a broad impact, and is not the only way that it can be implemented, given that it's contained in the RemoteDocument which is being processed as a context.

Please let us know if this works, or suggest changes that would make it more acceptable.

@kasei
Contributor Author

kasei commented Dec 30, 2019

The note about retaining @context, and the move of error handling in 5.2.4.1 look good. Thank you.

It's true that this could have been handled by adding parameters to the algorithm, but that would have had a broad impact, and is not the only way that it can be implemented, given that it's contained in the RemoteDocument which is being processed as a context.

I'm still not convinced about this provenance tracking issue. Agreed that it's "not the only way that it can be implemented," but surely that's true of the entire document? My concern is that (to the best of my understanding) the provenance tracking here is required for code paths that handle both dereferenced content and non-dereferenced content. In some cases it's important to keep track of the provenance, and in others it's not. I don't think that's a good situation to be in while assuming that it's clear how such tracking can/should be done.

My belief is that these changes (some of these were mentioned above) are sufficient to perform the tracking:

  • addition of a key to track a base IRI in an expanded term definition
  • new input variable to Context Processing
  • pass the tracked base IRI to the recursive call in Context Processing 5.2.5
  • pass the tracked base IRI to Create Term Definition in Context Processing 5.13
  • new input variable to Create Term Definition
  • pass the tracked base IRI in the call to Context Processing in Create Term Definition 23.3
  • set the tracked base IRI during term definition initialization in Create Term Definition 10
  • pass the tracked base IRI found in the term definition (if present) to Context Processing in Expansion 8

While these changes are certainly not trivial, I think they are relatively contained, and (assuming they are correct) provide explicit information on where provenance tracking must occur and where the resulting data is used.
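A rough sketch of how that tracked base IRI might thread through the algorithms (illustrative only, not spec text; the function shape, the `loader` callback, and the way the base IRI is recorded on a term definition are all my own assumptions):

```python
from urllib.parse import urljoin

def process_context(active_context, local_context, base_iri, loader):
    """Sketch of the bookkeeping listed above: `base_iri` is the proposed
    new input variable (the location of the document that supplied
    `local_context`), and each term definition records the base IRI in
    effect when it was created, for later use during Expansion."""
    result = dict(active_context)
    contexts = local_context if isinstance(local_context, list) else [local_context]
    for context in contexts:
        if isinstance(context, str):
            # 5.2.1: resolve against the base of the *containing* document.
            context_url = urljoin(base_iri, context)
            remote = loader(context_url)  # returns the remote @context value
            # Recurse, passing the remote context's own URL as the new base,
            # so that contexts it references resolve against its location.
            result = process_context(result, remote, context_url, loader)
        elif isinstance(context, dict):
            for term, definition in context.items():
                if not term.startswith("@"):
                    # Create Term Definition would record the tracked base IRI
                    # on the expanded term definition (first bullet above).
                    result[term] = {"mapping": definition, "base_iri": base_iri}
    return result
```

For example, a context reference "ctx1" in a document at http://ex.org/dir/doc.jsonld resolves to http://ex.org/dir/ctx1, and any relative reference inside that remote context then resolves against http://ex.org/dir/ctx1, not against the original document.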

@gkellogg
Member

We'll have to take it up on the call; I don't believe such changes are necessary.

@iherman
Member

iherman commented Jan 10, 2020

This issue was discussed in a meeting.

  • RESOLVED: Reject #265 as overreaching at the current stage of the process; changing algorithm signatures has too far-reaching consequences for a clarity change
View the transcript: Handling @context IRI wording
Rob Sanderson: link: #265
Rob Sanderson: the reason we’ve called this one out in particular
… is that the two Greg(g)s have different views on what the exact wording should be
Gregg Kellogg: yes. we went back and forth a bit
… and what Greg was pushing for seemed like a very large rewrite
… and dlongley chimed in a bit also
… and that size change didn’t seem warranted
… the core question is how to resolve a relative context URL
… that text is certainly more vague
… the base of the document is used to resolve that
… but that may not be explicitly passed in
… so dlongley suggested we add a note that we were avoiding being overly prescriptive
… and a few more notes to clarify the intent and use throughout that section
… and without changing the API signature
… given where we are in the publishing process, I don’t think a heavier change is warranted
Rob Sanderson: it wouldn’t be a normative change would it be?
Gregg Kellogg: it depends on what you believe the boundary of the change is
… if we’re changing API signatures and data that’s being passed around
… then that would be a normative change
Pierre-Antoine Champin: and that we might introduce new bugs…
Gregg Kellogg: there are ways implementations can work around this without our changing the API
Rob Sanderson: any questions?
… k. I would suggest that we reject this issue on the grounds that we believe this would be a breaking change
… while it might be clearer to do it the way Greg suggests, the current way works without needing to change API signatures
Gregg Kellogg: no, it doesn’t change the API signature, but the algorithm signature
… but nonetheless it would have far-reaching API ramifications
Rob Sanderson: pchampin what are your thoughts on this one?
Pierre-Antoine Champin: I’m with Greg on this one
… with gkellogg on this one
… the potential for new bugs is quite high
Proposed resolution: Reject #265 as overreaching at the current stage of the process; changing algorithm signatures has too far-reaching consequences for a clarity change (Rob Sanderson)
Ivan Herman: +1
Gregg Kellogg: +1
Rob Sanderson: +1
Pierre-Antoine Champin: +1
Tim Cole: +1
Benjamin Young: +1
Jeff Mixter: +1
Resolution #2: Reject #265 as overreaching at the current stage of the process; changing algorithm signatures has too far-reaching consequences for a clarity change
Gregg Kellogg: I did mark the issue as rejected
… but I didn’t know if there was more to do to close this up
Rob Sanderson: we should call it out as an issue that was not accepted
… and point to this resolution as the rationale for the rejection
… we’ve discussed it, considered it, and rejected the full extent of the change
… but we plan to add notes to clarify things to avoid future confusion
Ivan Herman: I checked with Ralph about what triggers a full CR or not
… and it’s largely under the control of the WG
… there’s no algorithm for when it should go back to the Director
… this one does seem to be on the borderline
… and it would restructure the algorithm heavily
… and all the other implementers would have to cross check things against the new changes
… if it’s a bug, then it would be justifiable
Rob Sanderson: it’s not wrong to implement it the way gkellogg has described it
Ivan Herman: no. of course not
… no one said they have to follow these algorithm lines specifically
… they just have to pass the tests
Gregg Kellogg: yes. we say that explicitly actually
Ivan Herman: and if they pass it without following all the algorithm steps, that’s fine
Rob Sanderson: any more on this one?

@kasei
Contributor Author

kasei commented Jan 10, 2020

@gkellogg @iherman I have some questions regarding how this was handled. From today's minutes:

@gkellogg : there are ways implementations can work around this without our changing the API

Can you please explain how other ways of implementing this could happen without the approach I've described? To keep track of the provenance of downloaded contexts, mustn't the current algorithm implicitly be passing around this information as I've explicitly described?

As this has been rejected by the WG, can you at least confirm for me that the approach I've described will handle the tracking of context provenance as intended? Given that this happens implicitly in recursive code paths that only sometimes require this tracking, I am not entirely confident that what I've described will cover the intention of the spec. If the spec text can't be changed, I'd be much more comfortable if somebody from the WG could at least confirm that I've understood the intention correctly.

@azaroth42
Contributor

For the second part, we did discuss that your approach is not only correct but even a better way to do it ... but that the extent of the change was too high, given that the algorithm is not wrong, just not as clear or transparent.

Explaining the current text I leave to @gkellogg :)

@gkellogg
Member

@kasei It's entirely reasonable for you to use the approach you've outlined. It's not necessary for implementations to precisely follow the algorithms, and this is a case where developer initiative is necessary to meet the intention of the algorithm. There may be other state or options that an implementation may find useful for implementing the algorithms which aren't strictly spelled out.

My own implementation keeps a context_base accessor as part of a Context object, which is set to the location the context was loaded from and is used to resolve subsequent loads. The original call to create a Context uses a base option, set to the location of the document which is loading the context.

The algorithm text makes it clear that relative IRIs for contexts are loaded relative to the location of the document containing the reference, so these mechanisms help to maintain that. Another implementation may do it differently.
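Roughly, the pattern looks like this (a sketch with made-up names, not the real code; the `loader` callback and method shapes are assumptions):

```python
from urllib.parse import urljoin

class Context:
    """Illustrative sketch of keeping a context_base on the context object."""

    def __init__(self, base=None):
        # `base` option: location of the document that is loading the context.
        self.base = base
        self.context_base = None  # set once a remote context is loaded
        self.definitions = {}

    def resolution_base(self):
        # Relative context URLs resolve against the location the current
        # context was loaded from, falling back to the document's base.
        return self.context_base or self.base

    def load_remote(self, reference, loader):
        url = urljoin(self.resolution_base(), reference)
        child = Context(base=self.base)
        child.context_base = url  # remember where this context came from
        child.definitions = loader(url)  # hypothetical document loader
        return child
```

With this in place, a context loaded from http://ex.org/dir/ctx1 that itself references "sub/ctx2" resolves that reference against its own location, as the algorithm text requires.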

It was felt that the changes to the algorithm signature, and all the places it is invoked could have a destabilizing effect, and weren't strictly necessary for implementations to be able to interpret the spec and pass tests.

@azaroth42
Contributor

Re-opening this issue on the basis that we have now committed to re-entering CR, and thus such a change should be reconsidered in this new context.

@azaroth42 azaroth42 reopened this Jan 17, 2020
@iherman
Member

iherman commented Jan 24, 2020

This issue was discussed in a meeting.

  • RESOLVED: Add text to api to clarify the need for tracking of base IRIs during context processing
View the transcript: Confusing context URL handling
link: #265
Rob Sanderson: We concluded that the change would be good, but that it was too big.
… Now that we dropped out of CR, we should discuss whether the change is useful.
… Should we do that now?
… gkellogg, will this be an editorial change that doesn’t need a resolution?
Gregg Kellogg: It will change the algorithm signature.
… We need a way to keep track of the resolved context URL to use when resolving future context URLs.
… It impacts the places that call it to pass a value around.
Ivan Herman: If you change the parameters, how many implementations will have to change?
Gregg Kellogg: We don’t change tests. Implementations do not have to follow the algorithm exactly.
Ivan Herman: Does anyone besides kasei follow the spec exactly?
… Do we know if we are affecting anyone else?
Gregg Kellogg: This was behaviour that was expected in 1.0.
Ivan Herman: If you do this change, do the implementations of Ruben and Dave have to change?
Ruben Taelman: I don’t follow the algorithm.
Dave Longley: Agree with Gregg, I don’t think we need to change anything in JS
Proposed resolution: Add text to api to clarify the need for tracking of base IRIs during context processing (Rob Sanderson)
Ivan Herman: +1
Gregg Kellogg: +1
Rob Sanderson: +1
Ruben Taelman: +1
Dave Longley: +1
Benjamin Young: +1
Harold Solbrig: +1
Adam Soroka: +1
Resolution #5: Add text to api to clarify the need for tracking of base IRIs during context processing
Pierre-Antoine Champin: +1

@gkellogg
Member

@kasei I mostly followed the steps you indicated, but there were many ripple effects. Please give it a look over and see if it seems right.

@gkellogg
Member

@kasei still waiting to see if this solves the issue for you.

@kasei
Contributor Author

kasei commented Feb 10, 2020

@gkellogg Apologies. #356 seems to handle this differently in some places than the way I implemented it, so it's taking me some time to sort out the effects. I'll try to get you an answer ASAP.

@kasei
Contributor Author

kasei commented Feb 14, 2020

@gkellogg I am also going to create new issues representing problems in #356. Beyond these, the PR has broken some tests for me and I'm not yet clear if they represent actual problems; I'll keep you updated as I try to get back to passing the tests.
