Skip to content

Encapsulate HTML processing #135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 26, 2019
Merged

Encapsulate HTML processing #135

merged 6 commits into from
Aug 26, 2019

Conversation

gkellogg
Copy link
Member

@gkellogg gkellogg commented Aug 19, 2019

Move HTML bits into a separate section, with callouts into the documentLoader algorithm to contain HTML-related processing. Also, removes processor levels describing this as a processor supporting "HTML script extraction".

For w3c/json-ld-syntax#213.


Preview | Diff

@iherman
Copy link
Member

iherman commented Aug 20, 2019

In 9.5.1, after item (2), there is note referring to "Feature at risk".

That term has a very well defined meaning in the the W3C process: a feature that the WG considers as a "should have" in the final spec, but not sure whether that specific issue will pass the CR phase. By explicitly marking a feature to be at risk, the feature can be removed from the final spec if there aren't enough implementation, without the requirement to re-publish a new CR and delaying the Proposed Rec phase.

I do not think that is what we mean here. If it is indeed something else, we should find another term.

@BigBlueHat
Copy link
Member

a feature that the WG considers as a "should have" in the final spec, but not sure whether that specific issue will pass the CR phase. By explicitly marking a feature to be at risk, the feature can be removed from the final spec if there aren't enough implementation, without the requirement to re-publish a new CR and delaying the Proposed Rec phase.

I think we need to keep this marked as "at risk" for exactly these reasons. The syntax spec suggests avoiding using <base> due to it's dynamic nature. Current extractors (SEO bots and Structured Data Testing Tool) don't currently use <base> in relationship to data blocks and the SEO community generally recommend against relative URLs in general.

Given that this is, then, a new feature addition (beyond how JSON-LD in HTML currently works), and that supporting it may introduce additional requirements (DOM and/or JS processing) on JSON-LD API implementers, leaving it marked as "at risk" seems prudent until we know from implementers how they plan to handle this proposed feature.

@BigBlueHat
Copy link
Member

BigBlueHat commented Aug 20, 2019

@gkellogg I'd love to see a "data block" mention added with reference to the HTML spec which defines data blocks.

@iherman
Copy link
Member

iherman commented Aug 20, 2019

@BigBlueHat maybe. But the note in the document says:

(FEATURE AT RISK) ISSUE
The use of the Document Base URL from [HTML] for setting the base IRI of the enclosed JSON-LD is an experimental feature, which may be changed in a future version of this specification.

The issue is not whether it will be changed in a future version or not; the issue is whether it will be kept in the current version or not. I.e., the note's text is suggesting something different. We have to decide what we want to say here...

@BigBlueHat
Copy link
Member

@iherman maybe I'm reading that note differently...or don't understand the distinction between "current" and "future" versions.

If it's "at risk" when we go to CR and still lacks implementation, then we can take it out easily (without having to go through more process). If we remove the "at risk" label now, then it's a heavier lift (i.e. triggers more process) if we find it's unimplemented or causes other issues/concerns.

I'd prefer we keep the "at risk" status until we have more implementation evidence.

@iherman
Copy link
Member

iherman commented Aug 20, 2019

If we want to do what you want to do:-), then the text should say something like

(FEATURE AT RISK) ISSUE
The use of the Document Base URL from [HTML] for setting the base IRI of the enclosed JSON-LD is an experimental feature. If there are not enough implementation experience during the Candidate Recommendation phase, this feature may be removed from the final specification.

Of course, future version may do what they want, but the user should not count on this feature in this version of the specification.

The current text may be understood as saying: "There will be a standard feature in the present recommendation, but that feature may change in future revisions". This is different...

@BigBlueHat
Copy link
Member

@iherman that makes sense. Guess @gkellogg should share thoughts since he wrote the original text. 😃

@gkellogg
Copy link
Member Author

We’ll just leave it as an issue an remove the “atrisk” part. That said, we should figure out how to close out such open issues in the specs.

Copy link
Contributor

@pchampin pchampin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appart from a typo and a few remarks, good for me.

@gkellogg
Copy link
Member Author

gkellogg commented Aug 21, 2019

Regarding the (off-topic) atrisk section, in #137, I simply removed it. It is discussed on the syntax doc.

@gkellogg
Copy link
Member Author

@gkellogg I'd love to see a "data block" mention added with reference to the HTML spec which defines data blocks.

Would this go better in w3c/json-ld-syntax#214? Is it more an API or a Syntax concern? Maybe suggest some wording.

@iherman
Copy link
Member

iherman commented Aug 23, 2019

This issue was discussed in a meeting.

  • RESOLVED: gkellogg to merge #135 and #214 after reviewers have approved and close the relevant issues
View the transcript ncapsulate HTML processing
Rob Sanderson: See Syntax #214
Rob Sanderson: See API #135
Rob Sanderson: Discussion from last week has resulted in some PRs.
Ivan Herman: Gregg not here this week.
Pierre-Antoine Champin: dlongley: PRs are moving in a direction I would agree with
Rob Sanderson: I would agree as well, pushing things into the document loader as discussed last week.
… I guess the issue to discuss is – is there anyone who is not comfortable yet otherwise we should accept those PRs.
Pierre-Antoine Champin: scribeassist: pchampin
Rob Sanderson: Any objections to the approach?
Ivan Herman: I read through the documents and we have done the work.
Pierre-Antoine Champin: dlongley: I would like to wait for other reviews before minerging (including mine)
Proposed resolution: gkellogg to merge #135 and #214 after reviewers have approved (Rob Sanderson)
Proposed resolution: gkellogg to merge #135 and #214 after reviewers have approved and close the relevant issues (Rob Sanderson)
Dave Longley: +1
Ruben Taelman: +1
Rob Sanderson: +1
Benjamin Young: +1
Pierre-Antoine Champin: +1
Ivan Herman: +1
David I. Lehn: +1
Resolution #2: gkellogg to merge #135 and #214 after reviewers have approved and close the relevant issues

gkellogg and others added 6 commits August 26, 2019 11:12
@gkellogg gkellogg merged commit e861362 into master Aug 26, 2019
@gkellogg gkellogg deleted the partition-html branch August 26, 2019 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants