Add '@annotation' container type #196

Closed
niklasl opened this issue Nov 12, 2012 · 17 comments

@niklasl
Member

niklasl commented Nov 12, 2012

Given the various needs outlined in issues #84, #133, #159 and #195, it seems there may be a general need for noisy JSON to work as JSON-LD. While not ideal, it may be required for zero-edits.

This is a proposal to add a keyword, tentatively called @annotation. It is only to be used in a context definition, and signals to the processor to skip a part of the JSON but continue recursive processing.

For example, it could be used to provide any kind of application-specific index-objects, like this:

{
  "@context": {
    "author": {"@id": "http://schema.org/author", "@container": "@annotation"}
  },
  "@id": "http://example.org/article",
  "author": {
    "regular": {"@id": "http://example.org/person/1"},
    "guest": {"@id": "http://example.org/guest/cd24f329aa"}
  }
}

The publisher has here decided that authors are to be accessed by some property (here some kind of role), which is not to be exposed as information (interpretable as RDF). To do this, the above shape has an injected artificial object between the author property and the authors, which is to be ignored. Thus, semantically, the above means exactly the same as:

{
  "@context": {
    "author": {"@id": "http://schema.org/author", "@container": "@annotation"}
  },
  "@id": "http://example.org/article",
  "author": [
    {"@id": "http://example.org/person/1"},
    {"@id": "http://example.org/guest/cd24f329aa"}
  ]
}
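
A minimal Python sketch of this equivalence, assuming the set of annotation-container terms has already been read from the context. The function name `drop_annotation_keys` is hypothetical and not part of any JSON-LD API.

```python
def drop_annotation_keys(node, annotation_terms):
    """Return a copy of `node` in which every annotation map is replaced by a list of its values."""
    result = {}
    for key, value in node.items():
        if key in annotation_terms and isinstance(value, dict):
            # The map keys ("regular", "guest") are ignored; only the values are kept.
            result[key] = list(value.values())
        else:
            result[key] = value
    return result


article = {
    "@id": "http://example.org/article",
    "author": {
        "regular": {"@id": "http://example.org/person/1"},
        "guest": {"@id": "http://example.org/guest/cd24f329aa"},
    },
}

# "author" is the only term declared with "@container": "@annotation" in the example context.
flattened = drop_annotation_keys(article, {"author"})
```

This handles a single level only and does no IRI expansion; it is meant to show which part of the JSON is skipped, not how a processor would implement it.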

The role information itself could of course be included in the information about the author, or in associated account data. In fact, this mechanism enables publishers to experiment with many kinds of special container algorithms (such as the previously suggested @id maps or generalized property maps for language, timestamps, etc.), which may in the future be part of JSON-LD. (At that time the context, and only the context, could be updated with such newly supported container options.) And since objects-as-maps is a fairly common occurrence in JSON in the wild, this annotation mechanism may ease adoption in general.

The @annotation keyword could also be used as the @id for a term (i.e. again only in the context). If used so, the term itself would be ignored (and thus any linkage), but its object value could be processed, just as if it had been a top level object within a @graph array. This may be a separate proposal.

A downside of this proposal is that the shape of compact JSON-LD becomes harder to immediately understand, since "faux" keys may pop up in unexpected places. The upside is that any context using the annotation keyword can immediately be categorized as being for application-specific, idiosyncratic JSON. This lets consumers know that such JSON cannot be automatically created using compaction by itself, but has been composed by some other process. (This is somewhat akin to RDFa in that the JSON syntax can be treated as a carrier with parts picked out as semantically relevant. Also compare this to GRDDL, replacing XML with JSON and XSLT with just the JSON-LD context.)

If these annotations must survive expansion, an intermediate object with only an @annotation key would reasonably have to be put into the expanded form (similar in shape to @list objects). There should be a flag to control whether annotations are preserved (default being false).

@msporny
Member

msporny commented Nov 12, 2012

This proposal is related to 'Decide on language handling for JSON-LD': http://drupal.org/node/1838700

@msporny
Member

msporny commented Nov 12, 2012

I really like this proposal. It's exactly the sort of compromise that JSON-LD should be making: there are some things (such as application-specific data structure optimization) that you don't want surfacing in your RDF. It's simple, directly addresses the Drupal use case, and allows applications to use their own application-specific annotation ("language-like maps", @id maps, etc.) without surfacing the annotation in the RDF. I spoke with @linclark about it and she thinks that it would work for Drupal's use case. Any objections from @gkellogg, @dlongley, @lanthaler, @cygri or @tidoust?

@dlongley
Member

+1 to the proposal. We should also make it possible to specify (in the @context) where the deep data is added -- to cover the microdata use case.

@tidoust

tidoust commented Nov 12, 2012

I like the idea as well.

I'm not sure I get @dlongley's comment about depth, so my comment may be a duplicate of his. Would the proposal cover cases where someone comes up with a "container" that is more than one level deep (similar to a multi-column index in a database)?

{
  "@context": {
    "author": {"@id": "http://schema.org/author", "@container": "@annotation"}
  },
  "@id": "http://example.org/article",
  "author": {
    "chapter1": {
      "regular": { "@id": "http://example.org/person/1" },
      "guest": { "@id": "http://example.org/person/2" }
    },
    "chapter2": {
      "regular": { "@id": "http://example.org/person/1" },
      "guest": { "@id": "http://example.org/person/3" }
    }
  }
}

If it does, how? If not, could that be a problem?

A side note, though a related one: while rewriting the grammar, I was somewhat surprised to realize that it is actually pretty strict. I was expecting something more à la GRDDL, as Niklas puts it, i.e. the possibility of having properties meant for internal use only that would be lost during processing, combined with properties properly flagged as Linked Data that would be preserved. I suppose this has been discussed in the past. Any pointers to relevant discussions or arguments?

@gkellogg
Member

I like everything in the proposal but the last paragraph. Maintaining the annotation information through an @annotation property, I think, makes the information worse, not better. It has profound implications for other algorithms such as compact, flatten and frame, not to mention to/fromRDF.

I would rather see the annotations be removed from expansion, yielding a form similar to @niklasl's second example above. This also works best when trying to consume other JSON, such as microdata-JSON, Twitter and GitHub.

Would the proposal cover cases where someone comes up with a "container" that is more than one level deep (similar to a multi-column index in a database)?

Yes, I think that can work too. Basically, when encountering a property in the expansion algorithm, after the last sentence of step 2.2.1 and before step 2.2.2, add the following:

If _property_ has @container @annotation, expand this _value_ recursively using this algorithm, passing copies of the *active context* and *active property*.

Of course, this needs to consider both the case where the top-level (LHS) property is consumed and the properties in the RHS are kept, and the case where the LHS property is preserved and the values are promoted up to that property.
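
One way to read the recursive step, for the case where the LHS property is preserved, is sketched below in Python. The helper names, and the rule that an object containing any `@`-keyword counts as a real value rather than another index level, are assumptions made for this illustration; the thread does not specify how a processor would tell the two apart.

```python
def is_node_like(value):
    # Assumption: an object carrying a JSON-LD keyword (a key starting with "@")
    # is a real value, not another level of the annotation map.
    return isinstance(value, dict) and any(k.startswith("@") for k in value)


def collect_values(value):
    """Recursively unwrap annotation maps of any depth into a flat list of values."""
    if isinstance(value, list):
        out = []
        for item in value:
            out.extend(collect_values(item))
        return out
    if isinstance(value, dict) and not is_node_like(value):
        out = []
        for inner in value.values():  # the keys at this level are dropped
            out.extend(collect_values(inner))
        return out
    return [value]


# The two-level example from the question above.
authors = {
    "chapter1": {
        "regular": {"@id": "http://example.org/person/1"},
        "guest": {"@id": "http://example.org/person/2"},
    },
    "chapter2": {
        "regular": {"@id": "http://example.org/person/1"},
        "guest": {"@id": "http://example.org/person/3"},
    },
}

author_ids = [v["@id"] for v in collect_values(authors)]
```

Note that this heuristic would misread a node that has only plain terms and no keywords as one more index level, and that duplicates (person/1 appears under both chapters) are kept as-is.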

In the case of microdata-JSON, with a structure like the following:

{
  "type": "http://schema.org/Person",
  "properties": {
    "name": "Gregg Kellogg"
  }
}

We could have a context applied such as the following:

{ "@context": {
  "@vocab": "http://schema.org/",
  "type": "@type",
  "properties": {"@container": "@annotation"}
}}

This would then expand to

[{
  "@type": "http://schema.org/Person",
  "http://schema.org/name": [{"@value": "Gregg Kellogg"}]
}]
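
The folding that this microdata case relies on can be sketched in a few lines of Python. This is only an illustration of promoting the nested properties into the parent object; `fold_groups` is a hypothetical helper, and the sketch performs no IRI or `@vocab` expansion.

```python
def fold_groups(obj, group_terms):
    """Copy `obj`, replacing each grouping key with the properties nested under it."""
    folded = {}
    for key, value in obj.items():
        if key in group_terms and isinstance(value, dict):
            # Promote the nested properties; the grouping key itself is dropped.
            folded.update(value)
        else:
            folded[key] = value
    return folded


item = {
    "type": "http://schema.org/Person",
    "properties": {"name": "Gregg Kellogg"},
}

folded = fold_groups(item, {"properties"})
```

A nested property with the same name as a property on the parent would overwrite it here; a real algorithm would need to merge the values instead.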

If it's important to preserve such "annotation" properties, then I think they need to have meaning in the context of the Linked Data Graph. Perhaps the @container: @graph mechanism preserves this best.

@niklasl
Member Author

niklasl commented Nov 13, 2012

Yes, I agree that the last paragraph doesn't paint a pretty picture at all. It occurred to me however, that we could use the same mechanism which works for @container: @language here instead. So given the original example, if it was expanded to:

{"@graph": [{
  "@id": "http://example.org/article",
  "http://schema.org/author": [
    {"@id": "http://example.org/person/1", "@annotation": "regular"},
    {"@id": "http://example.org/guest/cd24f329aa", "@annotation": "guest"}
  ]
}]}

then those annotation keys would be "out of the way". Still semantic noise, but they wouldn't distort the expanded form in any way. The upside is also that if the compaction mechanism treated such annotations just like it handles @language for literals (but for any kind of object), someone could actually generate or post-process an expanded form to add annotations (e.g. by picking from other values in the object, to get an "index value"). Combined with the example context it could produce the desired idiosyncratic mapping "for free".
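
A rough Python sketch of that expansion step, assuming each value in the map is a JSON object. `expand_annotation_map` is a hypothetical name used only for this illustration.

```python
def expand_annotation_map(annotation_map):
    """Turn {key: node} into a list of nodes, each tagged with its former key."""
    expanded = []
    for key, node in annotation_map.items():
        tagged = dict(node)           # shallow copy so the input is left untouched
        tagged["@annotation"] = key   # keep the map key "out of the way" on the value
        expanded.append(tagged)
    return expanded


authors = {
    "regular": {"@id": "http://example.org/person/1"},
    "guest": {"@id": "http://example.org/guest/cd24f329aa"},
}

expanded_authors = expand_annotation_map(authors)
```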

I agree that this goes out of its way a bit, but given how #133 works (which we have resolved to do), it's at least an isomorphic design (and also isomorphic to the other mapping ideas, for id or generalized properties, that have come up).

@niklasl
Member Author

niklasl commented Nov 13, 2012

As for microdata JSON, I would also like to make it work. Not so much for the sake of microdata in and of itself, but since I have also seen its shape in other cases, where an object represents what I'd like to call a "property group". Consider this JSON:

{
  "@id": "http://example.org/book",
  "publishing": {
    "publisher": {"@id": "http://example.org/org/1"},
    "author": {"@id": "http://example.org/person/1"}
  },
  "description": {
    "type": "paperback",
    "size": "110mm x 178mm",
    "pageCount": "204"
  }
}

The publishing and description keys here are "meaningless" in the sense that their role is to group a bunch of properties together by some shared characteristic (i.e. this is "presentational noise" pushed into the data, unfortunately a somewhat common JSON (and XML) pattern as well). Microdata does the same, only that it groups all "proper" properties together under properties.

However, as Gregg notes, this shape is unfortunately an "inverse" of the shape in this proposal. In the proposal example (and in the issues it attempts to solve), the term (LHS) represents a real property and its object keys (RHS) are the "void" annotations. In this microdata/"property group" case, what is needed is to ignore the term and "fold in" the object as if its keys were actually terms of the current object.

Perhaps @id: @annotation could be made to work like this. (Though if this is required, perhaps it's better to be explicit and define e.g. @id: @fold or @id: @group.) In any case, I don't think @container is the right vehicle for the microdata case, since the term is meaningless, so it is its @id we should treat specially. So I suspect this is a separate proposal (and perhaps not as pressing for 1.0 as the other cases are).

@lanthaler
Member

RESOLVED: If '@container': '@annotation' is added to the JSON-LD Syntax, the feature MUST be round-trippable from .compact() to .expand() back to .compact()

RESOLVED: Add '@container': '@annotation' to the JSON-LD Syntax.

@lanthaler
Member

What's the value space of @annotation in the body of a document? Is something like this allowed:

{
  "@context": {
    "author": "http://schema.org/author"
  },
  "@id": "http://example.org/article",
  "author": [
    {
      "@id": "http://example.org/person/1",
      "@annotation": "regular"
    },
    {
      "@id": "http://example.org/guest/cd24f329aa",
      "@annotation": {
        "role": "regular",
        "office": "XH13"
      }
    }
  ]
}

Or are only strings allowed? Even if this is allowed, such objects wouldn't be compacted by an annotation container, I guess. I do see some value in having something survive expansion that doesn't map to an IRI, but you could easily mint a (temporary) IRI if you need to.

@niklasl
Member Author

niklasl commented Dec 4, 2012

I think the value space should be string only. The @annotation value in (expanded) objects is only used in compaction (to provide the key in the map for a term with @container: @annotation). If @annotation is used elsewhere I think it should be ignored.
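
As a sketch of how compaction might use a string-only @annotation, assuming expanded values shaped like the ones discussed above: the function name is hypothetical, and returning non-string or missing annotations in a separate list is just one possible reading of "ignored".

```python
def compact_annotations(values):
    """Rebuild {annotation: node} from expanded values.

    Values whose @annotation is missing or not a string are returned
    unchanged in a second list instead of being placed in the map.
    """
    indexed = {}
    leftover = []
    for value in values:
        key = value.get("@annotation") if isinstance(value, dict) else None
        if not isinstance(key, str):
            leftover.append(value)
            continue
        node = {k: v for k, v in value.items() if k != "@annotation"}
        if key in indexed:
            # Two values share a key: collect them in a list under that key.
            existing = indexed[key]
            indexed[key] = existing + [node] if isinstance(existing, list) else [existing, node]
        else:
            indexed[key] = node
    return indexed, leftover


expanded_authors = [
    {"@id": "http://example.org/person/1", "@annotation": "regular"},
    {"@id": "http://example.org/guest/cd24f329aa", "@annotation": "guest"},
    # Hypothetical value with an object-valued annotation, which the map cannot use as a key.
    {"@id": "http://example.org/person/9", "@annotation": {"role": "regular"}},
]

author_map, not_indexed = compact_annotations(expanded_authors)
```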

@lanthaler
Member

I agree but it shouldn’t be dropped in expansion (if you meant that by “ignored”) because that would break round-tripping.

@niklasl
Member Author

niklasl commented Dec 4, 2012

True, it must be kept during expansion.

lanthaler added a commit that referenced this issue Dec 5, 2012
lanthaler added a commit that referenced this issue Dec 9, 2012
lanthaler added a commit that referenced this issue Dec 9, 2012
lanthaler added a commit that referenced this issue Dec 11, 2012
…SON object

See Gregg's changes in 8c546b9.

This addresses #133 and #196.
lanthaler added a commit that referenced this issue Dec 13, 2012
lanthaler added a commit that referenced this issue Dec 14, 2012
This addresses #185 as well as #203, #142 and #196.
msporny added a commit that referenced this issue Dec 17, 2012
@msporny
Member

msporny commented Dec 17, 2012

I added the basic language to support data annotations in JSON-LD. Having written the text, I think we should rename "@annotation" to "@index", as that's actually what's going on here... the developer is stating that the JSON Object is being used as an index, and that processing should continue deeper into the tree. I think the word 'index' will resonate more with developers than 'annotation'.

PROPOSAL: Change the "@annotation" keyword to "@index".

@lanthaler
Member

I’m -1 on this. Using annotations as indexes is just one use case. I could imagine using them in quite different scenarios, e.g. to store debugging information.

@msporny
Member

msporny commented Dec 22, 2012

Alright, good point, I withdraw my proposal.

@lanthaler - It looks like we have a number of algorithms that now include this feature, is the algorithm work done now? If so, we should close this issue.

@lanthaler
Member

Same here.. the API spec has been updated (I still have to look at the RDF algos but they must ignore this data anyway) but the syntax spec still needs some minor tweaks.

lanthaler added a commit that referenced this issue Dec 27, 2012
lanthaler added a commit that referenced this issue Dec 28, 2012
lanthaler added a commit that referenced this issue Dec 28, 2012
@lanthaler
Member

I've just updated the syntax spec and sent a notification to the mailing list. Unless I hear objections I will close this issue in 24 hours.
