Add '@graph' container type #195

lanthaler · 2012-11-09T10:31:34Z

_Sent to the mailing list by @linclark:_

Rather than continuing to reiterate the use case we have for language maps (which has been called a bogus use case and an anti-pattern by members of the WG), I thought it could be worth looking at another option.

What Drupal needs isn't really language management. Drupal needs version management, where the versions just happen to be based on language. That's why I originally considered named graphs. The idea of using named graphs for our use case made members of the WG balk and we were encouraged to look to language maps.

However, it seems now that language maps need to round trip to RDF. This means that language maps will force a change in the data model... for example inserting blank nodes in between a subject and its properties. I'm unclear on why it is preferable to create blank nodes in a data model than it is to use a named graph. The named graph at least lets you keep the same base triple structure, and consumers can choose whether or not to pay attention to the 4th element of the quad. As I recall, on a telecon where I brought it up, Gregg said that named graphs shouldn't be used unless you needed to make statements about the graph itself. However, others such as Leigh Dodds have discussed using named graphs for versioning or providing context, and I'm not sure that it's such an unconventional idea.

I would be interested to hear what concerns the WG have with this sort of use of named graphs.

Besides being discouraged from using them by the WG, the other reason I decided against named graphs was because there was no good way to access properties in named graphs. Since JSON-LD's query API is still unspecified, direct access to properties using the tree structure needs to be easy for the end user developer.

Instead of continuing to try to shoehorn language maps into our use case (or vice versa), I'm wondering whether making named graphs easier to traverse would be a better option.

For example, I imagine something like:

{
    "@context": {
        "site": "http://ex.org/",
        "body": {
            "@id": "site:body",
            "@container": "@graph"
        },
        "en": "site:node/1/en",
        "de": "site:node/1/de"
    },
    "@id": "site:node/1",
    "body": {
        "en": [
            "Here is some body text for the article."
        ],
        "de": [
            "Hier sind einige Textkörper für den Artikel."
        ]
    }
}

It would normalize to:

<site:node/1> <site:body> "Here is some body text for the article." <site:node/1/en>
<site:node/1> <site:body> "Hier sind einige Textkörper für den Artikel." <site:node/1/de>

And values could be accessed the same way as was intended with language maps:
obj.body.en[0]
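
As a rough illustration of the mapping proposed above, here is a minimal Python sketch that reads the example document, treats a term whose `@container` is `@graph` as a map from graph-name term to values, and emits the two quads. The helpers `expand_iri` and `to_quads` are hypothetical names introduced only for this sketch; they are not part of any JSON-LD API, and the prefix handling is deliberately simplistic.

```python
# Sketch of the proposal in this thread, not an implementation of the JSON-LD algorithms.
doc = {
    "@context": {
        "site": "http://ex.org/",
        "body": {"@id": "site:body", "@container": "@graph"},
        "en": "site:node/1/en",
        "de": "site:node/1/de",
    },
    "@id": "site:node/1",
    "body": {
        "en": ["Here is some body text for the article."],
        "de": ["Hier sind einige Textkörper für den Artikel."],
    },
}


def expand_iri(value, context):
    """Expand a term or a prefix:suffix compact IRI (hypothetical helper)."""
    if value in context:
        mapped = context[value]
        mapped = mapped["@id"] if isinstance(mapped, dict) else mapped
        # Guard against a term that maps to itself.
        return expand_iri(mapped, context) if mapped != value else mapped
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context and isinstance(context[prefix], str):
        return context[prefix] + suffix
    return value


def to_quads(document):
    """Emit (subject, predicate, literal, graph) for every graph-container term."""
    context = document["@context"]
    subject = expand_iri(document["@id"], context)
    quads = []
    for key, value in document.items():
        if key.startswith("@"):
            continue
        definition = context.get(key)
        if isinstance(definition, dict) and definition.get("@container") == "@graph":
            predicate = expand_iri(key, context)
            for graph_term, literals in value.items():
                graph = expand_iri(graph_term, context)
                for literal in literals:
                    quads.append((subject, predicate, literal, graph))
    return quads


quads = to_quads(doc)
for quad in quads:
    print(quad)

# Direct tree access, the Python equivalent of obj.body.en[0].
first_english_body = doc["body"]["en"][0]
```

The point of the sketch is that the same tree serves both purposes: a consumer that ignores graphs can read `doc["body"]["en"][0]` directly, while an RDF consumer gets the fourth element of each quad from the map key.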

I imagine this could be useful for expressing version information beyond language (for example, revisioning), which I could see being a large use case for many other CMSs besides Drupal.

-Lin

lanthaler · 2012-11-09T10:35:04Z

_@gkellogg's response:_

On Nov 8, 2012, at 6:17 PM, Lin Clark wrote:

Rather than continuing to reiterate the use case we have for language maps (which has been called a bogus use case and an anti-pattern by members of the WG), I thought it could be worth looking at another option.

What Drupal needs isn't really language management. Drupal needs version management, where the versions just happen to be based on language. That's why I originally considered named graphs. The idea of using named graphs for our use case made members of the WG balk and we were encouraged to look to language maps.

However, it seems now that language maps need to round trip to RDF. This means that language maps will force a change in the data model... for example inserting blank nodes in between a subject and its properties. I'm unclear on why it is preferable to create blank nodes in a data model than it is to use a named graph. The named graph at least lets you keep the same base triple structure, and consumers can choose whether or not to pay attention to the 4th element of the quad. As I recall, on a telecon where I brought it up, Gregg said that named graphs shouldn't be used unless you needed to make statements about the graph itself. However, others such as Leigh Dodds have discussed using named graphs for versioning or providing context, and I'm not sure that it's such an unconventional idea.

Hmm, I don't remember suggesting that named graphs should only be used when making assertions about a graph itself; do you have a reference? I could have done so, as the reason named graphs were brought in was particularly for the provenance use case, where you want to make assertions about other information.

In any case, we have come to the realization that JSON-LD is really a dataset model (like TriG) and not really a pure graph model (like Turtle). The only thing the RDF WG could agree upon is that datasets have no semantics, so we can infer that they don't in JSON-LD either. As you note many people use named graphs for all kinds of reasons, and I think (now anyway) that this might be a good solution for you.

I did see that back in July, we discussed @container: @graph as a potential solution for WikiData's use case, and if there's something we can do that addresses Drupal's use case, particularly if it does it better than language maps, then that seems like an interesting area to pursue.

I would be interested to hear what concerns the WG have with this sort of use of named graphs.

Besides being discouraged from using them by the WG, the other reason I decided against named graphs was because there was no good way to access properties in named graphs. Since JSON-LD's query API is still unspecified, direct access to properties using the tree structure needs to be easy for the end user developer.

Instead of continuing to try to shoehorn language maps into our use case (or vice versa), I'm wondering whether making named graphs easier to traverse would be a better option.

For example, I imagine something like:
{
    "@context": {
        "site": "http://ex.org/",
        "body": {
            "@id": "site:body",
            "@container": "@graph"
        },
        "en": "site:node/1/en",
        "de": "site:node/1/de"
    },
    "@id": "site:node/1",
    "body": {
        "en": [
            "Here is some body text for the article."
        ],
        "de": [
            "Hier sind einige Textkörper für den Artikel."
        ]
    }
}
It would normalize to:
<site:node/1> <site:body> "Here is some body text for the article." <site:node/1/en>
<site:node/1> <site:body> "Hier sind einige Textkörper für den Artikel." <site:node/1/de>

Yes, this looks right. It's certainly unusual, as the subject and property appear in the default graph, with the value(s) in the named graph, but it seems quite consistent. So, the semantics would be that the relevant subject and property are "pulled into" the named graph associated with their values, and any node definitions within that context would remain within the named graph.

Expanding such a structure (flattening, anyway) would likely look like the following:

[
  {
    "@id": "http://ex.org/node/1/en",
    "@graph": [{
      "@id": "http://ex.org/node/1",
      "http://ex.org/body": [
        {"@value": "Here is some body text for the article."}
      ]
    }]
  },
  {
    "@id": "http://ex.org/node/1/de",
    "@graph": [{
      "@id": "http://ex.org/node/1",
      "http://ex.org/body": [
        {"@value": "Hier sind einige Textkörper für den Artikel."}
      ]
    }]
  }
]

Figuring out how to reverse this when compacting might be challenging, but we haven't lost any information, so we should be able to do it.
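
To make the "reverse this when compacting" step a little more concrete, the following Python sketch regroups the flattened form above into the keyed-by-graph shape from the original proposal. It assumes the caller already knows which subject and property use the graph container, and it uses a hand-written `graph_terms` lookup in place of a real inverse context; `regroup_by_graph` and `graph_terms` are hypothetical names, and none of this reflects the actual compaction algorithm.

```python
# Flattened form from the comment above, as a Python structure.
expanded = [
    {
        "@id": "http://ex.org/node/1/en",
        "@graph": [{
            "@id": "http://ex.org/node/1",
            "http://ex.org/body": [
                {"@value": "Here is some body text for the article."}
            ],
        }],
    },
    {
        "@id": "http://ex.org/node/1/de",
        "@graph": [{
            "@id": "http://ex.org/node/1",
            "http://ex.org/body": [
                {"@value": "Hier sind einige Textkörper für den Artikel."}
            ],
        }],
    },
]

# Assumption: a stand-in for the inverse context (graph IRI -> term).
graph_terms = {
    "http://ex.org/node/1/en": "en",
    "http://ex.org/node/1/de": "de",
}


def regroup_by_graph(named_graphs, subject, predicate, terms):
    """Collect the values of (subject, predicate) under a key per named graph."""
    grouped = {}
    for graph_object in named_graphs:
        graph_name = graph_object["@id"]
        key = terms.get(graph_name, graph_name)  # fall back to the full IRI
        for node in graph_object.get("@graph", []):
            if node.get("@id") != subject:
                continue
            for value_object in node.get(predicate, []):
                grouped.setdefault(key, []).append(value_object["@value"])
    return grouped


body = regroup_by_graph(
    expanded, "http://ex.org/node/1", "http://ex.org/body", graph_terms
)
print(body)
```

The hard part in a real implementation is probably not this regrouping but deciding, from the context alone, when a property of a node that appears in several named graphs should be partitioned this way.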

Gregg

lanthaler · 2012-11-09T10:41:43Z

This is related to #133.

niklasl · 2012-11-09T18:10:52Z

So, to elaborate on my recent comment in issue 133:

Named graphs are for describing descriptions (they are "the sheet of paper the article is printed on"). It's a much more complex case for consumption than just describing each language version as a distinct resource, in the data given by the canonical IRI for the resource (the one described by articles in different languages). That is just basic Dublin Core usage. Using named graphs is primarily for doing data quotation (used for e.g. digital signatures), handling provenance of entire datasets (i.e. datadumps of several quoted records) and managing quad stores (handling revisions etc). And handling datasets isn't something I'd expect e.g. CreateJS to do casually.

They are powerful and useful of course, but you may end up with disambiguation problems. If the same resource is described in two named graphs, it is still logically the same resource. For example, any use of a functional property (in OWL lingo) describing that resource pointing to different IRIs would mean that those two IRIs identify the same thing. Conflation may abound if this is not thoroughly understood by authors of such data.
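
A small Python sketch of the functional-property concern, under stated assumptions: the property `http://ex.org/hasAuthorProfile` and the two profile IRIs are invented for illustration, the property is assumed to be declared functional, and the consumer is assumed to merge the named graphs before reasoning. The function `implied_same_as` is a hypothetical helper, not part of any reasoner.

```python
from collections import defaultdict

# Hypothetical data: the same subject described in two language graphs.
quads = [
    ("http://ex.org/node/1", "http://ex.org/hasAuthorProfile",
     "http://ex.org/profile/alice", "http://ex.org/node/1/en"),
    ("http://ex.org/node/1", "http://ex.org/hasAuthorProfile",
     "http://ex.org/profile/alicia", "http://ex.org/node/1/de"),
]

# Assumption: this property is declared functional in some ontology.
functional_properties = {"http://ex.org/hasAuthorProfile"}


def implied_same_as(all_quads, functional):
    """Return IRI pairs that a merged view would force to be the same thing."""
    values = defaultdict(set)
    for subject, predicate, obj, _graph in all_quads:  # graph name is ignored
        if predicate in functional:
            values[(subject, predicate)].add(obj)
    pairs = []
    for objects in values.values():
        ordered = sorted(objects)
        pairs.extend((ordered[0], other) for other in ordered[1:])
    return pairs


print(implied_same_as(quads, functional_properties))
```

Because the subject IRI is identical in both graphs, the graph name does nothing to keep the two values apart once the graphs are merged, which is the conflation risk described above.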

Is that fully OK by Drupal? And are the different versions really not viable to expose more concretely than as two sets of statements? You should compare this to recommended data handling in e.g. bibliographical systems (see e.g. FRBR). This is the pivotal point, especially for interoperability.

Do you also accept Gregg's example of the expanded data above? If so, the question is if this is a reasonable addition to the compaction algorithm. I can imagine how it would be done, but I'm not sure at what cost. It seems very advanced to support partitioning of each property value of a resource by named graph in a syntax like this. Let's hear what others have to say.

(Note that I'd still prefer to add a @container mechanism for mapping based on the property value of a member over this, as it is a more common case to describe different articles as distinct resources in the same graph.)

linclark · 2012-11-09T19:29:29Z

If the same resource is described in two named graphs, it is still logically the same resource... Is that fully OK by Drupal?

Right, we actually want it to be logically the same resource. They all have the same UUID in Drupal, we conceive of it as a single resource. We intentionally moved away from having "translation ids" in Drupal 7 to having a single entity with a single ID in Drupal 8.

Do you also accept Gregg's example of the expanded data above?

Yes. I can imagine how it will round trip, which is important and is something I could never quite be certain of in the language maps proposal.

niklasl · 2012-11-12T16:17:58Z

(This is a reply to a comment in 133, put here since it's mainly about the use of graphs to differentiate between language versions.)

@linclark My concern is that it seems like an odd way of partitioning information based on language. I use named graphs a lot for managing changes in descriptions from various sources. That article by Jeni is very good, and outlines a usable way of handling versions, specifically revisions, of data over time. Note though that for information resources, her recommendation is to use distinct representations of the resources (note especially the use of dct:hasVersion to link from a canonical "hub" resource to the different (here time-based) variants). Also for the use of named graphs (mainly pertaining to "real world entities", quite hard to talk about as snapshots over time), as the article describes, the default graph is intended to reflect the current state of affairs. How do you recommend using the language-based entity variants for a node as named graphs, as exposed by Drupal, in RDF applications?

You do say "language-based entity variants". More than one language variant means that, conceptually, there are separate resources. (The representation of a resource is also a resource, with its own mime-type etc.) You cannot content-negotiate on language and get different resources back if they are intrinsically the same resource (what you get is a distinct representation in a specific language, its own comment count, author, etc.). Neither can you do a query against a graph to e.g. count the variants in English, etc, unless these are distinct.

Note that I'm thinking about this from the outside in (the surface data), not from the inside perspective (implementations often look rather different internally from the resources they expose, for many reasons). Also note that there is no hard requirement to publish the variants on different IRIs (certainly not when exposed as raw data). They can be subsumed as different entities without IRIs (i.e. blank nodes), described by the data published for the node (similar to the document "hub" in Jeni's example).

I just want to point out that this conflation may become problematic down the road, when information published by Drupal sites is syndicated and integrated by various applications. I'm no stranger to "practical conflation" (things can get absurd either way), but in this case it is evident that the difference in language is key (no pun intended..). So when publishing data containing this difference in syntax, it would be a waste to see it get lost in interpretation.

This is why I keep coming back to this; I'm sorry if I'm not conveying that clearly. I'm not after restructuring Drupal's internals, I'm just trying to focus on what I've seen regarding the usefulness of published information.

lanthaler · 2012-11-13T15:26:56Z

RESOLVED: Push the addition of '@container': '@graph' to the JSON-LD Syntax specification off to a later version of JSON-LD.

gkellogg · 2017-04-08T21:01:56Z

I propose taking this off of the 1.1 milestone; please 👍 or 👎 to favor/disfavor removing.

RubenVerborgh · 2017-04-12T09:41:18Z

I currently voted 👎 as I believe we need to do something to integrate graphs more seamlessly, but not sure whether this proposal is the right one. Alternative in #481.

msporny · 2017-04-12T13:16:47Z

Making @graph integrate more seamlessly into JSON-LD 1.1 is a fairly strong requirement for us wrt. Verifiable Claims. We would really like to have this feature soon-ish because many of the Verifiable Claims end up looking fairly awful w/o some way to integrate graphs w/o using the ugly @graph syntax. So -1 to kicking this particular can down the road. It's going to play a big part in how easy it will be to use Verifiable Claims for non-Linked Data developers.

gkellogg · 2017-04-12T14:04:04Z

Okay, can you suggest some wording?

msporny · 2017-04-12T16:11:42Z

@gkellogg I'll work with @dlongley on some wording. We're fairly slammed until May, we'll try to get to it after that (as well as an implementation).

gkellogg · 2017-04-12T16:14:38Z

It's on the project queue. I'll look at it myself, after getting to more of the backlog.

dlongley · 2017-09-12T21:08:14Z

Update: I've added some experimental support for "@container": "@graph" and "@container": ["@graph", "@set"] here:

digitalbazaar/jsonld.js@f6a91c0

davidlehn · 2017-11-16T23:08:20Z

A branch and PR for @graph container support is here:
#549

gkellogg · 2017-12-07T19:13:31Z

Closed via #549.

niklasl mentioned this issue Nov 9, 2012

Add '@language' container type #133

Closed

niklasl mentioned this issue Nov 12, 2012

Add '@annotation' container type #196

Closed

gkellogg added the 1.1 label Sep 22, 2016

gkellogg removed the 1.1 label Oct 6, 2016

This was referenced Oct 19, 2016

Datasets in JSON-LD are horriblely ugly and not navigable #272

Closed

Interpreting keys in a @set as @ids or triple objects (RDF) #430

Closed

gkellogg mentioned this issue Jan 27, 2017

"Stratified" or "Dictionaried" API feature #460

Closed

RubenVerborgh mentioned this issue Apr 12, 2017

Allow the definition of named graphs without @ keywords #481

Closed

gkellogg mentioned this issue May 4, 2017

Named graph inside default graph #398

Closed

elf-pavlik mentioned this issue Oct 1, 2017

Adding already existing resources as collection members HydraCG/Specifications#134

Open

gkellogg mentioned this issue Dec 7, 2017

@graph container support #549

Merged

4 tasks

gkellogg closed this as completed Dec 7, 2017
