Pull out id from credentialSubject. Change credentialSubject to claims. #1130
Comments
-1, the property is not "for claims" (in the absence of a credential subject), but to express claims about a credential subject. When we speak of "claims" in the VCDM, we are not talking about a container or bucket of arbitrarily modeled data, but rather, claims are "subject - property - value" statements. The top-level subject is the credential subject. Each subject is expressed as a JSON object, identified by the |
I agree with Dave's comment. That said, I think it's important to acknowledge that implementers are getting confused. I think part of the issue is that VCDM is about a graph of information, but people are thinking about it in JSON terms. Are there other changes that we could make to clear up such confusion? |
@dlongley while what you said is factually correct, I have a hard time understanding it and bet that the majority of the working group, and an even greater majority of the implementers, don't understand it either. That's a problem. Further, any language about "subject - property - value" statements is non-normative, so I cannot find this argument convincing. |
More on how things work today from an older issue: |
I agree that we should elaborate more.
Well, no informative explanation is going to be "normative" (a contradiction of terms). What's normative is the reference that says the base data model representation is JSON-LD compact form. You can read the JSON-LD spec for all the details I mentioned above in a normative way as a testable expression of the data model. However, I agree that we could do a better job explaining things -- but it is going to be non-normative / informative text by its very nature. |
I agree with the direction this issue takes us in. |
@selfissued — Would you please elaborate on the direction you see this issue taking us? |
I suspect those coming from the JSON world would favour the approach of @decentralgabe and those coming from the RDF world would favour the approach of @dlongley. The proposed changes by @decentralgabe aim to reduce the complexity of the Verifiable Credentials data model by renaming credentialSubject to claims and elevating id within it to a top-level subject property. This indeed simplifies the model in the context of JSON representations and could lead to easier understanding and implementation. I understand and respect the concerns raised by @dlongley and others regarding the inherent graph nature of the data model and the potential loss of expressiveness with this simplification. However, I believe that a balance between simplicity and expressiveness is necessary to ensure wide adoption. I don't want to make a proposal here as there are already some, and one more may not help. But as an illustration, there might be a middle ground: a term like 'verifies' could be more intuitive to those coming from both the JSON world and the RDF world, and it is also contained in the name of the project. |
-1 because we've gone down this road before (see link). We have spent significant time in previous iterations of the WG discussing this topic: #1128 (comment) I'll also note that it's not clear what object the "claims" refer to, and if we use claims, it can only refer to one object (the subject, presumably). That is,
as can any arbitrary object (via extension point, like
So, it doesn't make any sense to have a single property called |
@msporny good point, I think No matter what, an Edit: I agree with a lot of the original arguments in #480 |
@decentralgabe You may be mistakenly thinking that VCs have just a single subject. VCs like a marriage license actually contain claims for at least four subjects (officiant, spouse1, spouse2, and witness).
Where does the spec "allow for" a If we had a subject property that was not in the JSON-LD manner, sure, you could have an array of ids for those parties, but then you would still need to associate the specific claims with each of those subjects. In short, you'd still need a way to represent the triples that say "subjectX predicateA objectB" for each of those subjects. These are the claims about SubjectX where each VC may have multiple subjects. The credentialSubject property is that array that links statements about subjects to identifiers for those subjects. Having a separate property that just has subject identifiers is unnecessary and redundant with the To @dlongley's point, this is normatively defined today:
Having better language explaining this would be an improvement. However, getting rid of this JSON-LD pattern of representing statements about subjects would, I think, misalign the data model relative to the semantics and profoundly break the fundamental data model that VCs are based on. -1 to this adjustment. Also, I doubt we'll get consensus on this change. |
No, in fact in my initial post I stated the contrary and gave an example on how you would address multiple subjects. It appears that I would need to amend my post to represent the correlation between the {
"@context": [
"https://www.w3.org/ns/credentials/v2",
"https://www.w3.org/ns/credentials/examples/v2"
],
"id": "http://example.edu/credentials/3732",
"type": ["VerifiableCredential", "NameCredential"],
"issuer": "https://example.edu/issuers/565049",
"validFrom": "2010-01-01T00:00:00Z",
"subject": ["did:example:ebfeb1f712ebc6f1c276e12ec21", "did:example:6f1c276e12ec21ebfeb1f712ebc"],
"subjectClaims": [
{ "subjectId": "did:example:ebfeb1f712ebc6f1c276e12ec21", "name": "Alice" },
{ "subjectId": "did:example:6f1c276e12ec21ebfeb1f712ebc", "name": "Bob" }
]
}
This was not clear. We allow for a
Are you and @dlongley asserting that what I proposed is impossible to represent in JSON-LD? That it is impossible to have separate |
No, I am not making that assertion. I also find what you proposed to be bizarre in plain JSON. If I had to model a car in JSON, I'd probably do it something like this: {
"id": "some VIN",
"type": "Car",
"color": "red",
"engine": {
"id": "some serial number",
"type": "InternalCombustionEngine",
"cylinders": 8
},
...
} I would NOT do this: {
"subject": ["some VIN", "some serial number"],
"claimsBucket": [{
"type": "Car",
"color": "red",
"engine": "???"
}, {
"type": "InternalCombustionEngine",
"cylinders": 8
}]
} And then expect people to map the different positions in "subject" to the positions in "claimsBucket" to understand where the IDs apply. I would actually find that to be non-idiomatic JSON and quite frustrating. Notably, JSON-LD was designed to serve idiomatic JSON by layering linked data on top of it with a goal of getting as close as possible to "zero edits / changes". It's true that sometimes people create JSON that is more or less "whatever" -- but that doesn't make for a consistent or compositional data model. It requires everything to be understood in some bespoke way. Instead, it's better to represent your objects as ... JSON objects ... and your object's properties as properties of that object (JSON keys) and your object's property values as ... the values of those properties. Then nest away based on properties that link to other objects as values. This is all quite natural modeling, IMO. So I wouldn't endorse your suggestion as a way to do plain JSON. It assumes a very simplistic, non-compositional model with a lot of external (not internally present) information to understand it (like the mapping of subject positions to positions in a big claims bucket). |
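To make the positional-mapping concern above concrete, here is a minimal Python sketch. The helper name pair_by_position is hypothetical and exists only for this illustration; the "subject" and "claimsBucket" keys are taken from the example in the comment above. It assumes a consumer pairs the two arrays by index, which is one plausible reading of that shape, not something the spec defines.

```python
def pair_by_position(doc: dict) -> dict:
    """Pair each entry in "subject" with the claims object at the same index.

    Hypothetical helper: the pairing rule lives outside the data itself.
    """
    return dict(zip(doc["subject"], doc["claimsBucket"]))


doc = {
    "subject": ["some VIN", "some serial number"],
    "claimsBucket": [
        {"type": "Car", "color": "red"},
        {"type": "InternalCombustionEngine", "cylinders": 8},
    ],
}

original = pair_by_position(doc)

# Reordering only one of the two arrays silently reassigns every claim.
reordered_doc = {**doc, "subject": list(reversed(doc["subject"]))}
reordered = pair_by_position(reordered_doc)

print(original["some VIN"]["type"])
print(reordered["some VIN"]["type"])
```

Nothing in the document itself signals that the second pairing is wrong, which is the "external information" problem described in the comment; keeping the identifier inside each object avoids it.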
Notably this change from your original suggestion: {
"@context": [
"https://www.w3.org/ns/credentials/v2",
"https://www.w3.org/ns/credentials/examples/v2"
],
"id": "http://example.edu/credentials/3732",
"type": ["VerifiableCredential", "NameCredential"],
"issuer": "https://example.edu/issuers/565049",
"validFrom": "2010-01-01T00:00:00Z",
"subject": ["did:example:ebfeb1f712ebc6f1c276e12ec21", "did:example:6f1c276e12ec21ebfeb1f712ebc"],
"subjectClaims": [
{ "subjectId": "did:example:ebfeb1f712ebc6f1c276e12ec21", "name": "Alice" },
{ "subjectId": "did:example:6f1c276e12ec21ebfeb1f712ebc", "name": "Bob" }
]
} Looks just like what we have today, except there's a special "subjectId" property instead of "id" (which is what is consistently used for all IDs for any object in our model today) and the name "subjectClaims" instead of "credentialSubject". Then there's the extra "subject" property, which seems redundant. In short, I think what we have today is simpler and achieves the same goal with more consistency. |
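The equivalence claimed above can be sketched as a small mechanical rewrite. This is only an illustration in Python: the function name to_credential_subject is hypothetical, and the "subject", "subjectClaims" and "subjectId" keys come from the proposal quoted in this thread, not from the specification.

```python
def to_credential_subject(proposed: dict) -> dict:
    """Rewrite the proposed subject/subjectClaims shape into credentialSubject.

    Illustrative only: "subjectId" becomes "id", "subjectClaims" becomes
    "credentialSubject", and the separate "subject" array is dropped because
    each identifier is already carried inside its claims object.
    """
    current = {
        key: value
        for key, value in proposed.items()
        if key not in ("subject", "subjectClaims")
    }
    subjects = []
    for claims in proposed.get("subjectClaims", []):
        entry = dict(claims)  # copy so the input is not mutated
        subject_id = entry.pop("subjectId", None)
        if subject_id is not None:
            entry = {"id": subject_id, **entry}
        subjects.append(entry)
    current["credentialSubject"] = subjects
    return current


proposed = {
    "id": "http://example.edu/credentials/3732",
    "type": ["VerifiableCredential", "NameCredential"],
    "issuer": "https://example.edu/issuers/565049",
    "subject": [
        "did:example:ebfeb1f712ebc6f1c276e12ec21",
        "did:example:6f1c276e12ec21ebfeb1f712ebc",
    ],
    "subjectClaims": [
        {"subjectId": "did:example:ebfeb1f712ebc6f1c276e12ec21", "name": "Alice"},
        {"subjectId": "did:example:6f1c276e12ec21ebfeb1f712ebc", "name": "Bob"},
    ],
}

current = to_credential_subject(proposed)
print(current["credentialSubject"])
```

No information is lost in the rewrite, which is the sense in which the extra "subject" array is redundant in this example.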
@decentralgabe wrote:
hmm, let me fix that for you by renaming
Once you do that, |
@msporny no, you miss the point that having I'd assert that 90% of the time (if not more) a credential is about a single subject. The data model today makes that fact confusing, and doesn't require the identification of that subject. @dlongley we could debate what idiomatic JSON looks like. JWTs have top level subject identifiers and separate claims and seem to work just fine. In fact they have much broader adoption than VCs. Because a VC must have a It is quite clear in the diagram at the beginning of this section that there is a set of credential metadata, and then claims. To not call the claims claims is confusing. Credential metadata includes other statements -- like issuer, evidence, status, etc. Additionally, @dlongley, going back to your earlier comment...the spec clearly states that a claim is a statement about a subject. It would follow that having a section Recapping: we've contorted the data model in a terribly confusing way that is self-contradictory, given the sections I linked above. Let's reduce this confusion by calling subjects subjects and claims claims. |
I recommend just using a JWT if it works for you. There's no point in duplicating that standard here.
For different use cases. One reason VCs were invented years back was because JWTs on their own did not come with features to allow people to easily express open world data in consistent, expressible ways with extensible decentralized semantics. The standard that allows people to do that is JSON-LD, so VCs are built on top of it. JWTs did not have uptake in the space VCs were created to fill. We would have just used JWTs if they had the kind of data modeling and features that it seems you're now suggesting we remove in favor of the constrained and simplistic JWT approach. The JWT approach works for the set of use cases JWTs were designed for: simple authorization and authentication tokens. The vast majority of JWT use cases look practically the same, using a very limited, but very reusable set of JWT claims. If that's all you need, use JWTs. But it doesn't make any sense to make VCs behave just like JWTs when JWTs are already a standard.
There's more than one subject in every VC. The credential itself is a subject. The credential subject is a subject. The issuer is a subject -- and so on. The graph of information is a collection of statements that are "subject - property - value", where the "subject" is whatever the properties and values apply to. This is why we use the term "credentialSubject" to refer specifically to the credential subject and the claims made about it -- and to distinguish it from other subjects in the graph. We don't just use the generic term "subject" for this because then it's that much easier to confuse the two (generic "subject" with "credential subject") ... like it seems you just did. So for a VC: {
"@context": "...",
"id": "this is the ID of subject A",
"type": ["VerifiableCredential", "..."],
"issuer": {
"id": "this is the ID of subject B",
"name": "Some Issuer"
},
"credentialSubject": {
"id": "this is the ID of subject C, *the credential subject*",
"aPropertyOfTheCredentialSubject": "foo"
}
} This can be expressed as a set of statements (subject - property - value) that form a graph of information:
It's not contorted nor self-contradictory, there's just a misunderstanding. I believe the confusion here may actually be coming from removing the qualifier "credential" from "credentialSubject" leaving only "subject" behind ... with no way to differentiate it from every other subject in the graph of information. Another source of confusion could be from people in our group contextually and colloquially using "the subject" to mean "the credential subject". But the spec talks about more than just "the credential subject", it talks about "subject" as a generic "thing" in a subject-property-value statement (aka "claim"). I agree it would be good to see if there's some more informative text that would help alleviate confusion here. |
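The "subject - property - value" reading described in the comment above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not JSON-LD processing: it ignores "@context", does not expand terms to IRIs, treats "type" as an ordinary property, and invents blank labels such as "_:b0" for objects without an "id". The function name to_statements and the urn:example identifiers are hypothetical.

```python
from itertools import count


def to_statements(document: dict) -> list:
    """Flatten nested JSON objects into (subject, property, value) tuples."""
    statements = []
    blank_ids = count()

    def walk(node: dict) -> str:
        # Every JSON object is a subject; its "id" (if any) names it.
        subject = node.get("id") or f"_:b{next(blank_ids)}"
        for prop, value in node.items():
            if prop in ("@context", "id"):
                continue
            items = value if isinstance(value, list) else [value]
            for item in items:
                # A nested object is itself a subject; link to it by its name.
                obj = walk(item) if isinstance(item, dict) else item
                statements.append((subject, prop, obj))
        return subject

    walk(document)
    return statements


vc = {
    "@context": "...",
    "id": "urn:example:credential-A",
    "type": ["VerifiableCredential"],
    "issuer": {"id": "urn:example:issuer-B", "name": "Some Issuer"},
    "credentialSubject": {
        "id": "urn:example:subject-C",
        "aPropertyOfTheCredentialSubject": "foo",
    },
}

statements = to_statements(vc)
for statement in statements:
    print(statement)
```

In the output, the credential, the issuer and the credential subject each appear in the first position of at least one tuple, which is the sense in which a credential carries several subjects while only one of them is the credential subject.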
If that's the case, then you're saying that the identifier for the subject is optional, which is exactly what we have in the spec today. The only difference then becomes that the identifier of the subject of the credential is separated from the claims for the subject of the credential. Separating an identifier from the data that it's associated with does not seem like an improvement.
Then you're talking about keeping the ordering of two array values in an object in sync, which seems less than ideal. JSON-LD (and this goes down to the RDF model) does not maintain order in a *set*, which is true for any set-based data structure -- ordering is not maintained. That's just the pure mathematical definition of a set. JSON-LD (again, really the RDF data model) also has the concept of a list, which does preserve ordering. So, LD can do both unordered sets and ordered lists.
No, 100% of the time, a verifiable credential contains information about multiple subjects. These include at least: the issuer, the credential itself, and the It seems like when you say "subject", you mean "credentialSubject"... and not the more general "subject" in the "subject-property-value" sense.
Yes, but which subject?
I hope it's clear that by doing that, it confuses things further and doesn't simplify anything. |
While it's accurate to state that pure mathematical sets do not maintain order, it's not entirely accurate to say that JSON-LD doesn't maintain order in a "set". This assertion seems to conflate the concept of a "set" in mathematical terms with the concept of a "set" in the context of programming languages and data structures. In JSON-LD, an unordered collection of items is typically represented as an array. JSON, the underlying data format for JSON-LD, maintains the order of elements in an array. However, when JSON-LD is converted to RDF, which is a graph-based data model, that order is typically lost because RDF does not inherently support ordered collections. To preserve order, RDF provides a specific construct, the RDF List, but this is not commonly used due to its complexity. Therefore, while JSON-LD can represent both ordered and unordered collections, it is not accurate to say that it doesn't maintain order in a "set". The truth of this statement largely depends on the context: it's true in the context of RDF, but not in the context of JSON. That is, the ordering is lost only when the document is converted to RDF. Edit: a quick example of how JSON-LD and RDF differ in their handling of a set. This array is legal in JSON(-LD): {
"@context": {
"@vocab": "http://example.org/"
},
"numbers": [1, 2, 2, 3]
} The same array cannot be kept as written in RDF: it becomes a new collection in which the duplicate 2 is missing and no order is preserved, for example: {
"@context": {
"@vocab": "http://example.org/"
},
"numbers": [2, 1, 3]
} |
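The difference shown in the two arrays above can be mimicked in Python. This is a sketch under an assumption: RDF statements are modeled here as a Python set of tuples, which is a simplification, and the blank label "_:b0" is hypothetical. The property IRI follows from the "@vocab" value in the example.

```python
numbers = [1, 2, 2, 3]

# As a JSON array: order and duplicates are both kept.
as_json_array = list(numbers)

# As statements in a graph: identical (subject, property, value) tuples
# collapse into one, and a set has no defined order.
as_statements = {("_:b0", "http://example.org/numbers", n) for n in numbers}

print(as_json_array)
print(len(as_statements))
```

The list keeps four entries while the set keeps three, matching the "2 is missing" observation in the comment.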
@msporny just commenting on your status list example, this JSON looks nicer, it saves space, and I think the RDF representation is also cleaner: "status": {
"id": "https://university.example/credentials/status/3#94567",
"type": "StatusList2021Entry",
"purpose": "revocation",
"index": "94567",
"credential": "https://university.example/credentials/status/3"
} Note that "purpose", "index" and "credential" are already in the context of the type "StatusList2021Entry"... so repeating the string "status" is wasteful in both JSON and RDF. As is repeating the word "credential" in "VerifiableCredential". |
For some history on why some terms have been prefixed: It's important that terms be In short, I'm pretty sure the above issue with taking the approach you're suggesting has been mitigated now -- but other issues may remain. |
First, many thanks to @dlongley and @msporny. I had a gap in understanding that you've helped me overcome, and I appreciate your clear explanations and patience. I understand why in JSON-LD land it makes sense to have a nested property for ID. I'm still not clear on why the More broadly, I am worried that few members in the group share the understandings you've conveyed, which adds some significant risk to the group in developing and implementing the spec. I'm not sure of the best way to overcome this, and it's clearly out of scope of this issue, but I feel like it's something we should address... Foremost, with my newfound understanding of the data model, I understand why the The concern over whether I'd like to revisit select comments from an issue raised by @RieksJ a few years back, #408 (and before that #207). The issue seemed to mostly not go anywhere because v1 was in CR at the time. Now that we're working on v2, before CR, this is the right time to address the issue, should we be able to find consensus. There are some strong articulations of my intentions which I'd like to recap. A few selected highlights: NOTE: These comments are years old and it's very possible the authors' positions have changed.
+1 and this is the issue. The spec takes a weak position (by necessity) on being a fully JSON-LD specification. Because of this we're left in a no man's land of LD/JSON that leads people (like me) to be confused about what the data model is actually defining and why. In a sense, I would rather see a completely LD data model to reduce this ambiguity. |
@decentralgabe |
@David-Chadwick does that imply that with an |
the |
@decentralgabe wrote:
Yes, so would I. That said, you've heard the individuals in the WG that have consistently opposed such a model. To be fair, JSON-LD is more difficult to use than JSON, and this is because it makes an attempt at consistent data modelling (such as, "How do you globally identify an object?" or "Is the data model a tree (no cycles) or a graph (cycles allowed)?" or "How do you disambiguate terms?", and so on). It does this while trying to enable developers to only buy into as much of JSON-LD as they need. It's this latter property that enables JSON-LD to be embedded in Web pages for schema.org (no JSON-LD processing, or really any knowledge of how JSON-LD works, is needed) or be embedded in a VC-JWT (again, no JSON-LD processing needed for those copy-pasting code into a template and then signing it). What's been happening more recently is that some folks in the WG want to understand the JSON-LD underpinnings more deeply (and want to make sure that others understand them more deeply), or we've got some developer rough edges around how people are using JSON-LD in VCs, and that has led to conflict (which is expected). The usage of JSON-LD in the core data model has always been a balancing act... use just enough of it to be useful to all communities, explain just enough of it to be useful to implementers, but always try to avoid going to an extreme and requiring that authors and implementers understand everything there is to know about JSON-LD before writing their first VC (that would clearly be a failure). To put it in perspective, none of us understand the depths of how the V8 engine works, nor spend considerable time in ECMAScript Working Groups, but we still rely on the JavaScript language to get work done on the Web. Developers use tail recursion, promises, null coalescing, and cryptographic libraries without understanding how they're actually implemented behind the scenes. Only a very few people on the planet need to understand how that stuff is implemented at depth.
So, IMHO, one of our jobs in the WG is to expose a set of primitives to developers and implementers that are useful and easy to reason about without exposing them to all the gory details of the underlying technologies. We want to drive copy-paste behavior that "just works" instead of having to learn a PhD's worth of CS to use the standards we've created. That doesn't mean we won't have rough edges when we're done... we're just trying to smooth down as many of the rough edges together as possible. ... and that's why I don't think that going to an extreme on RDF and JSON-LD is going to help us either. People just don't have the time to learn those technologies at depth, and JSON-LD was created to cater to a large subset of developers that use JSON, but need some of the decentralized data modelling properties that JSON-LD brings to the table. |
And I would say that many need it without needing to realize it (IMO, that's a good thing) -- to make everything hang together in the decentralized three party model. The trick here is finding the right balance of what to expose to most people. It's a question of what they need to understand to make use of the standards, not what they could find out if they follow all the links down the various rabbit holes. Individuals can always do that on their own. We could spend lots of time trying to more accessibly surface what's in those rabbit holes in our own specs but with little positive impact on most people (and perhaps the opposite) by doing it. |
Thanks @msporny and @dlongley. I agree with your responses. If there were a "vision" document or similar for the VCDM, I think what you wrote would be immensely helpful to add there. My concerns have been alleviated, save for the confusion over the name "credentialSubject" -- I still think there's room for more clarity. |
There is an education gap. @msporny wrote a beautiful piece on his blog about this years ago (about 10), explaining the differences. I've spent quite a bit of time looking for it but didn't find it. It's somewhere in the archive of the internet. One of the main points was that arrays are not first class in linked data, yet they are commonly used in JS/JSON. This ties into the idea of trees vs graphs, and unordered vs ordered data. It's subtle how much we rely on ordering even when it's not mandated. Silly example but: imagine that a file like I'd love to reread that blog post. I think there could be ways to close the information gap. |
Maybe this two-parter? http://manu.sporny.org/2014/json-ld-origins/
IMHO, it's a tooling gap more than an education gap. Though, education always helps... it educates more people so that more people can develop more tooling. To put this in perspective, I don't know how JSON Schema works, but I've had to write a JSON Schema parser that would build an internal model and render it to HTML. It was (and continues to be) an absolutely awful and horrible experience due to the language design... yet, people continue to use it (and some even love JSON Schema). Why is that? Well... there's enough tooling to help make it usable by people that have next to no idea about how the technology actually works (and that's a great thing). We suffer from tooling problems at the cutting edge... it's always been that way for new technologies (or new uses of old technologies)... and there is no amount of writing about how something works that solves that tooling problem. You have to develop and release the tooling to improve things. So, as much as I am a fan of educating people... that's not where most of the effort is needed these days -- it's in software libraries that implement the standards, and tooling that makes use of those software libraries to make developers more productive (until the AIs eat the developers, that is :P). Just my $0.02, which could be varying degrees of wrong. :) |
Be careful with such assertions. Email addresses may be globally unambiguous, but only with a timestamp, as domains may change hands, just as email addresses within domains may. Even with that temporal specificity, an email address may reach a single entity's mbox, or it may reach a group's shared mbox, or it may distribute to a group of mboxes (with a count ranging from zero to n) and thereby their individual or group owner entities, etc. In other words, email addresses are NOT globally unambiguous IDs. |
Doesn't this temporal aspect apply to all identifiers including all URIs? |
Yes. Temporal details should be included as attributes of every graph (whether named or not) wherein some URI is minted, used, and/or referenced. That said, the HTTP/S URI scheme is defined differently than the MAILTO URI scheme, and the HTTP/S URI scheme definition includes that there be a singular referent denoted by a given URI, even if that referent is defined to be a collection/group, and if you are abiding by principles of Linked Data and/or RDF, dereferencing that URI will yield a description which pinpoints the members of that group — or, sometimes sufficiently, at least informs that the referent is a group, though its specific membership may not be available. MAILTO URIs have no such general dereferenceability, though specific deployments may impose some means by which to glean such a list of members. My objection to the assertion that email addresses are also globally unambiguous IDs stands. |
Closing due to lack of consensus/interest. |
The current credentialSubject property is confusing for implementers and anyone inspecting a verifiable credential. It would be more meaningfully named claims, which makes it abundantly clear that the section is for the claims being made in the credential.
This issue is compounded with the current usage of the id property, which is an optional property within the current credentialSubject. There are two sources of confusion here:
credentialSubject property to learn whom the subject values are about (if they're about no one why are they there?)
id property is a special property with special processing rules which is identifying the party who the credential subject is about <--- this should be confusing enough for you to agree to rename the property.
So, I propose two changes:
Before
After
By elevating subject to a top-level property we follow the existing pattern used by issuer. We remove ambiguity about whom the credential is about. By renaming credentialSubject to claims we give a meaningful name to the property and remove confusion about its usage, including removing the need to parse its value for any special-case properties. Implementers will rejoice.
Caveats
The only thing that this breaks, as far as I am aware, is multiple subjects. I believe this can be handled like so...