"contentSchema" for embedded string data #669

handrews · 2018-11-07T16:53:12Z

JSON Schemas can be applied to any data that maps reasonably well into the JSON data model. We take advantage of this in JSON Hyper-Schema to apply JSON Schema to media types such as multipart/alternative and application/x-www-form-urlencoded.

contentMediaType and contentEncoding let us embed other media types into JSON strings. An example where this would be likely is with JSON Web Tokens, which have a media type of application/jwt.

The contents of a JWT are, as one might expect, JSON, but encoded in such a way that a +json media type would be inaccurate.

The contentSchema keyword would take a schema, which is applied to the contents of the string as interpreted according to contentMediaType and contentEncoding. This would be a new type of applicator, applying a schema to string contents.

There is an obvious implementation challenge, which is that implementations are not required to implement contentMediaType and contentEncoding as assertions, and may simply collect them as annotations for another application to use. A similar restriction would exist for contentSchema.

We may wish to suggest that a small number of media types with obvious roles in a JSON ecosystem, such as application/jwt, SHOULD be supported, but support for arbitrary media types and encodings obviously remains impossible.

This feature would be very useful in Hyper-Schema for JWTs and also for the contents of Cookie and Set-Cookie.
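For illustration only, here is a minimal Python sketch of what applying contentSchema as an assertion could look like for the simplest case, a base64-encoded application/json string. The function names (decode_content, conforms, check_string) and the toy checker are hypothetical and not part of any JSON Schema implementation. An implementation that only collects annotations would skip this step entirely.

```python
import base64
import json

# Toy type map; a real validator covers far more of JSON Schema.
TYPES = {"object": dict, "array": list, "string": str, "integer": int, "boolean": bool}

def decode_content(value, schema):
    """Interpret a JSON string according to contentEncoding and contentMediaType."""
    raw = value.encode("utf-8")
    # Assumption: only "base64" is handled; other encodings are passed through.
    if schema.get("contentEncoding") == "base64":
        raw = base64.b64decode(raw)
    if schema.get("contentMediaType") != "application/json":
        raise ValueError("media type not supported by this sketch")
    return json.loads(raw)

def conforms(instance, schema):
    """Hypothetical checker supporting only "type", "required" and "properties"."""
    python_type = TYPES.get(schema.get("type"))
    if python_type is not None:
        if not isinstance(instance, python_type):
            return False
        # bool is a subclass of int in Python, so exclude it explicitly.
        if schema["type"] == "integer" and isinstance(instance, bool):
            return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not conforms(instance[name], subschema):
                return False
    return True

def check_string(value, schema):
    """Apply contentSchema to the decoded contents of a string."""
    if "contentSchema" not in schema:
        return True
    try:
        instance = decode_content(value, schema)
    except ValueError:  # covers base64, UTF-8 and JSON decoding errors
        return False
    return conforms(instance, schema["contentSchema"])

schema = {
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "application/json",
    "contentSchema": {
        "type": "object",
        "required": ["id"],
        "properties": {"id": {"type": "integer"}},
    },
}
encoded = base64.b64encode(b'{"id": 7}').decode("ascii")
result = check_string(encoded, schema)
```

The point of the sketch is the ordering: decode, parse according to the media type, and only then apply the subschema to the resulting instance.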


gcallaghan · 2018-11-08T16:50:30Z

So with this, can I describe a JWT with its base64-encoded sections and add additional constraints? I find that to make JWTs useful and secure I need to add constraints such as limiting the alg values in the header and requiring fields such as jti, iss, aud, and exp.

handrews · 2018-11-09T03:58:37Z

@gcallaghan after playing with that jwt.io site that you pointed me to (and thanks for your off-GitHub help; I have a better grasp of JWT now), here is how I think it would be used in Hyper-Schema.

Using the example in RFC 7519 §3.1, assume that we want to pin the header to require JWT and the HS256 algorithm shown in the example, and nothing more, and that we just want to describe the structure of the payload and leave it open-ended (because "all claims that are not understood by implementations MUST be ignored."). I think that would look like:

{
    "$schema": "https://json-schema.org/draft-08/hyper-schema#",
    "links": [
        {   
            "rel": "self",
            "href": "whatever",
            "headerSchema": {
                "type": "object",
                "required": ["authorization"],
                "properties": {
                    "authorization": {
                        "type": "object",
                        "required": ["Bearer"],
                        "maxProperties": 1,
                        "properties": {
                            "Bearer": {
                                "type": "string",
                                "contentMediaType": "application/jwt",
                                "contentSchema": {
                                    "type": "object",
                                    "required": ["header", "payload"],
                                    "properties": {
                                        "header": {
                                            "const": {
                                                "typ": "JWT",
                                                "alg": "HS256"
                                            }   
                                        },
                                        "payload": {
                                            "$ref": "http://example.com/schemas/rfc7519/registered-claims",
                                            "required": [
                                                "iss",
                                                "exp",
                                                "http://example.com/is_root"
                                            ],  
                                            "properties": {
                                                "iss": {"type": "string"},
                                                "exp": {"type": "integer"},
                                                "http://example.com/is_root": {"type": "boolean"}
                                            }
                                        }
                                    }
                                }   
                            }       
                        }       
                    }       
                }
            }   
        }
    ]
}

Some things to note:

I did not use contentEncoding because the base64url-encoding is part of the media type definition. contentEncoding is for when you need to decode the string before interpreting it according to the media type. With JWTs, the media type tells us enough to parse the whole thing into the JSON data model already.
For determining the general structure of HTTP header values, the Hyper-Schema spec references the draft specification for encoding headers in JSON. In this case, I treated "Bearer" as an object key because that's how the WWW-Authenticate example does it, although with the Authorization header the value is a string rather than an object.
I modeled the JWT contents as an object because each part has a clear and consistent name in the specification, and because @gcallaghan linked me to an example he wrote where it was done that way 😀 I could also see it being modeled as an array since the positions are fixed. Might be a good idea for JSON Schema to recommend a format since these things are so closely related to JSON. If not in the spec, then on the web site?
I'm assuming the existence of a referenceable schema for the pre-registered claims, so the only claim that needs to be described inline is the extension claim.

Does this seem reasonable? And useful?
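To make the object mapping concrete, here is a hedged Python sketch of how an application/jwt string might be turned into the {"header", "payload"} instance that the contentSchema above is written against. The helper names are hypothetical, the signature is a placeholder and is not verified, and only the three-segment compact form is handled.

```python
import base64
import json

def b64url_encode(data):
    """base64url-encode bytes and drop the "=" padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment):
    """Restore the stripped padding, then base64url-decode."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)

def jwt_to_object(token):
    """Map a compact JWT to an object. The signature is ignored, not checked."""
    header_b64, payload_b64, _signature = token.split(".")
    return {
        "header": json.loads(b64url_decode(header_b64)),
        "payload": json.loads(b64url_decode(payload_b64)),
    }

# Sample values for illustration; the third segment is not a real signature.
header = {"typ": "JWT", "alg": "HS256"}
claims = {"iss": "joe", "exp": 1300819380, "http://example.com/is_root": True}
token = ".".join([
    b64url_encode(json.dumps(header).encode("utf-8")),
    b64url_encode(json.dumps(claims).encode("utf-8")),
    "placeholder-signature",
])

decoded = jwt_to_object(token)
```

With this mapping, "header" and "payload" are ordinary JSON objects, so the "const", "required" and "properties" keywords apply to them the same way they would to any other instance.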

gcallaghan · 2018-11-09T17:39:04Z

That looks good to me! I'm curious how the array would look. I used the object description as I wanted to be as unambiguous as possible. However, in practice I can see something like let [header, payload, sig] = someJWT.split('.')

awwright · 2018-11-09T21:40:54Z

It would seem to me the best way to map application/jwt onto JSON—perhaps with some participation from that WG—is to define a related (and isomorphic) application/jwt+json media type, the difference being it's more verbose and not suitable for use in User Agents, but valid JSON.

Similar approaches could be taken for other media types.

I forget where I saw it, but someone had a whole JSON vocabulary for representing XML content as JSON. PHP has a builtin library for converting XML to a (probably) compatible format: http://php.net/simplexml

handrews · 2018-11-10T01:58:19Z

@awwright one use case here is to describe APIs (or JWTs in general) as people use them, so that means working with application/jwt. In any event, the purpose of contentSchema is not specifically to describe JWTs; that just happens to be a particularly useful application of it.

Any time someone stuffs JSON in a string, such as submitting JSON as the value in an application/x-www-form-urlencoded payload, contentSchema can be used to describe it:

{
    "$schema": "https://json-schema.org/draft-08/hyper-schema#",
    "links": [
        {   
            "rel": "self",
            "href": "whatever",
            "submissionMediaType": "application/x-www-form-urlencoded",
            "submissionSchema": {
                "type": "object",
                "properties": {
                    "someFieldUsingJson": {
                        "type": "string",
                        "contentMediaType": "application/json",
                        "contentSchema": {...}
                    }
                }
            }   
        }
    ]
}

So finding alternative ways to express JWTs is not really part of the goal here.
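As a rough illustration of the form-encoded case, this Python sketch maps an application/x-www-form-urlencoded body to an object of strings and then parses the JSON held in one field. The field name matches the schema above; the payload contents and the form_to_instance helper are made up for the example.

```python
import json
from urllib.parse import parse_qs, urlencode

def form_to_instance(body):
    """Map a form-encoded body to an object whose values are strings."""
    # parse_qs returns a list per field; this sketch keeps only the first value.
    return {name: values[0] for name, values in parse_qs(body).items()}

# Build a sample submission whose single field carries a JSON document.
body = urlencode({"someFieldUsingJson": json.dumps({"tags": ["a", "b"]})})

# submissionSchema would apply to this object...
instance = form_to_instance(body)

# ...and contentSchema would apply to the parsed contents of the string field.
embedded = json.loads(instance["someFieldUsingJson"])
```

The outer object only ever sees a string, which is why a keyword like contentSchema is needed to reach the structure inside it.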

awwright · 2018-11-10T02:02:05Z

Yeah I was hoping to shed some light on the problem here.

I think one of the issues it illustrates, upon further reflection, is it limits authors to one mapping of a media type to a JSON document (one per media type). Either we can force authors to work in this for simplicity, or we can allow some sort of keyword that defines which mapping from an arbitrary media type to a JSON document to use.

handrews · 2018-11-10T02:10:50Z

@gcallaghan

After discussing this with KayEss on the JSON Schema slack (I don't know their github username), I think that the array approach is the way to go. They pointed out some potential confusion with "payload" vs "claims", which I can dig out of the slack history if you'd like. It had to do with which term applied to the encoded vs decoded JSON, I think.

Anyway, here is the array version (note that I had to re-do the object example a bit because I left out the "properties" keywords in several places and just dumped property names in at the schema keyword level. Probably the single most common user error with JSON Schema, sadly. Anyway, I updated the object version above, here is the array version):

{
    "$schema": "https://json-schema.org/draft-08/hyper-schema#",
    "links": [
        {   
            "rel": "self",
            "href": "whatever",
            "headerSchema": {
                "type": "object",
                "required": ["authorization"],
                "properties": {
                    "authorization": {
                        "type": "object",
                        "required": ["Bearer"],
                        "maxProperties": 1,
                        "properties": {
                            "Bearer": {
                                "type": "string",
                                "contentMediaType": "application/jwt",
                                "contentSchema": {
                                    "type": "array",
                                    "minItems": 2,
                                    "items": [
                                        {
                                            "const": {
                                                "typ": "JWT",
                                                "alg": "HS256"
                                            }   
                                        },
                                        {
                                            "$ref": "http://example.com/schemas/rfc7519/registered-claims",
                                            "required": [
                                                "iss",
                                                "exp",
                                                "http://example.com/is_root"
                                            ],  
                                            "properties": {
                                                "iss": {"type": "string"},
                                                "exp": {"type": "integer"},
                                                "http://example.com/is_root": {"type": "boolean"}
                                            }
                                        }
                                    ]
                                }   
                            }       
                        }       
                    }       
                }
            }   
        }
    ]
}

The change is minimal. required becomes minItems, properties becomes the tuple form of items which takes an array of schemas, and the property names are removed. Since the schemas happened to be in the correct order in the object example, I didn't have to touch them at all.
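For comparison, a hedged Python sketch of the array mapping and a positional check in the spirit of tuple-form items plus minItems. All helper names are hypothetical. Keeping the raw signature as a third element is an assumption made here, not something the schema above requires, and the payload check assumes the payload is an object.

```python
import base64
import json

def b64url_encode(data):
    """base64url-encode bytes and drop the "=" padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment):
    """Restore the stripped padding, then base64url-decode."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)

def jwt_to_array(token):
    """Map a compact JWT to [header, payload, raw signature...]."""
    parts = token.split(".")
    decoded = [json.loads(b64url_decode(part)) for part in parts[:2]]
    return decoded + parts[2:]  # assumption: signature kept as an undecoded string

def matches_tuple(instance, item_checks, min_items):
    """item_checks[i] applies to instance[i]; extra items are left unchecked."""
    if not isinstance(instance, list) or len(instance) < min_items:
        return False
    return all(check(item) for check, item in zip(item_checks, instance))

def header_ok(header):
    # Mirrors the "const" in the first item schema.
    return header == {"typ": "JWT", "alg": "HS256"}

REQUIRED_CLAIMS = ("iss", "exp", "http://example.com/is_root")

def payload_ok(payload):
    # Mirrors only the "required" list of the second item schema.
    return isinstance(payload, dict) and all(name in payload for name in REQUIRED_CLAIMS)

token = ".".join([
    b64url_encode(json.dumps({"typ": "JWT", "alg": "HS256"}).encode("utf-8")),
    b64url_encode(json.dumps(
        {"iss": "joe", "exp": 1300819380, "http://example.com/is_root": True}
    ).encode("utf-8")),
    "placeholder-signature",
])

valid = matches_tuple(jwt_to_array(token), [header_ok, payload_ok], 2)
```

Because position carries the meaning, there is no need to pick names for the parts, which sidesteps the "payload" versus "claims" question.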

handrews · 2018-11-10T02:19:38Z

@awwright we already basically cover this in JSON Hyper-Schema where submissionMediaType and targetMediaType can be non-JSON media types that we map into the data model. We have an example of multipart/alternative in the current Hyper-Schema spec.

contentMediaType + contentSchema works exactly like targetMediaType + targetSchema and submissionMediaType + submissionSchema. These keyword pairs all describe something that is generally taken to be a string (a JSON string for content*, and request or response payloads for the other two), and apply the schema to the contents of the string.

If you would like to debate different ways to map media types into the JSON data model, please open a separate issue for that. I do not think it is necessary, as we have not had any negative feedback on this topic with Hyper-Schema despite it getting a good bit of interest, a nearly-complete implementation, and several blog posts. In any event, it is orthogonal to contentSchema, which is really not doing anything novel here; it's just filling in a gap analogous to existing keywords in Hyper-Schema.

handrews · 2018-11-10T02:23:18Z

@awwright admittedly it may become necessary as more people work with it, which is why I think having a separate issue would be good.

awwright · 2018-11-10T02:23:51Z

I'll take a closer look at the issues you reference.

gcallaghan · 2018-11-12T15:35:45Z

@handrews The array implementation makes sense to me. I understand the naming issue, where the encoding can change how the vocabulary is understood. Basically, encoded, the middle part is known as the payload, and the payload contains claims. So when the payload is decoded, it is a JSON object with the claims inside. The array method is more structurally accurate.

handrews · 2018-11-12T15:44:04Z

@gcallaghan looks like we have a viable example, then!

I will probably proceed with a PR, using the same vague language around "mapping the media type into the JSON data model" (the data model being defined in Core) that we use in Hyper-Schema. If/when @awwright comes up with a more clear set of instructions or conventions for performing mappings, we will update all of the specs then. But I think there is no need to block contentMediaType/contentSchema on this when the other *MediaType/*Schema pairs are written this way already.

handrews · 2018-11-27T20:40:07Z

Merged #673

handrews added Type: Enhancement Priority: Medium hypermedia validation labels Nov 7, 2018

handrews added this to the draft-08 milestone Nov 7, 2018

handrews added the Status: In Progress label Nov 12, 2018

handrews mentioned this issue Nov 12, 2018

Add "contentSchema" #673

Merged

handrews closed this as completed Nov 27, 2018

ghost removed the Status: In Progress label Nov 27, 2018

gregsdennis added this to Hypermedia Jul 17, 2024

gregsdennis moved this to Closed in Hypermedia Jul 17, 2024
