Skip to content

Add a helper for extending a validator to reject unknown formats #252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jacobsvante opened this issue Oct 22, 2015 · 22 comments
Closed

Add a helper for extending a validator to reject unknown formats #252

jacobsvante opened this issue Oct 22, 2015 · 22 comments
Labels
Enhancement Some new desired functionality

Comments

@jacobsvante
Copy link

It would be nice if a strict mode could be turned on for FormatChecker. By strict I mean that jsonschema will raise an error if it encounters a format that it doesn't recognize.

@Julian
Copy link
Member

Julian commented Oct 25, 2015

Hey! Thanks for the report.

I'm generally not in favor of strict modes (and more generally of boolean flags). Specifically here, how will someone know to use it unless they've already guessed that their problem is a missing format checker?

Also, this can already be done -- use jsonschema.validators.extend, and add a oneOf / enum to the metaschema for format. I can elaborate if you'd like, shouldn't be more than ~3 lines of code.

@jacobsvante
Copy link
Author

Hi Julian! Thanks for your input.

Not sure I understand your reasoning, but it's early morning here so that might be why 😉

The reason I opened this issue is because I have a REST API where users push in data, according to a specific JSON schema. I want them to get a 400 Bad Request if I encounter data that I will not be able to parse, like a date not formatted according to ISO 8601.

If this can already be done I would love to see how. Thanks man!

@Julian
Copy link
Member

Julian commented Oct 27, 2015

Your users are sending you schemas to use or instances to validate? It sounds like you're saying instances?

@jacobsvante
Copy link
Author

Yup, they are sending instances for me to validate.

@Julian
Copy link
Member

Julian commented Oct 27, 2015

Hm, OK, then I'm confused :), how does that relate to this ticket?

@jacobsvante
Copy link
Author

That's because I was confused... Ignore my previous messages.

What I'm looking for as I stated originally is a way to become aware of formats that will not be validated. As a developer it's hard to notice if jsonschema just ignores the format. Example:

In [1]: import jsonschema
In [2]: print(jsonschema.validate("xy", {"type": "string", "format" : "time-hour"}, format_checker=jsonschema.FormatChecker()))
None

How would I catch this? As an author of the schemas I want to be aware of which format checks I need to create.

@Julian
Copy link
Member

Julian commented Oct 27, 2015

I see, OK, so now possibly my first comment might make sense:

The reason I don't generally think strict modes are useful is (among other things), as a developer, how would you ever know that you need strict mode unless you already knew that your instance contained formats that didn't exist? It sounds like for you that you just want to do this once possibly? Or once when you get a new schema, to see if there's any formats that someone put in there that you need to implement?

You can extend a metaschema to define that formats other than the known ones are invalid. E.g.:

>>> Validator = jsonschema.validators.create(
...     meta_schema={
...         "allOf" : [
...             jsonschema.validators.Draft4Validator.META_SCHEMA, {
...                 "properties" : {"format" : {"enum" : jsonschema.FormatChecker.checkers.keys()}}}
...             ],
...         },
...    validators=jsonschema.validators.Draft4Validator.VALIDATORS,
... )
>>> Validator.check_schema({"format" : "date"})
>>> Validator.check_schema({"format" : "datasdfe"})
Traceback (most recent call last):
  File "<bpython-input-17>", line 2, in <module>
    Validator.check_schema({"format" : "datasdfe"})
  File "/usr/local/lib/python2.7/site-packages/jsonschema/validators.py", line 76, in check_schema
    raise SchemaError.create_from(error)
SchemaError: 'datasdfe' is not one of ['regex', 'time', 'hostname', 'uri', 'host-name', 'ipv4', 'ip-address', 'ipv6', 'date', 'email']

Failed validating 'enum' in schema['allOf'][1]['properties']['format']:
    {'enum': ['regex',
              'time',
              'hostname',
              'uri',
              'host-name',
              'ipv4',
              'ip-address',
              'ipv6',
              'date',
              'email']}

On instance['format']:
    'datasdfe'

Might be useful to have that as a utility you can compose onto a validator, but needs a bit of cleaning up to handle the general case, you'd have to do it a bit different to use the format_checker that's provided at validation-time, which might not necessarily be the same.

@Julian Julian changed the title Change behaviour of FormatChecker Add a helper for extending a validator to reject unknown formats Jan 21, 2017
@martingkelly
Copy link

+1 for having some way of handling this, though I have no opinion about the best way to plumb it through. I was getting silent failures with no indication that anything was wrong. This can easily happen due to a typo, for example. Since schemas are used to certify that data is valid, I think it's problematic to pass through an unknown format and say "everything is OK".

I personally would favor making "failure on unknown format" the default, but I'm guessing that would break existing use cases. If we can't make it default, I think having a way to turn on a "strict mode" would be very useful.

@martingkelly
Copy link

(a more official way that is)

@handrews
Copy link

I personally would favor making "failure on unknown format" the default, but I'm guessing that would break existing use cases.

Not to mention completely violate the specification.

The simplest way to handle this during development is to use a locally meta-schema that puts an enum on the format values.

@Julian
Copy link
Member

Julian commented Jun 12, 2018

Since schemas are used to certify that data is valid, I think it's problematic to pass through an unknown format and say "everything is OK".

I say this quite a bit, but schemas are not a replacement for unit tests :) -- so "silent" here is "silent" in the same sense as if you wrote buggy code and didn't notice until your test failed.

It won't be the default, for the reason @handrews mentioned, this library is an implementation of the specification, but I'm still happy to accept a helper function for adding an enum restriction to a metaschema. Though increasingly I'm thinking there'll shortly be a jsonschema-ext package to contain stuff like that.

@martingkelly
Copy link

I personally would favor making "failure on unknown format" the default, but I'm guessing that would break existing use cases.
 Not to mention completely violate the specification.

Yes, the spec does say an implementation should (though not shall) succeed if it sees an unknown format (http://json-schema.org/latest/json-schema-validation.html#rfc.section.7.2). However, it says nothing about providing an option to behave differently. Regardless, I think specs should come from use cases, as their point is to standardize systems for solving problems. I think my use case here is legitimate, although I can also understand those who want to silently pass through an unknown format. My point is that providing the option to do both is useful. If the spec forbids making my way the default, no problem, but having a way to enforce the constraint is important. I would love to be able to set a property like format-required or similar.

@handrews, if I use a meta-schema, can I also validate against the main jsonschema spec? AFAIK, you can't validate against multiple meta-schemas. That said, uses a meta-schema doesn't seem like a very "obvious" / convenient way of handling it, so it's not something most people would think of. Having an easier way would be helpful.

I say this quite a bit, but schemas are not a replacement for unit tests :) -- so "silent" here is "silent" in the same sense as if you wrote buggy code and didn't notice until your test failed.

Sure, schemas are not a replacement for unit tests, just as writing good, defensive code is not a replacement for unit tests. But, schemas are used for enforcing constraints, and in this case, I have a constraint to enforce, so ideally there should be a way to express that in the schema language.

It won't be the default, for the reason @handrews mentioned, this library is an implementation of the specification, but I'm still happy to accept a helper function for adding an enum restriction to a metaschema. Though increasingly I'm thinking there'll shortly be a jsonschema-ext package to contain stuff like that.

What method do you propose? If you sketch out the gist of the design, I'd be happy to code it up if I have the time.

@robherring
Copy link
Contributor

@martingkelly Adding this line to your meta-schema will include the base meta-schema:

allOf: [ { $ref: "http://json-schema.org/draft-06/schema#" } ]

@martingkelly
Copy link

Ah, I hadn't thought of using allOf, good suggestion, thanks.

@martingkelly
Copy link

That said, I'm not sure how to enforce the format enum recursively for all levels of the hierarchy. Any ideas? Here's what I have now, which is passing no matter what I put in the enum. Interestingly, it is even passing if I change "#/defs/format-enum" to "#defs/BOGUS".

$schema:
  allOf:
    - $ref: http://json-schema.org/draft-06/schema#
    - $ref: "#/defs/format-enum"

defs:
  format-enum: 
    description: Allow only the formats listed here
    type: object
    properties:
      format:
        type: string
        enum:
          - a
          - b
          - c

This is expressed in YAML, but it parses the same.

@handrews
Copy link

@martingkelly

However, it says nothing about providing an option to behave differently.

Option, sure. Default, absolutely not.

if I use a meta-schema, can I also validate against the main jsonschema spec? AFAIK, you can't validate against multiple meta-schemas. That said, uses a meta-schema doesn't seem like a very "obvious" / convenient way of handling it, so it's not something most people would think of. Having an easier way would be helpful.

The primary focus of draft-08 is improving extensibility. Look at issues tagged with "vocabulary" in the spec repo. It's a huge topic and I've been working on it for months. Still lots to work out.

That said, I'm not sure how to enforce the format enum recursively for all levels of the hierarchy.

Because it's a gigantic pain right now. Look at how the hyper-schema meta-schema currently handles it (it has to re-define a lot of things). Search the spec repo issues for $recurse to see how we're trying to fix this. It's challenging, although I think we're getting close to finding something that will work for draft-08.

Though increasingly I'm thinking there'll shortly be a jsonschema-ext package to contain stuff like that.

The next draft should define how extensibility works. While the exact programmatic interface for extensions will, of course, be language- and implementation-specific, the overall structure and capabilities will be defined. Again, I'm being vague because this topic is enormous and an active area of work right now.

@handrews
Copy link

That said, I'm not sure how to enforce the format enum recursively for all levels of the hierarchy.

The spec even has a note on how hard it is. I'm so looking forward to removing that note and replacing it with an actual solution in draft-08.

@martingkelly
Copy link

OK, I'm glad to hear this is being worked on. Can you expand on exactly what was done to make the hyper-schema work? I tried to add properties: $ref: # and similar but did not get it to work.

@handrews
Copy link

Can you expand on exactly what was done to make the hyper-schema work? I tried to add properties: $ref: # and similar but did not get it to work.

Copy the hyper-schema meta-schema. Change $id. Remove links and base, and add your specialized format.

Regardless, I think specs should come from use cases, as their point is to standardize systems for solving problems.

As someone who has been maintaining a spec for a couple of years, I can tell you that pretty much everyone who has ever had their request declined uses this argument to mean "my use case is the use case you should solve", regardless of its impact on everyone else, or whether it fits the overall philosophy of the system.

@martingkelly
Copy link

Copy the hyper-schema meta-schema. Change $id. Remove links and base, and add your specialized format.

Is there a way to do this without having to copy the hyper-schema? I was hoping to use allOf as suggested to avoid copying it (and having to redo the work when I update to the next spec revision).

As someone who has been maintaining a spec for a couple of years, I can tell you that pretty much everyone who has ever had their request declined uses this argument to mean "my use case is the use case you should solve", regardless of its impact on everyone else, or whether it fits the overall philosophy of the system.

Of course you can't and shouldn't try to please everyone. But, to the best of my knowledge, there is no similar spec to json schema, except if I were to use XML (which I don't want to do). So if I say "OK, I guess I need to find a spec that supports my use case", I'm left with inventing my own thing and duplicating the 90% of stuff that I do want from json schema. I would say that's a lose-lose for everyone.

There's always a balance between trying to please everyone and being too inflexible, and it's always a judgment call.

@handrews
Copy link

Is there a way to do this without having to copy the hyper-schema? I was hoping to use allOf as suggested to avoid copying it (and having to redo the work when I update to the next spec revision).

That is the whole point of the warning in the spec that I linked to, and the motivation for $recurse as I noted earlier. It's currently awful. As in, completely the opposite of maintainable. If you're thinking "it can't be this bad, I must be missing something"... nope, it's that bad. Which is why I've been spending half a year trying to come up with something better :-)

As you can see from the fact that $recurse already made it to PR once but failed because I missed a huge problem, it's not trivial to fix. And that's not even getting into formally declaring your keyword vocabulary (including known vs unknown values for value-extensible fields such as format) instead of just validating conformance at the meta-schema level.

There's always a balance between trying to please everyone and being too inflexible, and it's always a judgment call.

I'm reasonably confident that the current balance on media types (JSON Schema is defined over a data model rather than text, but otherwise does not require or constrain how implementations handle different media types) is in a good place. These things never get universal agreement, but there is a good level of flexibility in the current approach.

@Julian Julian added the Enhancement Some new desired functionality label Apr 23, 2019
@Julian
Copy link
Member

Julian commented Aug 18, 2021

I'm going to close this since I believe as of draft2019+ (which should come in the next release) that this is possible to do purely within JSON Schema (the spec) using dynamicRef. Someone needs presumably to author said schema, but yeah I think it's doable now.

If that's not the case, happy to revisit (and if someone does write said schema, feel free to share it).

@Julian Julian closed this as completed Aug 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Some new desired functionality
Projects
None yet
Development

No branches or pull requests

5 participants