Reference to local files (with relative/no path) doesn't work #98

Closed
sblask opened this issue May 6, 2013 · 22 comments

@sblask

sblask commented May 6, 2013

I am trying to create a json schema (draft 4) which uses definitions from another file and includes them using $ref. I don't have any URL set up for the definitions file and I wasn't planning on doing that. I thought "$ref": "definitions.json#a_definition" (definitions.json lying in the same directory) would be enough, but I get a "No such file or directory" error.

I am not sure this should work, but this: http://code.google.com/p/jsonschema2pojo/wiki/Reference#$ref suggests it should ("$ref" : "user.json"). Googling/reading the draft didn't get me anywhere.

Using file://absolute/path/to/json/file does work, but that would be super ugly. Is there a better way to do what I want to do? Or is this something that needs fixing?

@Julian
Member

Julian commented May 6, 2013

Hey!

The ref resolution code is pretty simple. Mostly it just delegates.

I haven't looked at it in a bit, but what I'd assume is going on is that relative refs will be resolved relative to the working directory, not relative to the schema file they came from.

Is that the behavior you're seeing? If not, I'll give this a try later when I get to a computer.

This at least should be documented, but I have to think about which is correct.

Oh and if you want you can use the store argument to just specify which schema you mean with your ref. That's probably easiest.
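A minimal sketch of the store approach mentioned above, assuming the RefResolver constructor accepts a store mapping from URI to an already-parsed schema. This is the API discussed in this thread; I believe newer jsonschema releases deprecate it. The schema contents, the positive_int definition, and the validate_with_store helper are hypothetical names for illustration, and jsonschema is imported lazily so the data setup runs without it.

```python
# Hypothetical content of definitions.json, already parsed into a dict.
definitions = {
    "definitions": {
        "positive_int": {"type": "integer", "minimum": 1},
    }
}

# The main schema refers to the other document by a plain relative name.
schema = {
    "type": "object",
    "properties": {
        "count": {"$ref": "definitions.json#/definitions/positive_int"},
    },
}

# The store maps the name used in "$ref" to the parsed document,
# so the resolver never has to fetch anything.
store = {"definitions.json": definitions}


def validate_with_store(instance):
    """Validate `instance`, resolving refs from `store` (assumes jsonschema is installed)."""
    import jsonschema  # third-party; imported here so the setup above has no dependencies

    resolver = jsonschema.RefResolver(base_uri="", referrer=schema, store=store)
    jsonschema.Draft4Validator(schema, resolver=resolver).validate(instance)
```

With jsonschema installed, `validate_with_store({"count": 3})` would be expected to pass and `validate_with_store({"count": 0})` to raise a ValidationError.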

@gazpachoking
Contributor

I suspect this would probably work if the base URI was set to file:///wherever/yourschema.json. Without this information, the ref resolver doesn't know what your relative refs are relative to. I'm not sure right now of the best way to specify the base URI; I'll have to take a closer look later.
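To illustrate why the base URI matters, here is a small sketch using only the standard library's urljoin, which follows the usual relative-reference rules. It is not jsonschema's own code, and the paths are the placeholders from this thread.

```python
from urllib.parse import urldefrag, urljoin

ref = "definitions.json#a_definition"

# With no base URI the reference stays relative, so nothing says where the file lives.
without_base = urljoin("", ref)

# With the schema's own location as the base, the reference becomes absolute.
with_base = urljoin("file:///wherever/yourschema.json", ref)

# The part before '#' names the document; the fragment names a location inside it.
document_uri, fragment = urldefrag(with_base)

print(without_base)
print(with_base)
print(document_uri, fragment)
```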

@sblask
Author

sblask commented May 6, 2013

I figured I could write my own resolver if I really wanted to, or add a handler, but as I said, I am not sure whether it should work out of the box or not (i.e. does the spec allow this or not). The store option seems a little hacky as I won't ever have actual URIs, but that could work and would be fairly easy indeed.

@Julian the code uses urllib2.urlopen, which doesn't support relative paths at all, no matter what the current working directory is, and if I omit the file:// it doesn't want to do anything. urllib.urlopen would work though, at least if the right working directory is set.

@gazpachoking "wherever" would have to be absolute as well then? :-/ Obviously the references should be relative to the current file, but as the code uses a json object directly and doesn't read the file itself, we don't know where the current file is. Without changing that, it would be fairly complicated to fix, I guess. And changing that would break backwards compatibility...
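The urlopen behaviour described above can be reproduced with Python 3's urllib.request, the successor to urllib2. This is a sketch under that assumption; try_open and the temporary definitions.json file are hypothetical and exist only for the demonstration.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def try_open(url):
    """Return the body at `url`, or a short message if urlopen rejects the URL."""
    try:
        with urlopen(url) as response:
            return response.read().decode("utf-8")
    except ValueError as error:
        # A URL with no scheme is rejected before any file access is attempted.
        return "ValueError: {}".format(error)


with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "definitions.json"
    path.write_text('{"a_definition": {"type": "string"}}', encoding="utf-8")

    relative_result = try_open("definitions.json")       # no scheme
    absolute_result = try_open(path.resolve().as_uri())  # file:///... URI

print(relative_result)
print(absolute_result)
```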

@gazpachoking
Contributor

Yeah, you are going to have to specify the absolute path that relative paths should be judged on no matter what, as jsonschema is not the one loading from the disk, so it doesn't know what the location of the schema is. I think something of this nature will work:

import json
from jsonschema import Draft4Validator, RefResolver

schema = json.load(open("/path/to/schema"))
resolver = RefResolver.from_schema(schema, base_uri="file:///path/to/schema")
validator = Draft4Validator(schema, resolver=resolver)
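Since the caller is the one reading the file, one option is to compute the base URI at load time and keep it next to the parsed schema. The helper below is a hypothetical sketch using only the standard library; load_schema_with_base_uri is not part of jsonschema, and the temporary schema.json is there only to make the example runnable.

```python
import json
import tempfile
from pathlib import Path


def load_schema_with_base_uri(schema_path):
    """Load a schema file and return (schema, file URI of that file)."""
    path = Path(schema_path).resolve()  # as_uri() needs an absolute path
    with path.open(encoding="utf-8") as file_object:
        schema = json.load(file_object)
    return schema, path.as_uri()


with tempfile.TemporaryDirectory() as directory:
    example_path = Path(directory) / "schema.json"
    example_path.write_text('{"type": "object"}', encoding="utf-8")
    schema, base_uri = load_schema_with_base_uri(example_path)

print(base_uri)
```

The returned pair could then be handed to a resolver as the base URI and referrer, as attempted in the snippet above.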

@gazpachoking
Contributor

Whether there could/should be a better way, not sure. Maybe @Julian has some input on that.

@gazpachoking
Contributor

I'm also making another tool right now to make dealing with json references easier (mostly outside of json schema, but it could be used with it as well) https://github.com/gazpachoking/jsonref

It isn't quite ready for use yet, but it will have the advantage of also handling refs that are not subschemas.

schema = jsonref.load(open("/wherever/schema"), base_uri="file:///wherever/schema", jsonschema=True)
# Then it can be used anywhere without worrying about refs e.g.
jsonschema.validate(someinstance, schema)

@Julian
Member

Julian commented May 6, 2013

If urllib does not resolve things without schemes, that would seem to
indicate that those are not valid URIs, which I guess I kind of was
assuming.

JSON Schema doesn't define any special semantics for "$ref" other than
"it's a URI under the relevant spec", so if that's the case, I revert to my
previous suggestion and would think the current behavior is likely to be
correct (and that the schema you're copying from is wrong or outdated,
which is quite common for schemas with refs I've seen in the wild).

I think you'd need to have some other canonical name for the resource
you're locating.

@gazpachoking
Contributor

@Julian Would it be good for IValidator to take a base_uri keyword argument (that would be passed to the RefResolver)? It seems like it might be a common enough need to allow setting it without instantiating a RefResolver yourself.

@Julian
Member

Julian commented May 6, 2013

Well it's not like it requires much effort to instantiate a resolver :p...

(Keeping stuff separate is good if we need/want to change the API in some
way later, it requires fewer things to change).

@gazpachoking
Contributor

Yep, sounds fine.

@sblask
Author

sblask commented May 7, 2013

Thank you for your help! I got it to work now:

with open(os.path.join(absolute_path_to_base_directory, base_filename)) as file_object:
    schema = json.load(file_object)
resolver = jsonschema.RefResolver('file://' + absolute_path_to_base_directory + '/', schema)
jsonschema.Draft4Validator(schema, resolver=resolver).validate(data)

RefResolver.from_schema did not work, because the base_uri is taken from the schema's id, which I don't define. My schema looks something like this now:

{
    "$schema": "http://json-schema.org/draft-04/schema",
    "type": "object",
    "properties": {
        "something_referencing_a_local_file": {"$ref": "another_file.json#a_definition"}
    }
}

I'm not sure what people who do define id have to do to test locally; using the store option might be best.

@sblask sblask closed this as completed May 7, 2013
@Julian
Member

Julian commented May 7, 2013

Glad you got it working :). That looks about right to me.

@jiaj0000

jiaj0000 commented Aug 6, 2014

Hi,
I am sorry for raising this issue again after one year. I have the same question about validating $ref with a relative path.

http://stackoverflow.com/questions/25145160/json-schema-ref-does-not-work-for-relative-path/25147381?noredirect=1#comment39166721_25147381

I really hope you can help me with it.

After reading this post, I am assuming that I have to have an absolute path somewhere (like in the resolver) if I want to use a relative path for $ref?

@Julian

Thank you in advance!

@quantumdoug

Thanks to sblask for his earlier RefResolver approach. Here is my file-based, scope-relative example. The code runs if the schemas are placed in the correct schemas subfolders. The key is to use a bootstrap schema whose $ref is resolved relative to uri_root; every other $ref is then scope relative and so needs the "../folder/schema.json" form.

**************************************************

import jsonschema
from uuid import uuid4 as UUID

schema_root = "/myapp/schemas"
uri_root = 'file://' + schema_root + '/'

def json_schema_pathname(ctype, cname):
    "assumes two-level folder structure of schemas/classtype/classname.json"
    return ctype + "/" + cname + ".json"

# A bootstrap schema is used so that all "$ref"s in the schema files are scope relative (hence the "../folder/schema.json" form).

def top_validate(jsondict, schema_path):
    "schema_path is relative to uri_root; it bootstraps scope-relative $ref resolution everywhere else"
    bootstrap_schema = {u'$schema': u'http://json-schema.org/draft-04/schema',
                        u'$ref': schema_path  # acts as top schema so bootstraps all other scope relative $ref 
                        }
    resolver = jsonschema.RefResolver(uri_root, bootstrap_schema)
    return jsonschema.Draft4Validator(bootstrap_schema, resolver=resolver).validate(jsondict)

"""

from file schemas/types/int.json

{   "$schema": "http://json-schema.org/draft-04/schema",
    "type":"number"
    }

from file schemas/types/unicode.json

{   "$schema": "http://json-schema.org/draft-04/schema",
    "type":"string"
    }

from file schemas/types/YorN.json

{   "$schema": "http://json-schema.org/draft-04/schema",
    "type":"string",
    "enum": ["Y", "N"]
    }

from file schemas/types/optYorN.json (contains scope relative paths using ../)

{   "$schema": "http://json-schema.org/draft-04/schema",
    "oneOf": [{"type": ["null"] },
              {"$ref": "../types/YorN.json"}
              ]
    }

from file schemas/builtins/UUID.json

{   "type":"object",
    "$schema": "http://json-schema.org/draft-04/schema",
    "required": ["_classname", "_classtype", "hex"],
    "properties":{"_classname": {"enum": ["UUID"] },
                  "_classtype": {"enum": ["builtin"] },
                 "hex": {"type":"string",
                          "pattern": "^[0-9a-f]{32}$" }
                  },
    "additionalProperties": false
    }

"""

top_validate(3, json_schema_pathname('types','int'))
top_validate("3", json_schema_pathname('types','unicode'))
top_validate("Y", json_schema_pathname('types','YorN'))
top_validate(None, json_schema_pathname('types','optYorN'))
top_validate("Y", json_schema_pathname('types','optYorN')) # follows the scope relative $ref

uuid = {'hex': 'f9cc22e0df7b4857ac346e9bf9df3da2', '_classname': 'UUID', '_classtype': 'builtin'}
bad = {'hex': 'F9CC22E0DF7B4857AC346E9BF9DF3DA2', '_classname': 'UUID', '_classtype': 'builtin'}

top_validate(uuid, json_schema_pathname('builtins','UUID'))

# All of the calls above pass validation; the two below fail on purpose.

try:
    top_validate(3, json_schema_pathname('types','unicode'))
except Exception as e:
    print(e)

try:
    top_validate(bad, json_schema_pathname('builtins','UUID'))
except Exception as e:
    print(e)

"""
3 is not of type u'string'

Failed validating u'type' in schema:
{u'$schema': u'http://json-schema.org/draft-04/schema',
u'type': u'string'}

On instance:
3
'F9CC22E0DF7B4857AC346E9BF9DF3DA2' does not match u'^[0-9a-f]{32}$'

Failed validating u'pattern' in schema[u'properties'][u'hex']:
{u'pattern': u'^[0-9a-f]{32}$', u'type': u'string'}

On instance[u'hex']:
'F9CC22E0DF7B4857AC346E9BF9DF3DA2'
"""

@iandanforth

@sblask has the best answer so far. Here's a slight modification that more closely conforms to the intent of the source.

import os
import json
import jsonschema

schema_path = os.path.join(absolute_path_to_base_directory, base_filename)
with open(schema_path) as file_object:
    schema = json.load(file_object)

# Your data
data = {"sample": "woo!"}

# Note that the second parameter does nothing.
resolver = jsonschema.RefResolver('file://' + absolute_path_to_base_directory + '/', None)

# This will find the correct validator and instantiate it using the resolver.
# Requires that your schema contain a line like this: "$schema": "http://json-schema.org/draft-04/schema#"
jsonschema.validate(data, schema, resolver=resolver)

@mottosso

I spent the better part of a morning struggling with this on Windows, so thought I'd share in case anyone else runs into the same issue.

@sblask's solution works well, with one caveat: absolute_path_to_base_directory needs forward slashes rather than the default backslashes, and file:// needs an extra /.

I got it working with this.

with open(os.path.join(absolute_path_to_base_directory, base_filename)) as file_object:
    schema = json.load(file_object)
resolver = jsonschema.RefResolver('file:///' + absolute_path_to_base_directory.replace("\\", "/") + '/', schema)
jsonschema.Draft4Validator(schema, resolver=resolver).validate(data)

The extra forward-slash still doesn't make much sense, without it, I'm getting this.

jsonschema.exceptions.RefResolutionError: <urlopen error file not on local host>

And I haven't yet tested it on other platforms.
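A possible explanation for the extra slash, sketched with the standard library only: in a file URI, whatever follows "file://" up to the next "/" is parsed as a host, so a drive letter placed there would be read as a host name, which would be consistent with the "file not on local host" error. The helper name windows_dir_to_file_uri and the C:\schemas\base path are hypothetical.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote, urlsplit


def windows_dir_to_file_uri(directory):
    """Turn a Windows directory such as C:\\schemas\\base into a file URI ending in '/'."""
    # as_posix() swaps backslashes for forward slashes: 'C:/schemas/base'
    forward_slashes = PureWindowsPath(directory).as_posix()
    # Three slashes: 'file://' plus an empty host, then '/' starting the path.
    return "file:///" + quote(forward_slashes, safe="/:") + "/"


good = windows_dir_to_file_uri(r"C:\schemas\base")
bad = "file://" + "C:/schemas/base" + "/"  # only two slashes

# Compare how each URI is split into host (netloc) and path.
print(good, repr(urlsplit(good).netloc), urlsplit(good).path)
print(bad, repr(urlsplit(bad).netloc), urlsplit(bad).path)
```

On POSIX systems an absolute path already starts with "/", so 'file://' + path yields three slashes without any extra work.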

@kerlyn-bsd3

> The extra forward-slash still doesn't make much sense, without it, I'm getting this.

What is the value of your 'absolute_path_to_base_directory'? Is it missing an initial
forward-slash to designate the root of the file system?

geokala pushed a commit to geokala/romaine that referenced this issue Feb 2, 2016
geokala pushed a commit to geokala/romaine that referenced this issue Mar 1, 2016
@clenk

clenk commented Aug 12, 2016

I recently encountered this issue but what solved it for me was giving RefResolver the whole filename including the extension, not just the directory.

resolver = jsonschema.RefResolver('file://' + absolute_path_to_base_directory + '/' + base_filename, None)
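Why the file name (or at least a trailing slash) matters can be seen with the standard library's urljoin: relative resolution replaces the last path segment of the base. The /abs/schemas paths and base.json below are made up for the illustration.

```python
from urllib.parse import urljoin

ref = "another_file.json#a_definition"

# Directory without a trailing slash: 'schemas' is treated as a file name and replaced.
no_slash = urljoin("file:///abs/schemas", ref)

# Directory with a trailing slash: the reference lands inside the directory.
with_slash = urljoin("file:///abs/schemas/", ref)

# Full path of the referring schema: its file name is replaced by the referenced one.
with_filename = urljoin("file:///abs/schemas/base.json", ref)

print(no_slash)
print(with_slash)
print(with_filename)
```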

leohemsted pushed a commit to alphagov/notifications-api that referenced this issue Aug 30, 2016
see python-jsonschema/jsonschema#98

took boring generic things (eg uuid) into definitions.json, and also
separated email and sms notification objects into respective files
blakesweeney added a commit to RNAcentral/rnacentral-data-schema that referenced this issue Dec 19, 2017
@topher515

I had a hard time figuring this out myself, and I kept googling into this github issue. (The solution is pretty much what @clenk says above.)

Here's a SO Q&A which covers it: https://stackoverflow.com/questions/53968770/how-to-set-up-local-file-references-in-python-jsonschema-document

Hopefully this helps someone in the future.

@koteez

koteez commented Mar 20, 2020

Hi, I have the same issue, but I wasn't planning to write my own Python script; I'd rather use the command line with a parameter. Is there a way in Draft 7 (which is what I'm using) to override the base_uri?

@Julian
Member

Julian commented Mar 20, 2020 via email

marcofavorito added a commit to fetchai/agents-aea that referenced this issue May 1, 2020
bradbishop added a commit to bradbishop/entity-manager that referenced this issue May 11, 2020
Add a script to validate configurations against the schema.  Use
https://github.com/Julian/jsonschema to perform the json validation.

The script is intended to be run from a continuous integration
environment or by configuration/schema developers.  A key
assumption/feature of the script is that its users will always prefer to
resolve relative references to the local filesystem.  As such, the
script computes a base URI that points to the filesystem and instructs
the validator to use that in place of whatever base_uri it derives from
the global $id attribute.  For additional reading see:

https://json-schema.org/understanding-json-schema/structuring.html#the-id-property
python-jsonschema/jsonschema#98

Without any options the script assumes it is being run from an
entity-manager source distribution and attempts to find the schema and
configuration files relative to the location of the script.

Alternatively, the script can validate arbitrary json files against
arbitrary schema:

  ./validate-configs.py -s foo.schema.json -c test1.json -c test2.json

By default the validation stops as soon as a configuration does not
validate.  Use -k to override this behavior and validate as many
configurations as possible.

Provide an option to instruct the script to ignore a list of
configurations that are expected to fail validation to be used in
continuous integration setups - similar in concept to xfail mechanisms
provided by most build systems with unit test support.

Change-Id: I7d67a54993a6d5e00daf552d9d350c80411b997b
Signed-off-by: Brad Bishop <[email protected]>
bradbishop added a commit to openbmc/entity-manager that referenced this issue Jun 29, 2020
@rwberendsen

Hi, is it also possible to load all schema files in jsonschema, independently of their location in the directory structure on disk, and independently of whether they are actually served on network addressable locations over https? Then have jsonschema resolve $refs just by resolving against $id?

See https://json-schema.org/understanding-json-schema/structuring.html :

"
Note

Even though schemas are identified by URIs, those identifiers are not necessarily network-addressable. They are just identifiers. Generally, implementations don’t make HTTP requests (https://) or read from the file system (file://) to fetch schemas. Instead, they provide a way to load schemas into an internal schema database. When a schema is referenced by it’s URI identifier, the schema is retrieved from the internal schema database.
"

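The "internal schema database" idea in the quoted note can be sketched with plain dictionaries. This is a standard-library illustration of the concept, not jsonschema's API; the example.com identifiers, build_registry, and lookup are all hypothetical.

```python
from urllib.parse import urldefrag, urljoin

# Two schemas identified only by "$id"; nothing here is fetched from disk or network.
address = {
    "$id": "https://example.com/schemas/address.json",
    "type": "object",
    "properties": {"city": {"type": "string"}},
}
customer = {
    "$id": "https://example.com/schemas/customer.json",
    "type": "object",
    "properties": {"shipping": {"$ref": "address.json"}},
}


def build_registry(schemas):
    """Index schemas by their "$id" so they can be found by identifier alone."""
    return {schema["$id"]: schema for schema in schemas}


def lookup(registry, referrer, ref):
    """Resolve `ref` against the referrer's "$id" and fetch the document from the registry."""
    absolute, _fragment = urldefrag(urljoin(referrer["$id"], ref))
    return registry[absolute]


registry = build_registry([address, customer])
resolved = lookup(registry, customer, customer["properties"]["shipping"]["$ref"])
print(resolved["$id"])
```

Because lookups go through the identifiers only, the files could live anywhere on disk as long as each one is loaded into the registry first.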