Is jsonschema.validate slow? #277

Closed
MichaelCurrie opened this issue Mar 9, 2016 · 6 comments

@MichaelCurrie

I come to you from the tracker-commons open-source project.

Using a 57 MB JSON file and a 7 KB schema file, the following code takes upwards of 70 seconds to run.

import json

import jsonschema

# Load the schema (about 7 KB).
with open("wcon_schema.json", "r") as wcon_schema_file:
    schema = json.load(wcon_schema_file)

# Load the data file (about 57 MB).
with open("testfile_new.wcon", "r") as infile:
    w = json.load(infile)

# Validate the parsed data against the schema; this call is the slow step.
jsonschema.validate(w, schema)

This dwarfs all the other processing steps I'm performing on it by a factor of 100. Is there something I'm missing? Is it really supposed to take this long to validate?

Thanks.

@MichaelCurrie MichaelCurrie changed the title jsonschema.validate slow? jsonschema.validate slow? Mar 9, 2016
@MichaelCurrie MichaelCurrie changed the title jsonschema.validate slow? Is jsonschema.validate slow? Mar 9, 2016
@Julian
Member

Julian commented Mar 21, 2016

Hi! Thanks for the report. Haven't gotten a chance to dig in yet, but as a general principle, are you using CPython? If you are, the answer is probably "yes very likely" :). But I'll have a look.

@ccoffrin

ccoffrin commented Jun 14, 2016

I also noticed that validation of large files can be time-consuming. I am seeing 10-40 seconds (depending on the complexity of the schema) to validate a JSON file with 425,384 entries. I'm using version 2.5.1 as distributed on PyPI.

@MichaelCurrie
Author

MichaelCurrie commented Jun 17, 2017

For comparison, the JavaScript library ajv validates the same file in under 3 seconds.

https://github.com/epoberezkin/ajv

Perhaps someone knows of a Python library that wraps ajv?

@Julian
Member

Julian commented Jun 17, 2017

@MichaelCurrie what was the answer to what implementation of Python you are using?

Have you tried profiling validation and seeing what's taking so long? I'd definitely accept performance patches that preserve backwards compatibility (and do not make things slower on PyPy).

Also see #232 -- I'd love to have actual benchmarks added, it's the only real way to ensure that performance regressions don't happen, at least for the benchmarks in the benchmark suite.
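
For anyone who wants to contribute along these lines, a benchmark could start as small as the sketch below. It is only an illustration, not the suite discussed in #232: `best_time` and `benchmark_validation` are hypothetical helper names, and the schema and data are synthetic stand-ins invented for this example, not the WCON files from this issue.

```python
import timeit


def best_time(func, repeat=3):
    """Return the fastest wall-clock time (in seconds) out of `repeat` single runs."""
    return min(timeit.repeat(func, repeat=repeat, number=1))


def benchmark_validation(n_items=100000):
    """Time jsonschema.validate on a synthetic array of n_items objects."""
    # Third-party import kept local so best_time() itself only needs the stdlib.
    import jsonschema

    # Hypothetical schema and instance, made up for this sketch.
    schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "x": {"type": "number"}},
            "required": ["id", "x"],
        },
    }
    instance = [{"id": i, "x": i * 0.5} for i in range(n_items)]
    return best_time(lambda: jsonschema.validate(instance, schema))
```

Calling `benchmark_validation()` on CPython and on PyPy, before and after a patch, would give a rough idea of whether a change helps on one implementation without hurting the other. Taking the minimum of several runs reduces noise from other processes.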

@MichaelCurrie
Author

MichaelCurrie commented Jun 17, 2017

I am using the CPython implementation of Python:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.python_implementation()
'CPython'

@Julian do you have an example of how to profile the code in a way that would provide you with useful information? Alternatively, I have already provided my example files and code, so you could also do this profiling if you are interested. I suspect any performance improvements gleaned would have general applicability.

Thanks for your help.
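
One possible way to gather that information is the standard library's `cProfile` module. The sketch below is not something the maintainers prescribed in this thread; `profile_call` and `profile_validation` are hypothetical helper names, and the file names in the usage note are simply the ones from the original report.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=20, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return a text report.

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def profile_validation(instance_path, schema_path):
    """Load a JSON instance and schema from disk and profile jsonschema.validate."""
    # Imports kept local so profile_call() itself only needs the stdlib.
    import json

    import jsonschema

    with open(schema_path, "r") as schema_file:
        schema = json.load(schema_file)
    with open(instance_path, "r") as instance_file:
        instance = json.load(instance_file)
    # Only the validation call is profiled, not the file loading above.
    return profile_call(jsonschema.validate, instance, schema)
```

Running `print(profile_validation("testfile_new.wcon", "wcon_schema.json"))` would print the functions with the highest cumulative time, which is usually enough to see where validation spends its time. A whole script can also be profiled without code changes via `python -m cProfile -s cumulative your_script.py`.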

@ankostis
Contributor

FYI, #158 was an earlier attempt to improve speed.

Julian added a commit that referenced this issue Sep 22, 2019
2903943b Merge pull request #277 from json-schema-org/non-string-formats
95425df4 Move tests for builtin formats ignoring non-strings upstream.
aa71850e Merge pull request #278 from gregsdennis/master
4b638034 updated 2019-09 references to correct schema uri
8656a718 moved 2019-08 tests to 2019-09; fixed a few 2019-06 $refs
0f888a8f Make ajv's tests not fail the build.
3922a3c2 Run the suite sanity checks on Py3.
9eda690f Minor style.

git-subtree-dir: json
git-subtree-split: 2903943b4c31a33a9dc8b017174deefdf46f5213
@Julian Julian closed this as completed Nov 30, 2022