Is jsonschema.validate slow? #277
Comments
Hi! Thanks for the report. Haven't gotten a chance to dig in yet, but as a general principle, are you using CPython? If you are, the answer is probably "yes, very likely" :). But I'll have a look.
I have also noticed that validating large files can be time-consuming. I am seeing 10-40 seconds (depending on the complexity of the schema) to validate a JSON file with 425,384 entries, using version 2.5.1 as distributed on PyPI.
For comparison, the JavaScript library https://github.com/epoberezkin/ajv Perhaps someone knows of a Python library that wraps |
@MichaelCurrie which implementation of Python are you using? Have you tried profiling validation to see what's taking so long? I'd definitely accept performance patches that preserve backwards compatibility (and do not make things slower on PyPy). Also see #232 -- I'd love to have actual benchmarks added; it's the only real way to ensure that performance regressions don't happen, at least for the cases covered by the benchmark suite.
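As a rough illustration of the kind of benchmark mentioned above, here is a minimal sketch using only the standard library's `timeit`. The helper name `best_time` and the stand-in workload `check_all_ints` are hypothetical and are not part of jsonschema; they exist only so the example runs on its own.

```python
import timeit


def best_time(func, *args, repeat=3, **kwargs):
    """Call func(*args, **kwargs) `repeat` times and return the fastest
    wall-clock time in seconds (hypothetical helper, for illustration)."""
    timings = timeit.repeat(
        lambda: func(*args, **kwargs),
        number=1,        # one call per measurement
        repeat=repeat,   # number of measurements to take
    )
    # The minimum is the measurement least disturbed by other system activity.
    return min(timings)


def check_all_ints(items):
    # Stand-in workload so the sketch runs without jsonschema installed.
    for item in items:
        if not isinstance(item, int):
            raise ValueError("non-integer item: %r" % (item,))


if __name__ == "__main__":
    seconds = best_time(check_all_ints, list(range(100000)))
    print("best of 3: %.4f s" % seconds)
```

Assuming jsonschema is installed and `instance` and `schema` are already loaded, the same helper could be pointed at the real call, for example `best_time(jsonschema.validate, instance, schema, repeat=1)`.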
I am using the CPython implementation of Python:
@Julian do you have an example of how to profile the code in a way that would provide you with useful information? Alternatively, I have already provided my example files and code, so you could also do this profiling if you are interested. I suspect any performance improvements gleaned would have general applicability. Thanks for your help.
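One possible way to do the profiling asked about here is the standard library's `cProfile` together with `pstats`. This is a sketch only, not something posted in the thread: `profile_call` and `slow_sum` are hypothetical names, and `slow_sum` is a stand-in workload so the example runs without the large input files.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the `top`
    most expensive entries sorted by cumulative time.
    (Hypothetical helper, for illustration.)
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sorting by cumulative time puts the most expensive call paths first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_sum(n):
    # Stand-in workload so the sketch runs without jsonschema installed.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(slow_sum, 100000)
    print(report)
```

With jsonschema installed and the files loaded, the call would presumably become `profile_call(jsonschema.validate, instance, schema)`; the functions at the top of the report are the ones worth examining first.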
FYI, #158 was an earlier attempt to improve speed.
I come to you from the tracker-commons open-source project.
Using the following 57 MB JSON file and 7 KB schema file, the following code takes upwards of 70 seconds to run.
This dwarfs all the other processing steps I'm performing on it by a factor of 100. Is there something I'm missing? Is it really supposed to take this long to validate?
Thanks.