Skip to content

Make better loop performance more intuitive #310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

pawl
Copy link
Contributor

@pawl pawl commented Nov 27, 2016

This PR adds a Schema class to be used like:

schema = Schema({
    "type" : "object",
    "properties" : {
        "price" : {"type" : "number"},
        "name" : {"type" : "string"},
    },
})

schema.validate({"name" : "Eggs", "price" : 34.99})

When the Schema object is initialized, it will check if the schema is valid and will initialize the validator class. This allows you to initialize the validator and check the schema only once, instead of doing it each time validate is called.

The goal is to make it a lot more intuitive for new users to get better performance in loops or web application endpoints without needing to learn about Draft4Validator.validate() and why it's faster. It's also a better alternative to directly using Draft4Validator.validate() because it will still validate the schema.

Here's a gist I was using to benchmark: https://gist.github.com/pawl/382c655f31dad631ab29775a6c8cad05

@pawl
Copy link
Contributor Author

pawl commented Nov 27, 2016

It's failing on this error with building the docs:

Warning, treated as error:
/home/travis/build/Julian/jsonschema/docs/errors.rst:3: WARNING: Didn't find a target for anyOf

Anyone already familiar with what's causing that error?

@pawl pawl force-pushed the add_schema_class branch 2 times, most recently from 8bf6968 to c2c0dc7 Compare November 27, 2016 10:58
@pawl
Copy link
Contributor Author

pawl commented Nov 27, 2016

Had to make 2 changes to make tests pass:

  • It looks like the validation spec html changed and the way the docs parse links out of the validation spec using xpath wasn't working anymore. I fixed the xpath query to make matching work again.
  • Tox was using python 3.2.5 for pypy3 and it said pip isn't compatible with 3.2 anymore. Adding pypy3 to the .travis.yml installed a more recent version of pypy3. And, using tox-travis made each of the 4 travis builds only run tox for the python version relevant to the build.

@Julian
Copy link
Member

Julian commented Nov 27, 2016

Hi! Thanks for the PR!

Cherry picked the docfix immediately, I noticed it yesterday but hadn't had a chance to fix it yet. Appreciated.

The travis change though drops builds (the docs and style ones), so can't take that one -- I'd prefer to just actually install the newer pypy3 as a command (or to pin pip for that build in tox.ini, but knowing how long travis takes to merge in new interpreters that way is probably not the best.)

Same for running coverage on tests -- you specifically do want to run coverage on tests files, it finds test bugs, like ones where you accidentally named two tests the same thing.

As for the actual core thing here -- I'm not sure I like the idea of adding yet a third way to validate things. Can you elaborate on

The goal is to make it a lot more intuitive for new users to get better performance in loops or web application endpoints without needing to learn about Draft4Validator.validate() and why it's faster.

why would a new user find it more difficult to learn that API than this one?

(Thanks again regardless, definitely appreciate the fixes + discussion)

@pawl pawl force-pushed the add_schema_class branch 3 times, most recently from 9544d1d to d80ce25 Compare November 27, 2016 20:11
@pawl
Copy link
Contributor Author

pawl commented Nov 27, 2016

I removed the commit for removing test coverage on the tests and fixed the name of that test. I learned you can made it so tox-travis still builds docs and do style checking in the environments you choose using the [tox:travis] section in the tox.ini. So, my PR is still using tox-travis, and it seems like tests run a lot faster because each environment are running in parallel.

why would a new user find it more difficult to learn that API than this one?

That's based on my own experience. I didn't realize each call of validate() was running check_schema + initializing the validator class until I saw the code. I went looking for a way to avoid that and saw your line in the docs about using Draft4Validator directly (and i'd also need to call Draft4Validator.check_schema() outside of my endpoint). I think a lot of people might not see that note in the docs, so it would be better if only calling check_schema/initalizing once is the most encouraged usage pattern (hence the Schema class to simplify that).

Using Draft4Validator (or the new schema class) directly is a big performance improvement when you're running validate with the same schema multiple times. It took 10 loops of validate X 1000 from taking 4.1 secs to only taking 0.16 secs.

Also, there are a few other popular parsing libraries that have this same sort of intuitive Schema class syntax:

@pawl
Copy link
Contributor Author

pawl commented Nov 27, 2016

Now it's running style checking for both python 2.7 and 3.5, and to get the style checking test to pass in 3.5 I had to make this change: bf1962f

@Julian
Copy link
Member

Julian commented Dec 25, 2016

It seems like the issue then is more about discoverability of where to go after validate and its limitations than the need for a new API. I definitely don't want to add a third way to do things that are already possible unless it's extremely convincing. It seems to me that just updating some documentation is possibly the right way to go here, but keep in mind that I don't think it should be a goal that the frontpage of the documentation lists all possible concerns someone might have. The goal there was to show someone how to do the 80% case, which is exactly what validate tries to solve for.

@pawl pawl closed this Dec 26, 2016
Julian added a commit that referenced this pull request Nov 29, 2019
0f344a69 Merge pull request #313 from leadpony/issue309
46c44747 Replace the control escape \\a with \\t
1ffe03e5 Merge pull request #312 from gregsdennis/master
de004798 better descripttions
eea7f249 arrays have characters too
7c02d06d added unevaluatedProperties test file; resolves #310
1899a5aa Merge pull request #308 from aznan2/master
4a5010b3 Update the version list.
37569b13 issue #307 - made test compatible with draft4
e3087307 issue #307 - removed issue reference from description
e13d3275 issue #307 - removed pound sign from description
a3b9f723 issue #307 - test that oneOf handles missing optional property

git-subtree-dir: json
git-subtree-split: 0f344a698f6657441adf4ebf4ceeacd596683422
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants