Add generic hook for validating all known schemas #33
Comments
I'm not sure I follow. Why does it not scale well? If you were to use all of the currently supported hooks (other than the generic one) in a single config, it would be:

    - repo: https://github.com/sirosen/check-jsonschema
      rev: 0.10.2
      hooks:
        - id: check-github-workflows
        - id: check-github-actions
        - id: check-azure-pipelines
        - id: check-travis
        - id: check-readthedocs
        - id: check-renovate

Somehow the use case you have in mind doesn't match my expectations, since you say "10+ files" and I've made a list of only 6 hooks here.
This is actually on purpose, but perhaps needs a better explanation; there's example usage in the README. If there's a way that this can be made clearer, let's improve it. Or if there's a better way you see of supporting this kind of usage, I'm open to changing the behavior!
I think this is a misunderstanding about how that option works. It takes a parameter, which is the name of the schema to use. (Aside: I'd like to switch this to finding the names at runtime, so they can be shown in helptext.)

I sort of understand the desire for a generic hook, but the python code "doesn't know" anything today about which schema goes with which files. That information is only encoded here in the hooks. I'm hesitant to change that because it keeps the python so much simpler. Right now there's no code for handling globbing or regexes for file matching: I'm able to rely on pre-commit to provide all of that.

If there's a behavior which can be implemented without recreating too much of either schemastore or pre-commit, I'm open to considering it, but I need more clarity/information before I can commit to anything.
I do have some ideas about how this could be achieved. Likely it would involve creating another python project that encapsulates most of schemastore for offline use, including file-matching patterns. Regarding file-matching patterns, I already have something similar that uses wcmatch to determine file types at https://github.com/ansible-community/ansible-lint/blob/0837dce8466331dd8e3996dd0785348756fd4370/src/ansiblelint/config.py#L42 -- that list is even user-overridable, so they can add their own patterns if needed.

Let me know if you find this idea interesting, as I may be able to spend a few hours over the weekend and produce that behemoth python package. That would also enable offline usage.
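As a rough illustration of the kind of pattern-based file matching described above, here is a minimal sketch using only the standard library's fnmatch (ansible-lint uses wcmatch); the pattern list and schema names are purely illustrative, not taken from either project:

```python
# Hypothetical sketch: map filenames to schema names via glob patterns.
# The patterns and names are illustrative; a real implementation (like
# ansible-lint's) uses wcmatch and lets users override the pattern list.
from fnmatch import fnmatch

DEFAULT_PATTERNS = {
    "github-workflows": [".github/workflows/*.yml", ".github/workflows/*.yaml"],
    "readthedocs": [".readthedocs.yml", ".readthedocs.yaml"],
    "renovate": ["renovate.json", ".github/renovate.json"],
}

def guess_schema(path, patterns=DEFAULT_PATTERNS):
    """Return the name of the schema whose patterns match `path`, or None."""
    for schema_name, globs in patterns.items():
        if any(fnmatch(path, glob) for glob in globs):
            return schema_name
    return None

print(guess_schema(".github/workflows/ci.yml"))  # -> "github-workflows"
print(guess_schema("setup.cfg"))                 # -> None (no known schema)
```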
If you are interested in working on a pip-installable copy of schemastore, that is up to you, but I still have concerns (mostly performance) about a behavior which reads the whole catalog. I don't want to promise that I would include such a feature, at least not in the near term. I would need to have a high level of confidence that we can keep the packaged copy of the schemas up to date. This may also be worth raising with the SchemaStore maintainers. They're very open to contributions to the schemas themselves, so perhaps they'd be willing to host such a package.
In the weeks since this was first discussed, I've added a daily job to check the vendored schemas for updates. It seems to be working nicely. At this point, I'm not really likely to add a dependency on another package to provide the schemas unless it gets some significant traction. I figure I should mention this before anyone puts in significant effort on this front.
How about adding a config file with mappings, maybe even compatible with https://github.com/SchemaStore/schemastore/blob/master/src/api/json/catalog.json -- that should allow you to make it work like a linter, where you can feed it json files without having to mention their schemas, as the schema would be determined based on the config.

PS. I did not have time to progress on bundling schemastore, especially as I hit some problems like python-jsonschema/jsonschema#931 which hindered my progress.
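To make the catalog idea concrete, here is a minimal sketch of what a catalog.json-compatible lookup plus validation could look like. It assumes each entry in the catalog's "schemas" list carries "name", "fileMatch", and "url" keys (mirroring SchemaStore's format), and it loads the schema from a local path for simplicity; none of this reflects how check-jsonschema actually works today.

```python
# Hypothetical sketch: pick a schema for a file from a catalog.json-style
# mapping and validate the file against it. All paths are illustrative.
import json
from fnmatch import fnmatch
from pathlib import Path

import jsonschema  # third-party: pip install jsonschema

def load_catalog(path):
    """Load a SchemaStore-catalog-compatible mapping file."""
    return json.loads(Path(path).read_text())["schemas"]

def find_schema_entry(catalog, filename):
    """Return the first catalog entry whose fileMatch patterns match `filename`."""
    for entry in catalog:
        if any(fnmatch(filename, pattern) for pattern in entry.get("fileMatch", [])):
            return entry
    return None

def validate_file(catalog, filename):
    entry = find_schema_entry(catalog, filename)
    if entry is None:
        return  # no known schema: skip, like a linter ignoring unknown file types
    # Assumes the schema is vendored locally and entry["url"] points at it;
    # a real tool might fetch the URL or use a bundled offline copy instead.
    schema = json.loads(Path(entry["url"]).read_text())
    instance = json.loads(Path(filename).read_text())
    jsonschema.validate(instance=instance, schema=schema)  # raises on failure
```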
Right now, this issue is on a course towards being closed as a wontfix. I need information to justify doing work on it, or enough information to know that the use-case is satisfied well enough by other features. The primary issue is that I still do not understand what the use-case is and why a list of hooks like

    - repo: https://github.com/python-jsonschema/check-jsonschema
      rev: 0.14.2
      hooks:
        - id: check-github-workflows
        - id: check-readthedocs
        - id: check-travis
        - ...

isn't sufficient for this use-case. Can you articulate why that would not work for pre-commit usage, or what the CLI usage is that you would like to see which isn't supported today?
You might want to know that yesterday I released the first version of the schemastore python bundle package, with over 500 schemas in it. It could be used to validate schemas offline.
Absent the information necessary to justify accepting this issue, I'm closing as a wontfix.
By looking at https://github.com/sirosen/check-jsonschema/blob/main/.pre-commit-hooks.yaml I realise that the currently provided set of hooks does not scale well on real-life repositories, where you might easily have 10+ files using schemas.
I would like to suggest a generic option that would detect all files that match known schemas and validate them, with an option to fall back to offline mode, so it would also work in locked-down environments like https://pre-commit.ci.
As long as users follow the official file naming patterns, it should not be a problem to do that, and we could avoid having to add and maintain tons of hook definitions.
The current set of builtin schemas is already good but we can easily add more in time.
It may be worth remarking that the first hook definition from https://github.com/sirosen/check-jsonschema/blob/main/.pre-commit-hooks.yaml is broken, as it would not work when used bare and would throw an error for each encountered file:
IMHO, what I describe here would allow this first hook to be used without having to specify any file patterns or schemas.
While testing --builtin-schema I encountered two unexpected errors. My expectation was that a file that does not match a schema association pattern should just not be validated, i.e. skipped.
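For what it's worth, a tiny sketch of that expected skip behaviour, where files without a schema association are simply left alone (the pattern table here is purely illustrative):

```python
# Hypothetical sketch: a generic check that silently skips files with no
# known schema association instead of raising an error for them.
import sys
from fnmatch import fnmatch

# Illustrative pattern -> schema-name table; not check-jsonschema's real data.
KNOWN_PATTERNS = {
    ".github/workflows/*.yml": "github-workflows",
    ".readthedocs.yaml": "readthedocs",
}

def files_to_validate(paths):
    """Yield (path, schema_name) only for files matching a known pattern."""
    for path in paths:
        for pattern, schema_name in KNOWN_PATTERNS.items():
            if fnmatch(path, pattern):
                yield path, schema_name
                break
        # files that match nothing are simply skipped, never reported as errors

for path, schema_name in files_to_validate(sys.argv[1:]):
    print(f"{path} -> would validate against the '{schema_name}' schema")
```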