Validation is not strict enough #213
Comments
Is there a way to run a restructured text or sphinx linter over it?
|
Easiest is probably to build a small Sphinx project and compile it. I haven't had much luck using Sphinx as a library, but maybe in 2.0 it's easier. |
FWIW, I have a small wrapper at https://github.com/anntzer/speedoc that's intended to convert docstrings to man pages on the fly (for use à la |
@anntzer Very nice! |
Double that. I stumbled upon a similar issue by following the dependency chain YCM -> ycmd -> jedi -> numpydoc: jedi tries to extract information about parameters and return values from docstrings. But even if the Parameters section exists, numpydoc raises an exception if, for instance, the Examples section appears twice -- I got here by using networkx in vim with YCM, which has this exact problem in its IMO the solution would be to separate semantic analysis from parsing. Another, quicker but hackier, solution would be to raise the exceptions lazily, once an erroneous section is accessed (e.g., when trying to access Returns or Yields when both of them are specified, which is a semantic error, or when trying to access Examples when it appears twice, etc). |
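The lazy-error idea from the comment above can be sketched roughly as follows. This is a minimal illustration, not numpydoc's parser: `LazyDocstring`, `SECTION_NAMES`, and the error messages are hypothetical names invented for this example, and the section detection is deliberately simplistic.

```python
SECTION_NAMES = ("Parameters", "Returns", "Yields", "Examples")


class LazyDocstring:
    """Hypothetical parser that records problems and raises only on access."""

    def __init__(self, text):
        self._sections = {}  # section name -> list of body lines
        self._errors = {}    # section name -> deferred error message
        lines = text.splitlines()
        current = None
        i = 0
        while i < len(lines):
            name = lines[i].strip()
            underline = lines[i + 1].strip() if i + 1 < len(lines) else ""
            if name in SECTION_NAMES and underline and set(underline) == {"-"}:
                if name in self._sections:
                    # Record the problem instead of raising immediately.
                    self._errors[name] = f"section {name!r} appears twice"
                self._sections.setdefault(name, [])
                current = name
                i += 2  # skip the header and its underline
                continue
            if current is not None:
                self._sections[current].append(lines[i])
            i += 1
        # Semantic check: Returns and Yields are mutually exclusive.
        if "Returns" in self._sections and "Yields" in self._sections:
            message = "both Returns and Yields are present"
            self._errors.setdefault("Returns", message)
            self._errors.setdefault("Yields", message)

    def __getitem__(self, name):
        # The deferred error surfaces only when the bad section is read.
        if name in self._errors:
            raise ValueError(self._errors[name])
        return self._sections.get(name, [])


doc = LazyDocstring(
    "Summary.\n\n"
    "Parameters\n----------\nx : int\n    Input.\n\n"
    "Examples\n--------\n>>> f(1)\n\n"
    "Examples\n--------\n>>> f(2)\n"
)
parameters = doc["Parameters"]  # works even though Examples is duplicated
```

With this shape, a consumer that only needs Parameters is not affected by a duplicated Examples section, while reading `doc["Examples"]` still raises `ValueError`.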
In pandas we implemented stricter validation of docstrings. We defined a set of standards that extend numpydoc, and we enforce them in the CI (gradually, since fixing all the docstrings is a huge amount of work). We mainly have a script that can be used to validate a single docstring and provide detailed information on errors, can be used to validate all the docs (for the CI), and also can generate a JSON report with the errors found. The list of errors we validate can be found here: Some examples of things we validate:
We currently have a few things that are pandas specific, but it shouldn't be difficult to make the script 100% generic and move it to numpydoc. From our side we agree that the script could be moved to numpydoc if there is agreement with the rest of the community [1]. The main question is whether there is interest outside of pandas in a stricter standard, and if there can be agreement on what the standard should be. |
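As a rough illustration of the workflow described above (validate each docstring, then emit a JSON report), here is a stdlib-only sketch. It is not the pandas script: `docstring_errors`, `build_report`, the `demo` module, and the two checks are all assumptions made up for this example.

```python
import inspect
import json
import types


def docstring_errors(obj):
    """Return a list of error messages for one object (hypothetical checks)."""
    doc = inspect.getdoc(obj)
    if not doc:
        return ["missing docstring"]
    errors = []
    if not doc.splitlines()[0].endswith("."):
        errors.append("summary does not end with a period")
    return errors


def build_report(module):
    """Map each public function of `module` to its list of errors."""
    report = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue
        report[f"{module.__name__}.{name}"] = docstring_errors(obj)
    return report


# A throwaway module standing in for a real package.
demo = types.ModuleType("demo")
exec(
    'def good():\n    """Do the right thing."""\n\n'
    'def bad():\n    """No trailing period"""\n\n'
    "def undocumented():\n    pass\n",
    demo.__dict__,
)

report_json = json.dumps(build_report(demo), indent=2)
```

A CI job could fail when any list in the report is non-empty, while a developer could call `docstring_errors` on a single object for detailed feedback.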
Yes from me, I've written and used validators multiple times, mostly for ensuring that all parameters are documented (in proper order).
I think as long as the checks can be turned off via options, like you can with PEP8, we can have all the checks listed above. For example, I have a fairly large codebase that I think will respect all of the above except:
Eventually I'd probably/maybe want it to, but not at first. |
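The "checks that can be turned off, like PEP8" idea could look something like the sketch below. The error codes (`X01`, `X02`, `X03`), the `CHECKS` registry, and `validate` are hypothetical and exist only for illustration; the parameter check is a crude substring match, not a real numpydoc parse.

```python
import inspect


def _check_exists(obj, doc):
    return None if doc else "docstring is missing"


def _check_summary_period(obj, doc):
    lines = doc.splitlines()
    if lines and not lines[0].endswith("."):
        return "summary does not end with a period"
    return None


def _check_params_documented(obj, doc):
    if not doc:
        return None  # already reported by the existence check
    names = [p for p in inspect.signature(obj).parameters if p not in ("self", "cls")]
    # Crude test: look for "name :" anywhere in the docstring.
    missing = [name for name in names if f"{name} :" not in doc]
    if missing:
        return "undocumented parameters: " + ", ".join(missing)
    return None


# Hypothetical error codes mapped to their check functions.
CHECKS = {
    "X01": _check_exists,
    "X02": _check_summary_period,
    "X03": _check_params_documented,
}


def validate(obj, ignore=()):
    """Run every check whose code is not listed in `ignore`."""
    doc = inspect.getdoc(obj) or ""
    errors = []
    for code, check in CHECKS.items():
        if code in ignore:
            continue
        message = check(obj, doc)
        if message:
            errors.append((code, message))
    return errors


def scale(x, factor):
    """Scale a value

    Parameters
    ----------
    x : float
        Value to scale.
    """
    return x * factor


all_errors = validate(scale)
relaxed_errors = validate(scale, ignore={"X02"})
```

A codebase that is not ready for a given rule can list its code in `ignore` and drop it from the list later, once the docstrings have been fixed.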
(See also #13 where this was originally brought up five years ago!) |
I'd be very happy to see this tooling centralised and it makes sense to include it within numpydoc. @datapythonista do you intend to submit a PR? |
Really looking forward to having some form of more advanced docstring validation script as part of numpydoc. If you need some help with that @datapythonista please let us know. Otherwise maybe someone from the pandas team would also be interested/available in contributing it? |
Yes, from what I know, it looks like there is agreement on moving our script to numpydoc. I don't know much about how numpydoc works internally, and I don't have time now to check in detail. Does someone have an opinion on whether we should move our code as-is, removing all the pandas-specific stuff, or whether it makes more sense to integrate it with the rest of numpydoc? |
I'm not too familiar with numpydoc either but I imagine that backward compatibility on Also I think running the doctests from the pandas validation script could be dropped, since pytest does that fine. |
There's also
In From a quick look, |
I use So even though there is some overlap, I suspect that the |
See also PyCQA/pydocstyle#185 for discussion of this static/dynamic difference |
I discussed with @Nurdok about adding the |
Hi @datapythonista, pydocstyle is a static linter and I don't see us adding code import any time soon. I'm sorry if that's problematic for you. Feel free to add any static checks you need to pydocstyle if you'd like. |
Thanks for the info @Nurdok. We'll see what makes sense, but I think it makes sense to start by moving the pandas validation to numpydoc. As I said, at least for pandas obtaining the docstrings in a static way means that we wouldn't be able to validate even 50% of them. |
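The static/dynamic difference discussed here can be shown with a small stdlib-only sketch. It assumes a docstring that is attached at runtime by a decorator; `add_doc` and `add` are hypothetical names for this example, not pandas or pydocstyle code.

```python
import ast
import inspect
import textwrap

SOURCE = textwrap.dedent('''
    def add_doc(text):
        def decorator(func):
            func.__doc__ = text
            return func
        return decorator

    @add_doc("Return the sum of a and b.")
    def add(a, b):
        return a + b
''')

# Static view: parse the source without running it.
tree = ast.parse(SOURCE)
func_node = next(
    node for node in tree.body
    if isinstance(node, ast.FunctionDef) and node.name == "add"
)
static_doc = ast.get_docstring(func_node)

# Dynamic view: execute the source and inspect the live object.
namespace = {}
exec(SOURCE, namespace)
dynamic_doc = inspect.getdoc(namespace["add"])
```

Here `static_doc` is `None` because the function body contains no string literal, while `dynamic_doc` holds the text set by the decorator. A purely static linter therefore cannot see such docstrings, which is the limitation raised in the comment above.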
I was having a look, and see that I'm not sure how we want to run the validation. I can think of a few options:
Are people using |
I'm not sure either. But there is little cost to keeping backward compat here, I would go for the second option ( |
The docs suggest that python -m numpydoc can be used to validate docstrings. However, numpydoc reports only the most egregious breakages of the standard. If we want to help users write high quality docstrings, we should provide a stronger checker.
In the following example, only the incorrect header gets caught: