Skip to content
This repository was archived by the owner on Nov 3, 2023. It is now read-only.

Check __doc__ assignments #185

Open
jayvdb opened this issue Apr 30, 2016 · 11 comments
Open

Check __doc__ assignments #185

jayvdb opened this issue Apr 30, 2016 · 11 comments

Comments

@jayvdb
Copy link
Member

jayvdb commented Apr 30, 2016

Is it worth checking runtime assignments to __doc__?

One obscure case which could be an error is explicit setting of __doc__ to a literal where an implicit docstring would suffice.

@Nurdok
Copy link
Member

Nurdok commented May 3, 2016

It would be a nightmare to parse and understand assignments. Since we're not importing the code, we would only be able to deal with extremely basic cases. Do you have a use case which justifies this?

@jayvdb
Copy link
Member Author

jayvdb commented May 3, 2016

My idea was that it was the extremely basic cases where pydocstyle can require that the __doc__ = ... is removed as it is unnecessary, and it bypasses pydocstyle checking. e.g. https://github.com/python/cpython/blob/master/Lib/hashlib.py#L5 and many instances in http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/view/head:/bzrlib/builtins.py (and bzr's mail_client.py and others)

I guess the crux of this issue is whether pydocstyle considers __doc__ to be docstrings and are in scope, or not. pep257 says a string at the beginning becomes a __doc__ attribute, but it doesnt say the inverse (that all __doc__ are docstrings), but the inverse is heavily backed into the implementation.

Then the question is whether it is useful to allow __doc__ = ... to be used as a way to bypass pydocstyle.

@Nurdok
Copy link
Member

Nurdok commented May 3, 2016

This case would just generate a "Missing Docstring" error, which seems enough to me - it will point you in the right direction.

@larsoner
Copy link
Contributor

Since we're not importing the code, we would only be able to deal with extremely basic cases

Why not import the code? It seems like it would solve the parsing issues?

We have such a use case. We have many function/method pairs, where the method docstring is identical to that of the function (but self gets passed as the first argument). We have made a decorator copy_function_doc_to_method_doc that keeps the documentation DRY.

@jayvdb
Copy link
Member Author

jayvdb commented Aug 31, 2016

Executing the code could have side-effects. Even importing and not executing the code can have side effects.
It is possible to run small chunks of code safely by inspecting the node types and/or using eval(), but invoking a method decorator safely isnt worth trying to do.

There are a few alternatives that might work for your scenario. Is your reoo public so I can look for viable options?

@larsoner
Copy link
Contributor

The decorator is here (which is quite a few lines):

https://github.com/mne-tools/mne-python/blob/master/mne/utils.py#L860

But we also do things like function.__doc__ = _doc_template.format(...) in multiple places, too:

https://github.com/mne-tools/mne-python/blob/master/mne/datasets/testing/_testing.py#L29

I don't think it's reasonable to have pydocstyle try to parse and handle all of these sorts of cases, but fortunately #203 gives us an effective workaround in our repo.

Executing the code could have side-effects.

I'm not proposing to execute the code, but rather do some __import__ (or importlib equivalent) calls, followed by fun.__doc__ pulling. I don't think you'd need to do any eval() or anything to get this to work.

AFAIK other widely used introspection tools like IPython and Spyder get the docstrings this way (they at least parse our wacky docstrings properly), so there is at least some reasonable precedent for doing it this way without too horrible of side effects.

Even importing and not executing the code can have side effects.

This is true, it can, but my guess is the number of repos that suffer from this issue is quite small. What if it were made optional via --import or import=True in the config -- it can even continue to default to False for completeness/backward compat. That way users who know their repo is safe to import can get this added benefit. It makes the code here a bit more complicated, but 1) it might only increase the number of lines by a couple dozen, and 2) for the (hopefully vanishingly small minority of) repos it fails to work properly, users can still use import=False.

@sigmavirus24
Copy link
Member

I'm not proposing to execute the code, but rather do some import (or importlib equivalent) calls, followed by fun.doc pulling. I don't think you'd need to do any eval() or anything to get this to work.

What exactly do you think happens when you use importlib or __import__?

AFAIK other widely used introspection tools like IPython and Spyder

Neither of those are static analysis tools (which is what pydocstyle is). They're both tools (REPL/Environment/what have you) that are already running Python code and already have users import your project. That's not how users tend to expect a tool like pydocstyle to work. Further, that just adds to our runtime because importing one file, as I'm sure you're aware, can lead to importing other files just through that one file's import statements. This will only slow down execution for a use-case that seems to be not super common.

@larsoner
Copy link
Contributor

What exactly do you think happens when you use importlib or import?

Yes it executes code. By this I mean that it's a simple implementation to make via import methods, as opposed to manually parsing .py files, figuring out what to call with exec, and hoping it all works properly -- which is what seemed to be proposed by @jayvdb. Otherwise I'm not sure what he meant by "Even importing and not executing the code can have side effects"...?

Neither of those are static analysis tools (which is what pydocstyle is).

I hadn't considered that. It sounds like you find the disadvantages of dynamic analysis not to be worth it, which is fine. But why not allow users the ability to opt into it? For devs with dynamic docstrings, the ~100 ms or so to import the package (especially when the pydocstyle parsing itself takes 10s of seconds) is a trivial tradeoff to make for enhanced functionality. I think it would have minimal maintenance overhead at the pydocstyle end, too, but maybe I underestimate the complexity. (I'm happy to make a PR to find out if it might fly.)

... That's not how users tend to expect a tool like pydocstyle to work.

Hmm, I'm not sure. I for one expected pydocstyle to inspect the docstrings in the ways they are most commonly experienced by users of my code. For me that's in IPython, Spyder, and auto-documentation tools like Sphinx, all of which properly interact with these dynamic forms.

@larsoner
Copy link
Contributor

Note that what I'm talking about would also take care of things like #163, so @sigmavirus24 I suspect this functionality would be more useful than you give it credit for.

@sigmavirus24
Copy link
Member

No #163 is asking to use pydocstyle like a library as in "Here's a docstring. Analyze it for me"

I for one expected pydocstyle to inspect the docstrings in the ways they are most commonly experienced by users of my code. For me that's in IPython, Spyder, and auto-documentation tools like Sphinx, all of which properly interact with these dynamic forms.

Okay, I should rephrase. People who're familiar with tools like eslint, rubocop, pylint, mccabe, pyflakes, pycodestyle, flake8, bandit, etc. all are not expecting their code to be executed.

why not allow users the ability to opt into it?

Because someone still has to maintain it. The current maintainers of this project don't have that kind of time and so it seems best to avoid the complexity until there are people who can afford to deal with the bugs this would cause.

@Nurdok
Copy link
Member

Nurdok commented Sep 1, 2016

First of all, pydocstyle must parse Python code, since most of the docstring conventions are related to code and not the generated __doc__ attribute (which kind of quotes, how many spaces before and after the docstring, etc). Also, having a "correct" __doc__ doesn't necessarily mean that your code is readable. For example:

def foo():
    return

foo.__doc__ = "Do foo."

... is not readable, which is the purpose of coding conventions to begin with. Obviously there are cases where assigning with __doc__ is necessary - for example, using the @wraps decorator. __doc__ manipulation is often done in decorators and since we do parse decorators, I have been thinking about allowing to define decorator names that make pydocstyle skip a function or class.

As I said, pydocstyle must parse Python code. This is difficult enough as it is. Introducing dynamic analysis is too much of a burden. It is also, as @sigmavirus24 suggested, quite an unexpected behavior from a linter. I doubt pydocstyle will ever do dynamic analysis.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants