Skip to content

Sibling subclasses (without stubs) pass for each other? #3987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mbforbes opened this issue Sep 22, 2017 · 14 comments
Closed

Sibling subclasses (without stubs) pass for each other? #3987

mbforbes opened this issue Sep 22, 2017 · 14 comments

Comments

@mbforbes
Copy link

In what is perhaps a fool's errand, I'm hoping to use mypy to help typecheck code written using a big library without stubs (pytorch).

mypy seems to know something about the types of the variables, but not quite enough to be helpful. I'm hoping to understand why the below happens and if there's anything I can do about it.

Code:

import torch

i = torch.IntTensor()
f = torch.FloatTensor()
reveal_type(i)
reveal_type(f)

def foo(i: torch.IntTensor, f: torch.FloatTensor) -> None:
    pass

# these typecheck as expected
foo(i, f)  # OK
foo(1, 2)  # error

# these typecheck OK, but I think should error
foo(i, i)
foo(f, f)

Output of mypy --follow-imports=silent:

5: error: Revealed type is 'torch.IntTensor'
6: error: Revealed type is 'torch.FloatTensor'
13: error: Argument 1 to "foo" has incompatible type "int"; expected "IntTensor"
13: error: Argument 2 to "foo" has incompatible type "int"; expected "FloatTensor"

I am sorry if this is because of my confusion with python or mypy in general.


More info:

The definitions of the above classes:

class FloatTensor(_C.FloatTensorBase, _TensorBase):

    def is_signed(self):
        return True

    @classmethod
    def storage_type(cls):
        return FloatStorage

class IntTensor(_C.IntTensorBase, _TensorBase):

    def is_signed(self):
        return True

    @classmethod
    def storage_type(cls):
        return IntStorage
@gvanrossum
Copy link
Member

How are you running mypy? In particular are you using --follow-imports=silent? Maybe the base classes (*TensorBase) aren't found.

@mbforbes
Copy link
Author

The first output was with mypy --follow-imports=silent, and the gist has the full output.

It looks like there are a lot of problems with things that look like *TensorBase:

/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:316: error: Name 'DoubleTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:317: error: Name 'FloatTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:318: error: Name 'LongTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:319: error: Name 'IntTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:320: error: Name 'ShortTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:321: error: Name 'CharTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:322: error: Name 'ByteTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:324: error: Name 'SparseDoubleTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:325: error: Name 'SparseFloatTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:326: error: Name 'SparseLongTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:327: error: Name 'SparseIntTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:328: error: Name 'SparseShortTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:329: error: Name 'SparseCharTensorBase' is not defined
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/__init__.py:330: error: Name 'SparseByteTensorBase' is not defined

and

/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/tensor.py:377: error: Type[_TensorBase] has no attribute "type"
/Users/max/.pyenv/versions/rndj1/lib/python3.6/site-packages/torch/tensor.py:378: error: Type[_TensorBase] has no attribute "cuda"

If it doesn't know enough about their base class(es), would mypy assume that they could be compatible?

@gvanrossum
Copy link
Member

Yeah, that's likely what's causing this. Try experimenting with a small example.

@mbforbes
Copy link
Author

I experimented a bit but couldn't determine more than the above.

The surface question I had was whether mypy assumes that objects with somewhat ambiguous base classes are compatible. The answer to this seems to be "yes." It would be nice if reveal_type(...) was able to display its understanding of inheritance hierarchy of a type. That would help me debug this issue.

The deeper question I had was how I can fix this.

I was hoping that there's a way to gradually add type annotations without undergoing the behemoth task of documenting types for a big (and rapidly changing) 3rd party library like pytorch.

I checked the docs and the wiki sections about pyi files, as well as a few related issues (#3505, #2502, #3930, #84), and I couldn't determine whether there is a solution for gradually introducing type annotations into a 3rd party library. My impression is that any pyi file I introduce overrides the corresponding py file, and there isn't an "augmentation" mode.

I guess the tl;dr is: For really big projects, mypy gets a lot of it right, but some of it wrong. It would be nice to be able to tell (a) how it's getting things wrong, (b) fix these problems when they arise without trying to document big swaths of the project. I'm hoping to ask if either of these are possible!

@gvanrossum
Copy link
Member

Could you run stubgen over pytorch?

Agreed that for really big projects it's a commitment. Zulip, approximately 100k LoC, did it over a summer with one dedicated GSoC student.

At Dropbox we have maybe 5M LoC, or which approximately 20% is annotated. It's taken us a year and a half to get there, with a dedicated team of several (of course we also had to improve mypy itself a lot to get there, so you can do it for less :-).

mbforbes added a commit to mbforbes/rndjam1 that referenced this issue Sep 26, 2017
@mbforbes
Copy link
Author

stubgen seems promising!

Running vanilla stubgen torch produces a single out/torch/__init__.pyi file, which has a few top-level types defined, but is missing types from all of the inner modules (apologies if I butcher the terminology here).

To try to get the remaining modules, I ran stubgen --recursive torch , which generated many more files, but also left me with a few confusing errors.

Running on the code I posted in the top of this issue:

$ mypy mypy_minimal_example.py
out/torch/__init__.pyi:5: error: Cannot find module named '_C'
out/torch/__init__.pyi:5: note: (Perhaps setting MYPYPATH or using the "--ignore-missing-imports" flag would help)
out/torch/_C.pyi:2691: error: invalid syntax

I'm guessing that the ordering of the errors is simply the confusing part here, and it can't find _C because there was a syntax error in it. So ignoring that problem, and looking at the syntax on _C.pyi:2691:

stubgen-syntax-error

I'm not sure what the syntax error is here? (And is it a bug if stubgen generated files with syntax errors?)

@ilevkivskyi
Copy link
Member

@mbforbes Can you try this with the stubgen from master? There were several improvements recently.

@mbforbes
Copy link
Author

@ilevkivskyi thanks for the quick tip! Running from master removes the first error, but not the supposed syntax error:

$ mypy --version
mypy 0.530-dev-c332ea08b542728985fadcbee1db6651060c3876
$ mypy mypy_minimal_example.py
out/torch/_C.pyi:2691: error: invalid syntax

Line 2691 is the same as above. (I am happy to provide more info if that would be helpful.)

@ilevkivskyi
Copy link
Member

Ah, from from is a reserved keyword in Python, you can't use it as an argument name.

@ilevkivskyi
Copy link
Member

@mbforbes If this is a stub for C extension module (my guess), I think you could just rename the argument to from_.

@mbforbes
Copy link
Author

I was thinking that might be the issue... that is a little bothersome because it's a 3rd party repository, and they do label that as the paramter name; I assume since python runs their code, they're using a C extension; in that case, could / should stubgen replace keywords like from with from_ automatically?

Fixing the fromfrom_ problems, another issue arises that they have non-default arguments after default ones:

...
def addbmm(beta=1, mat, alpha=1, batch1, batch2, out=None): ...
def addcdiv(tensor, value=1, tensor1, tensor2, out=None): ...
def addcmul(tensor, value=1, tensor1, tensor2, out=None): ...
...

At this point I'm actually not sure what side of the fence this problem lands on. On one hand, this is code that runs, so it seems like stubgen ought to be robust to it. And I can see the draw from (their) API perspective to mix default and non-default arguments like this. On the other hand, maybe it's not stubgen's responsibility to fix up syntactically incorrect Python interfaces.

@gvanrossum
Copy link
Member

they do label that as the paramter name; I assume since python runs their code, they're using a C extension; in that case, could / should stubgen replace keywords like from with from_ automatically?

Just because their docs show uniform_(from=0, to=1) doesn't mean you can call it with keyword args. It's clearly meant as positional arguments and they just use that notation to show the default and be able to talk about it in the description.

non-default arguments after default ones

Again I think you are taking their docs a bit too literally.

this is code that runs,

You keep using that phrase, but clearly the docs don't run, and the implementation isn't written in Python. So keep it with a grain of salt.

so it seems like stubgen ought to be robust to it

Please understand that stubgen is a tool to help you create stub files, but it's not as robust as mypy itself, and it isn't maintained that much. You're supposed to manually review the stubs (as well as test them) before you start using them.

If you find a bug in stubgen you're welcome to report an issue but we may not prioritize fixing it. If you submit a PR we may review and merge it, assuming you're not making sweeping code changes (try to refrain from a big refactor).

@mbforbes
Copy link
Author

Thanks for the insight into what's going on here. I think I'm still new enough to C extensions that I have a flawed understanding of how the docs / C / Python interact.

I went down the path of fixing some more of the problems mypy found in what stubgen created from torch, but I quickly realized this was going to become exactly the undertaking I was hoping to avoid. In order to get some of the fundamental types (e.g., torch.cuda.FloatTensor), it seems that those modules (cuda) need to have stubs without errors. Those stubs have a cacophony of errors, and import numpy. By the time I'd started generating and fixing stubs for numpy I realized I'd gone a bit too far.

I greatly appreciate the help above. I'm going to open one more issue that I've had in trying to go back to running stub-less, but I totally get it if I'm treading in unsupported waters at this point.

@ilevkivskyi
Copy link
Member

@mbforbes Note that there are experimental stubs for numpy here, but real support for numpy will require a plugin, see #3540.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants