Skip to content

Unions break with isinstance() #1135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bdarnell opened this issue Jan 17, 2016 · 14 comments
Closed

Unions break with isinstance() #1135

bdarnell opened this issue Jan 17, 2016 · 14 comments
Labels
bug mypy got something wrong

Comments

@bdarnell
Copy link
Contributor

Mypy incorrectly reports two errors in this (python 2) snippet.

$ nl -ba /tmp/foo.py
     1  import typing
     2  
     3  def foo(value):
     4      # type: (typing.Union[unicode, bytes]) -> None
     5      if isinstance(value, unicode):
     6          return
     7      if not isinstance(value, bytes):
     8          raise TypeError("Expected bytes or unicode; got %r" % type(value))
$ PYTHONPATH=.  scripts/mypy --py2 /tmp/foo.py
/tmp/foo.py: note: In function "foo":
/tmp/foo.py:7: error: Argument 1 to "isinstance" has incompatible type "Union[object, object]"; expected "object"
/tmp/foo.py:8: error: No overload variant of "type" matches argument types [Union[<ERROR>, <ERROR>, <ERROR>, <ERROR>]]

If lines 5 and 6 are removed, the errors disappear. If line 7 is replaced with if True:, the error on line 8 remains. Replacing unicode with str and running mypy in py3 mode also makes the errors disappear.

@refi64
Copy link
Contributor

refi64 commented Jan 17, 2016

This is quite possibly one of the most bizarre error messages I've ever seen...

bdarnell added a commit to bdarnell/tornado that referenced this issue Jan 17, 2016
Passes on py3 except for a mypy bug (python/mypy#1135).
Still some failures on py2.
@gvanrossum
Copy link
Member

gvanrossum commented Jan 17, 2016 via email

@gvanrossum
Copy link
Member

gvanrossum commented Jan 17, 2016 via email

@bdarnell
Copy link
Contributor Author

AnyStr is undocumented; is it meant for public consumption?

As a TypeVar, AnyStr has somewhat different semantics from a Union[unicode, bytes]: multiple AnyStrs in the same signature must either all be unicode or all be bytes; multiple unions can mix types independently. In Tornado I generally want the union behavior, not the typevar (at least I think so - I've only type-hinted one function so far that has multiple unicode/bytes arguments).

This also shows up in the typeshed stubs for the 2.7 re module. Mixing unicode and bytes in a way that works in python 2 will fail to type check (often with a very cryptic message about object). And since one way to mix unicode and bytes in python 2 is with string literals, the following code is valid in python 2 and 3 but won't type check:

import re
import sys

PY3 = sys.version_info >= (3,)

if PY3:
    unicode = str

p = re.compile('a')

def f(s):
    # type: (unicode) -> unicode
    return p.sub('b', s)

print(f(u'\u00e9'))

So I guess this raises a kind of philosophical question about whether the type checker should try to allow everything that would actually work in python 2, or if it should be a "strict mode" that forces code like the above to be more explicit about its unicode handling.

@gvanrossum
Copy link
Member

AnyStr is meant to be public -- it's documented in PEP 484. I have to think
about the rest some more, maybe @JukkaL has something to say about this
topic too.

On Sat, Jan 16, 2016 at 11:01 PM, Ben Darnell [email protected]
wrote:

AnyStr is undocumented https://docs.python.org/3/library/typing.html;
is it meant for public consumption?

As a TypeVar, AnyStr has somewhat different semantics from a Union[unicode,
bytes]: multiple AnyStrs in the same signature must either all be unicode
or all be bytes; multiple unions can mix types independently. In Tornado I
generally want the union behavior, not the typevar (at least I think so -
I've only type-hinted one function so far that has multiple unicode/bytes
arguments).

This also shows up in the typeshed stubs for the 2.7 re module. Mixing
unicode and bytes in a way that works in python 2 will fail to type check
(often with a very cryptic message about object). And since one way to
mix unicode and bytes in python 2 is with string literals, the following
code is valid in python 2 and 3 but won't type check:

import re
import sys

PY3 = sys.version_info >= (3,)

if PY3:
unicode = str

p = re.compile('a')

def f(s):
# type: (unicode) -> unicode
return p.sub('b', s)

print(f(u'\u00e9'))

So I guess this raises a kind of philosophical question about whether the
type checker should try to allow everything that would actually work in
python 2, or if it should be a "strict mode" that forces code like the
above to be more explicit about its unicode handling.


Reply to this email directly or view it on GitHub
#1135 (comment).

--Guido van Rossum (python.org/~guido)

@JukkaL
Copy link
Collaborator

JukkaL commented Jan 17, 2016

Obviously mypy is not type checking the original example correctly. I'll look into it.

The intention is that str is a subtype of unicode in Python 2 mode and bytes is an alias of str. Thus Union[bytes, unicode] should be the same type as unicode.

@bdarnell
Copy link
Contributor Author

In python 2, str and unicode are not sub/supertypes, they're both subtypes of basestring. It shouldn't work to collapse Union[bytes, unicode] to unicode.

@gvanrossum
Copy link
Member

Heh, so maybe we should remove the str->unicode entry from the type
promotions list. But that still leaves the bug in the type promotion code
that I demonstrated with int/float. Even if it is unnecessary to use a
Union of two types related by promotion, it shouldn't give those ridiculous
errors.

On Sun, Jan 17, 2016 at 10:01 AM, Ben Darnell [email protected]
wrote:

In python 2, str and unicode are not sub/supertypes, they're both
subtypes of basestring. It shouldn't work to collapse Union[bytes,
unicode] to unicode.


Reply to this email directly or view it on GitHub
#1135 (comment).

--Guido van Rossum (python.org/~guido)

@JukkaL JukkaL added the bug mypy got something wrong label Jan 18, 2016
@JukkaL
Copy link
Collaborator

JukkaL commented Jan 18, 2016

I wouldn't remove the str -> unicode entry yet from the type promotions list. We should first agree on how we expect the string types to be used in Python 2 code.

There are basically at least four ways we can approach this:

  1. Make str usually valid when unicode is expected. This is how mypy currently is supposed to work. This will correspond to runtime semantics, but it's not safe as non-ascii characters in str objects will result in programs sometimes blowing up. A 7-bit str instance is almost always valid at runtime when unicode is expected.
  2. Get rid of the str -> unicode promotion and use Union[str, unicode] everywhere (or create an alias for it). This is almost like approach 1, except that we have a different name for unicode and more complex error messages and a complex programming model due to the proliferation of union types. There is potential for some additional type safety by using just unicode in user code.
  3. Enforce explicit str / unicode distinction in Python 2 code, similar to Python 3 (str would behave more or less like Python 3 bytes), and discourage union types. This will make it harder to annotate existing Python 2 programs which sometimes use the two types almost interchangeably, but it will make programs safer.
  4. Have three different string types: bytes (distinct from from str) means 8-bit str instances -- these aren't compatible with unicode. str means ascii str instances. These are compatible with bytes and unicode, but not the other way around. unicode means unicode instances and isn't special. A string literal will have implicit type str or bytes depending on whether it only has ascii characters. This approach should be pretty safe and potentially also makes it fairly easy to adapt existing code, but harder than with approach 1.

These also affect how stubs should be written:

  • For approach 1, stubs should usually use str or unicode.
  • For approach 2, stubs should use str, Uniont[str, unicode] or AnyStr for attributes and function arguments, and return types could additionally use plain unicode. Return types would in general be hard to specify precisely, as it may be difficult to predict whether a function called with str or combination of str and unicode returns str, unicode or Union[str, unicode]. In approach 1 we can safely fall back to unicode if unsure.
  • For approach 2, stubs would usually use either str, unicode or AnyStr, but unicode wouldn't accept plain str objects.
  • For approach 3, stubs could use three different types (bytes, str, unicode) in addition to AnyStr, and these would all behave differently. Unlike the first three approaches, AnyStr would range over str, unicode and bytes in Python 2 mode.

Note that mypy currently assumes approach 1 and I don't know how well the other approaches would work in practice.

EDIT: "blowing up" -> "sometimes blowing up"

@JukkaL
Copy link
Collaborator

JukkaL commented Jan 18, 2016

Here's a slightly simplified Python 3 fragment that reproduces the problem:

from typing import Union

def f(x: Union[float, int]) -> None:
    if isinstance(x, float):
        pass
    if not isinstance(x, int):
        f(x)

Output:

t2.py: note: In function "f":
t2.py:5: error: Argument 1 to "isinstance" has incompatible type "Union[float, object, object]"; expected "object"
t2.py:6: error: Argument 1 to "f" has incompatible type union type (5 items); expected "Union[float, int]"

This seems to be an interaction between Union, special type promotions and isinstance.

@JukkaL
Copy link
Collaborator

JukkaL commented Jan 18, 2016

Actually, this is not related to special built-in type subtyping rules. This also fails:

from typing import Union
class A: pass
class B(A): pass

def f(x: Union[A, B]) -> None:
    if isinstance(x, A):
        pass
    if not isinstance(x, B):
        f(x)

Output:

t2.py: note: In function "f":
t2.py:8: error: Argument 1 to "isinstance" has incompatible type "Union[A, object, object]"; expected "object"
t2.py:9: error: Argument 1 to "f" has incompatible type union type (5 items); expected "Union[A, B]"

We should simplify union types where one type is a subtype of another type. It seems to be that we assume that union types are simplified somewhere in the code and if this is not the case, things break.

@JukkaL JukkaL closed this as completed in 2e65c1c Jan 18, 2016
@JukkaL
Copy link
Collaborator

JukkaL commented Jan 18, 2016

This was pretty bizarre through some complex interactions and fun to track down. Here's a synopsis:

  • Union types didn't get simplified when comparing types for sameness. Thus Union[int, float] and float weren't considered the same type (which they logically are). This was the root cause.
  • The above resulted in a type 'diff' algorithm to produce void types which were stored in unions. Normally we'd check types for sameness and skip the diff algorithm if the result would be void, but we incorrectly moved on and happily produced these empty types.
  • These empty types got stored in unions, even though unions should never have void types.
  • Operations on these bad unions resulted in more bad unions with 'error types'. These got printed out in error messages, and they also may have confused type inference a bit.

We should still add some defensive assertions to make problems like this easier to figure out.

@bdarnell
Copy link
Contributor Author

Regarding the "four ways" comment above (what's the best place to continue this discussion? a new issue?): Tornado uses a different (fifth?) way that uses more types, but I have found them all to be necessary in writing code that spans python 2 and 3.

The types (and/or type conversion functions; it's a little fuzzy in the pre-type-annotation world) are:

  • unicode_type: str in python 3, unicode in python 2
  • bytes: self-explanatory
  • str: the type named str in all python versions, so bytes in python 2 and unicode in python 3; I also refer to this as native_str.
  • basestring_type: basestring or Union(unicode, bytes) in python 2, str in python 3. This is an optimization; nearly everywhere I have basestring_type I could convert to unicode_type with equivalent semantics but (possibly) additional conversions.
  • Union[bytes, unicode]: I really wish py33 had reintroduced the unicode builtin alias for str along with u"" so this could have a consistent spelling across python versions (as opposed to the TypeVar AnyStr)

Most functions try to be permissive in their inputs and accept Union[bytes, unicode] (via the functions utf8 and to_unicode, which are the ones that led to this bug report). Some functions only accept bytes. I don't think I have any functions that only accept character strings, but if I wanted such a thing I'd use basestring_type instead of unicode_type (option 1 makes unicode effectively equivalent to basestring, but it doesn't give a py2/py3-compatible spelling for this concept). Outputs are generally defined as one of str, bytes, or unicode (independent of the input type, so a TypeVar is inappropriate), rarely basestring_type or a union.

I'm going to be annotating most of my arguments with Union[bytes, unicode] so I can use the same annotations in py2 and py3 and continue accepting mixed byte and character strings. This suggests option 2, but option 1 feels like it might be more inline with what non-py3-compatible py2 code might expect.

@JukkaL
Copy link
Collaborator

JukkaL commented Jan 21, 2016

I created a new issue #1141 for str/unicode; let's move the discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug mypy got something wrong
Projects
None yet
Development

No branches or pull requests

4 participants