Moving Protocols from typing_extensions into typing #550

Closed · gvanrossum opened this issue Apr 3, 2018 · 19 comments

@gvanrossum
This is a placeholder issue so we can have the discussion from #549 outside the context of a specific PR. Relevant comments:

@ilevkivskyi, if you don't mind splitting off the debate, I will write my response to the last bullet below here.

@gvanrossum

We had a discussion on Slack with Jared Hance about what to do with circular imports that are necessary only for type checking, given that Bazel doesn't like circular imports (even under if MYPY: ...). He reasonably proposed that all such cycles should be broken by introducing protocols for the function/method argument types where necessary. After thinking about this for some time, it seems like a very good idea to me: modular static type checking (e.g. in the sense of having well-defined Bazel targets) will require protocols. Then, since many argument types for functions will have some data components, support for non-method members is needed.

I'm not sure that a Protocol is needed for this case. In most cases (assuming these are actual classes or named tuples, not dicts) there are specific class definitions that are used everywhere (though not named everywhere explicitly), so apart from the circular import problem, nominal type checking is fine. So the refactoring needed could just as well introduce an ABC. I'll elaborate below.

The question about why we need the runtime support is a bit more subtle (also see my answer below). The problem will happen if someone needs to transition from one API to another. Imagine this (simplified) scenario:

class LegacyData(Protocol):
    x: int
    y: int
    label: str

class NewData(Protocol):
    coord: Tuple[int, int]
    label: str

def process(items: Iterable[Union[LegacyData, NewData]]) -> None:
    for item in items:
        if <some check here>:
             # unpack and process legacy API
        else:
             # other logic for new API

IMO ideally we should provide a way to destructure a union of protocols (otherwise we would induce a growing amount of rigidity in any protocol-specified API), so the check above can be just isinstance(item, LegacyData). Of course one can use some hasattr() hacks instead, but then this function will not be statically checked.

Yeah, the main reason to use a protocol in this case would be that it's an easy way to spell a bunch of hasattr() checks. But I don't understand what you mean by "destructuring a union of protocols" -- is there some reason I'm not seeing that <some check here> cannot be isinstance(item, LegacyData)?
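For concreteness, the scenario above can be run with runtime-checkable protocols. A minimal sketch, assuming hypothetical Legacy and Modern classes, and using runtime_checkable (the eventual spelling of the @runtime decorator discussed in this thread):

```python
from typing import Iterable, List, Protocol, Tuple, Union, runtime_checkable

@runtime_checkable
class LegacyData(Protocol):
    x: int
    y: int
    label: str

@runtime_checkable
class NewData(Protocol):
    coord: Tuple[int, int]
    label: str

# Hypothetical concrete classes standing in for the two API generations.
class Legacy:
    def __init__(self) -> None:
        self.x, self.y, self.label = 1, 2, "legacy"

class Modern:
    def __init__(self) -> None:
        self.coord, self.label = (1, 2), "new"

def process(items: Iterable[Union[LegacyData, NewData]]) -> List[Tuple[int, int, str]]:
    out = []
    for item in items:
        if isinstance(item, LegacyData):  # structurally: has x, y, and label
            out.append((item.x, item.y, item.label))
        else:
            out.append((item.coord[0], item.coord[1], item.label))
    return out

print(process([Legacy(), Modern()]))  # [(1, 2, 'legacy'), (1, 2, 'new')]
```

The isinstance() call here is, at runtime, essentially a sequence of hasattr() checks over the protocol's members.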


Now let me explain why I think ABCs would be good enough to deal with the typical circular import case. (Though see farther down for a concern.) Suppose we have two unannotated modules like this:

  • b.py
from c import C
class B:
    def meth(self):
        return C(self)
  • c.py
class C:
    def __init__(self, arg):
        self.arg = arg  # This is always a B instance

and let's suppose these are in two different Bazel packages, b and c. Package b clearly depends on package c, so that's expressed in the Bazel BUILD file as a dependency:

  • c depends on nothing
  • b depends on c

When we start adding annotations, we wish to add from b import B to the c package but Bazel won't let us. So we introduce a new package a, containing two ABCs (I'm leaving out the ABC infrastructure to keep the code short):

  • a.py
class AC:
    pass
class AB:
    def meth(self) -> AC:
        raise NotImplementedError

We then import a from both b.py and c.py and update the classes:

  • b.py
from a import AB
from c import C
class B(AB):
    def meth(self) -> C:
        return C(self)
  • c.py
from a import AB, AC
class C(AC):
    def __init__(self, arg: AB) -> None:
        self.arg = arg  # type: AB  # (redundantly)

The dependencies will be non-circular:

  • a depends on nothing
  • c depends on a
  • b depends on a and c

PS. Having written all this down I also see that using a protocol would require fewer changes. One could keep the original b.py, and modify c.py to define a protocol class:

class PB(Protocol):
    def meth(self) -> 'C': pass
class C:
    def __init__(self, arg: PB) -> None:
        self.arg = arg  # type: PB  # (redundantly)

But, again, this introduces duck typing where probably none was used before the refactoring. (In my experience duck typing is primarily used for builtin protocols like Iterable or Mapping, not for user-defined classes.)

@gvanrossum

I also have some comments related to the difference between how mypy interprets isinstance() and how it is treated at runtime. You linked to python/mypy#3827 for context, and gave this example:

@runtime_checkable
class ProtoA(Protocol):
    attr: int
class ProtoB(Protocol):
    attr: str

x: Union[ProtoA, ProtoB]
if isinstance(x, ProtoA):
    x.attr + 1  # the binder type is still a union here, so mypy will complain after the issue is fixed.

Well, maybe I don't have a comment other than thinking that this is giving me more concerns about @runtime[_checkable] for protocols with data members. Isn't this just going to be an attractive nuisance? What exactly are the use cases for @runtime_checkable? Perhaps they only apply to stdlib protocols that are entirely made up from dunder methods, like Sequence?

@ilevkivskyi

Sorry for the long response. The most important question is actually the last one, but the previous points give some more context.


so apart from the circular import problem, nominal type checking is fine

Although I agree it is fine, I still have a feeling (I can't call it a motivated opinion) that interfaces between independent "modules" (like build targets in the case of Bazel) should be as agnostic as possible about the inner parts of these "modules". Protocols (or at least ABCs) are better in this sense than classical nominal types.

Yeah, the main reason to use a protocol in this case would be that it's an easy way to spell a bunch of hasattr() checks. But I don't understand what you mean by "destructuring a union of protocols" -- is there some reason I'm not seeing that <some check here> cannot be isinstance(item, LegacyData)?

"Destructuring" is just a fancy term for selecting an single type from a union. I think isinstance() is probably the best way to express this (definitely better than a bunch of hasattr() checks). My only worry was that isinstance() requires the metaclass. But as I understand, you don't worry about this and there is a possible way to avoid the metaclass using PEP 560, so I think there is no reason it cannot be isinstance().

Now let me explain why I think ABCs would be good enough to deal with the typical circular import case.
[...]
PS. Having written all this down I also see that using a protocol would require fewer changes

This is actually one of the motivation points of PEP 544. Protocols don't require adding explicit base classes or adding new modules. Otherwise they are pretty similar to ABCs. ABCs may feel safer, but I think that is just a feeling; mypy can verify statically pretty much everything that ABCs can do at runtime.

But, again, this introduces duck typing where probably none was used before the refactoring. (In my experience duck typing is primarily used for builtin protocols like Iterable or Mapping, not for user-defined classes.)

Maybe this is partially because there was no way of creating simple user-defined protocols? My general feeling is that protocols are good for describing argument types of functions and methods, while for return types and (module, class, and instance) variables nominal types are definitely better. We have seen quite a lot of requests to add more SupportsX things to typing, which means that users want simple small protocols. PEP 544 also gives a good example:

def close_all(things: Iterable[SupportsClose]) -> None:
    for thing in things:
        thing.close()

Instead of a large union of various resource types, one can have one small protocol.
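The PEP 544 example is easy to try end to end. A minimal runnable sketch, with a hypothetical Resource class standing in for files, sockets, and other closeable things:

```python
from typing import Iterable, Protocol

class SupportsClose(Protocol):
    def close(self) -> None: ...

# Hypothetical resource type; any object with a close() method matches,
# with no explicit subclassing or registration required.
class Resource:
    def __init__(self) -> None:
        self.closed = False
    def close(self) -> None:
        self.closed = True

def close_all(things: Iterable[SupportsClose]) -> None:
    for thing in things:
        thing.close()

resources = [Resource(), Resource()]
close_all(resources)
print(all(r.closed for r in resources))  # True
```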

Isn't this just going to be an attractive nuisance? What exactly are the use cases for @runtime_checkable? Perhaps they only apply to stdlib protocols that are entirely made up from dunder methods, like Sequence?

I think this is the most important question in this discussion. The previous points were more about "are protocols useful?"; I hope we both agree they are. I would say that if an ABC has only a few members, then a protocol is almost always a better solution. (This is definitely more questionable for larger ABCs: it is easier to explicitly subclass Sequence and implement just two methods than to implement all of them when using structural subtyping.)

So the question is "Do we need to allow isinstance() for protocols with non-method members?"
I think yes. The main use case we have already seen above: isinstance(item, LegacyData) instead of a bunch of hasattr() checks. Apart from being simpler to write and easier to read, isinstance() can be type checked by mypy.

The current situation is that we disallow issubclass() for non-method protocols because we can't treat them safely, but we still allow isinstance() (which under the hood translates to essentially a bunch of hasattr() calls at runtime). This is the compromise Jukka proposed in September when Protocol was first added to typing_extensions. I liked it but was worried that it would require a metaclass; now that it has turned out that it doesn't (we can just use more of PEP 560), I believe this compromise is the right solution.
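This compromise is observable at runtime with the decorator as it eventually shipped (runtime_checkable). A minimal sketch, assuming a hypothetical data protocol HasX:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):  # a data protocol: it has a non-method member
    x: int

class C:
    x = 0  # no relation to HasX is declared

print(isinstance(C(), HasX))  # True: essentially a hasattr(obj, 'x') check

try:
    issubclass(C, HasX)  # disallowed for protocols with non-method members
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```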

@gvanrossum commented Apr 6, 2018 via email

@jhance commented Apr 6, 2018

When I said that, I was merely showing my bias for protocols over ABCs, I think. The same principle exists for both. What we need to untangle Bazel targets is using these things to do dependency injection.

I would say that if you have to use the structural nature of the protocol in order to get your dependencies right, then you probably set up your Bazel targets wrong. I've been putting the protocols in separate Bazel libs because I value the ability to explicitly state that I'm implementing a protocol.

@ilevkivskyi

@gvanrossum

Quick: usually LegacyData (in the mind of the developer) is a specific class, not a duck type, so it can use a nominal isinstance();

TBH I don't understand this. There seem to be two independent statements mixed together. Independently of what LegacyData is in the mind of the developer, it is a protocol at runtime, and will act the way __instancecheck__ tells it to. Also, a structural check can never invalidate subclassing: if C is an actual subclass of P, it will always be a structural subtype of it. Maybe you can show an example of a potential problem?

and I worry very much about the difference between isinstance() at runtime and at type-check time.

This part is even less clear to me. Yes, there is a bug in mypy now in that it treats isinstance() not exactly the same as the runtime check, but the bug is still there only because I am lazy, not because of any limitations or problems in the specification. What would be hard about implementing a simple type erasure? Also please note that Iterable, Sized, and many other things in typing have been protocols for almost half a year, and we didn't hear a single complaint about the fact that mypy doesn't do type erasure in isinstance() like at runtime. Anyway, I just raised the priority to high.

@JukkaL Maybe you can join the discussion? We need to finish this infinite discussion that has already happened twice and is now happening a third time. A short reminder about the current state (just in case):

  • We only allow isinstance() and issubclass() checks for explicitly marked @runtime protocols.
  • In addition, issubclass() is only allowed with method protocols.
  • Mypy should do a type erasure for all attributes when performing these checks statically.

Does this still sound reasonable to you? IIUC Guido doesn't like something of this, but I can't yet understand what exactly.

@jhance My general view of this is approximately like this:

  • Interfaces within the same target can (and probably should) use nominal types.
  • Interfaces between targets can (and probably should) use protocols with members expressed via shared/library classes.
  • Interfaces between independent projects can (and probably should) use TypedDicts with values expressed in builtin types.

@gvanrossum

Independently of what is LegacyData in the mind of developer, it is a protocol at runtime

That's how you wrote it. But if this was a real app it would most likely have two actual classes representing the legacy and modern data formats and the code would be fine if it used nominal isinstance() calls, and there would be no need for protocols.

there is a bug in mypy now that it treats isinstance() not exactly the same as runtime check

But doesn't fixing this bug lead to other problems? Continuing your example, it means that after isinstance(data, LegacyData) the inferred type of data.x would have to be Any. But this silences subsequent type checks involving data.x. The alternative, inferring it as object, would be similarly disappointing, as it would require further isinstance() checks. The user likely expects the behavior to match that of nominal type checks where data.x can reliably be inferred as int.

IIUC the problem becomes even more pronounced when we consider methods -- if erased types become Any then method calls after an isinstance(x, <protocol>) are essentially not type-checked, while if they become object, well, they can't be called at all.

By now I'm wondering if I am totally misunderstanding what you mean by making mypy treat isinstance(x, <protocol>) the same as runtime. And I'm also at a loss what I should propose as a resolution.

Iterable, Sized, and many other things in typing are protocols for almost half a year, and we didn't hear a single complain[t]

Perhaps that's because nobody writes e.g. a __len__ function that returns a string. Basically there's a gray area where runtime and check-time isinstance() disagree. For Iterable etc. the gray area is negligible. But for other cases it may not be, and I don't see how we can get rid of it. Perhaps you are proposing to disallow unions of protocols that overlap after type erasure? Or you propose to only erase types if the names overlap? But that wouldn't do much if the original type isn't a union but e.g. object.
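This gray area is easy to demonstrate with a stdlib protocol. A minimal sketch, assuming a hypothetical Weird class whose __len__ has the wrong return type:

```python
from collections.abc import Sized

class Weird:
    def __len__(self):
        return "oops"  # wrong: __len__ must return a non-negative int

w = Weird()
print(isinstance(w, Sized))  # True: the runtime check only sees that __len__ exists

try:
    len(w)  # actually using the "Sized" interface fails
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```

So runtime isinstance() accepts the object, while a type checker would reject the __len__ signature: exactly the disagreement described above.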

@markshannon

I don't see that the particular example requires any support from the typing module.

class LegacyData(Protocol):
    x: int
    y: int
    label: str

class NewData(Protocol):
    coord: Tuple[int, int]
    label: str

def process(items: Iterable[Union[LegacyData, NewData]]) -> None:
    for item in items:
        if <some check here>:
             # unpack and process legacy API
        else:
             # other logic for new API

The simplest version of <some check here> to distinguish LegacyData and NewData is hasattr(item, "x"). It will be faster than any isinstance(item, LegacyData) check and is just as understandable by type checkers.
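Mark's hasattr() dispatch can be sketched as follows (Legacy and Modern are hypothetical stand-ins for the two shapes; note that, unlike a protocol isinstance() check, only a single attribute is tested):

```python
from typing import Any, Iterable, List, Tuple

def process(items: Iterable[Any]) -> List[Tuple[int, int, str]]:
    out = []
    for item in items:
        if hasattr(item, "x"):  # only the legacy shape has an "x" attribute
            out.append((item.x, item.y, item.label))
        else:
            out.append((item.coord[0], item.coord[1], item.label))
    return out

# Hypothetical stand-ins for the two shapes.
class Legacy:
    x, y, label = 1, 2, "legacy"

class Modern:
    coord, label = (1, 2), "new"

print(process([Legacy(), Modern()]))  # [(1, 2, 'legacy'), (1, 2, 'new')]
```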

@ilevkivskyi commented Apr 7, 2018

That's how you wrote it. But if this was a real app it would most likely have two actual classes representing the legacy and modern data formats and the code would be fine if it used nominal isinstance() calls, and there would be no need for protocols.

And what is the conclusion from this? That we don't need protocols?

But doesn't fixing this bug lead to other problems? Continuing your example, it means that after isinstance(data, LegacyData) the inferred type of data.x would have to be Any

As may be clear from the example of mine that you mention (#550 (comment)), the erasure happens on the supertype (i.e. on the protocol), so that both elements of a union will match statically as they do at runtime. There may however indeed be other problems if we narrow starting from a non-union type, but in that case not doing the erasure is probably fine.

For Iterable etc. the gray area is negligible. But for other cases it may not be, and I don't see how we can get rid of it.

This is quite hypothetical. Also, I don't understand why we need to make @runtime totally safe. The only reason to introduce @runtime[_checkable] was that making isinstance() safe with protocols is hard, so a user would opt in to this unsafety, gaining some convenience, if they need it.

At this point I am totally fine with just abandoning all this, if it is such a horrible problem that protocols are different statically and at runtime, then let us just not use them.

The simplest version of <some check here> to distinguish LegacyData and NewData is hasattr(item, "x"). It will be faster than any isinstance(item, LegacyData) check and is just as understandable by type checkers.

OK, great, let us indeed just use hasattr() everywhere instead.

@markshannon

I think the conclusion is let's not worry about unions of protocols overly much. Any union (implicit or explicit) will need runtime checks anyway. A type checker can use those to narrow Union[P1, P2] to either P1 or P2 without any additional code in typing.

@gvanrossum

Sorry, I think I was confused when I tried to argue against the LegacyData use case/example (perhaps because it came as part of a comment that started out proposing Protocols to handle import cycles in the context of Bazel).

The real question is, assuming the need to structure code like that, how one should test for one protocol vs. another. Mark's answer is to use hasattr(). Your answer is isinstance(), but only if @runtime[_checkable] is present, and you have to fix a bug in mypy. PEP 544 also observes problems with isinstance() but proposes to just ignore those when @runtime* is used.

I am still torn about this choice -- whether hasattr() or isinstance() is the better choice here. If we were to choose hasattr() we would presumably not need @runtime*, which makes the PEP shorter (at least the specification). It would seem that isinstance() is the more intuitive choice, but it also seems to have the biggest potential surprises (see the problems noted in the PEP, as linked above).

I also don't understand everything you've said about methods vs. data attributes. Is this purely about the case (the second one from the linked PEP section) of dynamically set attributes?

Is the compromise proposed by Jukka in the PEP? If not, maybe it's time we updated the PEP.

I am sorry this seems to be going around in circles. But we need to come up with a 100% unambiguous specification (which is not the same as 0% gray area -- maybe we just need to state exactly what the gray area is, and then we can debate style recommendations to cope with it).

@ilevkivskyi

I also don't understand everything you've said about methods vs. data attributes. Is this purely about the case (the second one from the linked PEP section) of dynamically set attributes?

See the last part; I think it will clarify things.

Is the compromise proposed by Jukka in the PEP? If not, maybe it's time we updated the PEP.

I had it in my plans for this weekend, but I don't have enough energy to do this.

But we need to come up with a 100% unambiguous specification

I think it is already almost there. Here are some observations, followed by the proposed refined specification.

1. Some problems described in the PEP are consequences of how mypy works, not something intrinsic to protocols. For example:

class C:
    def initialize(self) -> None:
        self.x = 0

C().x  # Runtime error uncaught by mypy

The problem with protocol isinstance() is that it lets this unsafety propagate, so that the actual runtime error can be more confusing. For example:

@runtime
class P(Protocol):
    x: int
class C:
    def initialize(self) -> None:
        self.x = 0

def f(x: Union[P, int]) -> None:
    if isinstance(x, P):
        ...
    else:
        x + 1

f(C())  # TypeError: unsupported operand type(s) for +: 'C' and 'int'

2. Life would be easier if the structure of a class were completely specified in the class body; for example, dataclasses are perfect in this sense:

@dataclass
class C:
    x: int
    y: int

isinstance(C(0, 0), <some protocol>)  # All instances of C have same set of attributes
                                      # available for runtime inspection on the class,
                                      # removing any ambiguities like above.

Moreover, for normal classes there is no way to make protocol issubclass() work correctly in this example:

@runtime
class Coordinates(Protocol):
    x: int
    y: int

class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

issubclass(Point, Coordinates)  # Point implements Coordinates, but we have no way
                                # to know about this at runtime.

Again, there would be no such problem with dataclasses.
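Why dataclasses avoid the problem can be seen at runtime: the complete set of instance attributes is introspectable on the class object itself. A minimal sketch, assuming a hypothetical Point dataclass:

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: int
    y: int

# For a dataclass, every instance attribute is recorded on the class object,
# so a class-level (issubclass-style) structural check becomes possible:
declared = {f.name for f in fields(Point)}
print(declared)                # {'x', 'y'} (as a set; display order may vary)
print({"x", "y"} <= declared)  # True: Point structurally provides x and y
```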

3. It is hard to correctly narrow down a type by a protocol isinstance() if we start from a non-union type. For example:

@runtime
class P(Protocol):
    x: int
class C:
    pass
class D(C):
    x: int
x: C
if isinstance(x, P):
    # mypy infers this branch is unreachable, but it can be D

This is actually not something intrinsic to protocols. There are several mypy issues like python/mypy#3603 related to the fact that we don't have intersection types. This is a real gray area; the PEP currently provides an example about this, but I propose to remove it and say something like "Type checkers can use their best judgement when narrowing down from a non-union type". We can also explain that a precise specification of this part would require intersection types.

4. After some thinking, it seems to me that an error is better than type erasure if there is an unsafe overlap in an isinstance() check. For example:

@runtime
class P(Protocol):
    x: int
class A:
    x: str
x: A
isinstance(x, P)  # fails type check, unsafe overlap detected.

Type checkers would detect this by comparing the results of the erased and non-erased subtype checks; if they are different, there is an unsafe overlap.
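As an illustration only (a real type checker would use its full subtyping machinery, not annotation equality), the erased-vs-non-erased comparison can be modeled on plain class annotations:

```python
from typing import get_type_hints

# Toy model: "members" are just the classes' variable annotations.
class P:  # stand-in for a protocol with one member, x: int
    x: int

class A:  # declares x, but with an incompatible type
    x: str

class B:  # declares x with a compatible type
    x: int

def erased_subtype(cls: type, proto: type) -> bool:
    # Erased check: does cls declare every member *name* of the protocol?
    return set(get_type_hints(proto)) <= set(get_type_hints(cls))

def subtype(cls: type, proto: type) -> bool:
    # Non-erased check: names and (naively) identical annotations.
    hints = get_type_hints(cls)
    return all(hints.get(name) == t for name, t in get_type_hints(proto).items())

def unsafe_overlap(cls: type, proto: type) -> bool:
    # Different results for the erased and non-erased checks => unsafe overlap.
    return erased_subtype(cls, proto) and not subtype(cls, proto)

print(unsafe_overlap(A, P))  # True: A matches P's shape but not its types
print(unsafe_overlap(B, P))  # False: B is a genuine structural match
```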

5. I think narrowing down from union types can actually be specified a bit more precisely (since it is a relatively important use case). The specification should mention that a type checker should at least be able to select the correct term from a union, and warn if there is an unsafe overlap ("at least" because any further possible narrowing hits problem 3 above). For example:

@runtime
class P(Protocol):
    x: int

class A:
    x: int
class B:
    x: str
class C:
    other: int

x: Union[A, B]
y: Union[A, C]
isinstance(x, P)  # fails type check, B unsafely overlaps with P
if isinstance(y, P):
    # type of y is A here
else:
    # type of y is C here

OK, so taking into account these observations I propose the following specification:
Definitions:

  • Data and non-data protocols: A protocol is called a non-data protocol if it contains only methods as members (for example Sized, Iterator, etc.). A protocol that contains at least one non-method member (like x: int) is called a data protocol.
  • Unsafe overlap: A type X is called unsafely overlapping with a protocol P if X is not a subtype of P, but is a subtype of the type-erased version of P where all members have type Any. In addition, if at least one element of a union unsafely overlaps with a protocol P, then the whole union unsafely overlaps with P.

Specification:

  • Protocols can be used as a second argument to isinstance() and issubclass() only if they explicitly opt in via the @runtime decorator.

  • isinstance() can be used with both data and non-data protocols, while issubclass() can be used only with non-data protocols. This restriction exists because some data attributes may be set on an instance in the constructor, and this information is not always available on the class object.

  • Type checkers should reject an isinstance() or issubclass() call, if there is an unsafe overlap between a known type and the proposed protocol.

  • Type checkers should be able to select a correct element from a union after a safe isinstance() or issubclass() call. For narrowing from non-union types, type checkers can use their best judgement (this is intentionally unspecified, since a precise specification would require intersection types).

  • But why not just hasattr()? Because hasattr(x, '__iter__') and hasattr(x, '__next__') is ugly. Also, it is still not safe; it suffers from exactly the same problem as the example above:

    class P(Protocol):
        x: int
    class C:
        def initialize(self) -> None:
            self.x = 0
    
    def f(arg: Union[P, int]) -> None:
        if hasattr(arg, 'x'):
            ...
        else:
            arg + 1
    
    f(C())  # TypeError: unsupported operand type(s) for +: 'C' and 'int'

@ilevkivskyi

If we were to choose hasattr() we would presumably not need @runtime*, which makes the PEP shorter (at least the specification).

Hm, but the proposed specification is exactly on par with hasattr() in terms of unsafety/surprise. The two remaining grey areas - dynamically set attributes and intersection types - are equally problematic for both.

Does this mean you are OK with dropping @runtime with refined specification?

@ilevkivskyi commented Apr 8, 2018

@gvanrossum Just to clarify, the specification draft above is almost exactly the compromise proposed by Jukka I mentioned earlier. The only difference is that I propose a safer fix for python/mypy#3827 than I proposed before.

@gvanrossum

Thanks for writing this up again for me!

Does this mean you are OK with dropping @runtime with refined specification?

Yes, if a type checker can always warn when a runtime isinstance() or issubclass() check may hit a gray area.

The rest of the proposal sounds good, except I worry about this phrase:

  • Type checkers should reject an isinstance() or issubclass() call, if there is an unsafe overlap between a known type and the proposed protocol.

This makes me worry -- does "a known type" refer to "any type anywhere in the program being checked"? Even if it only referred to a known possible type for the first argument, that would still worry me, because the class D from the example in (3) might be anywhere in the program. So what did you intend here, and how do you propose type checkers implement this requirement?

Finally I notice that there may be a way to reduce the gray area in cases where the type checker can determine that all instances of a class C have a certain attribute x -- e.g. when C is a dataclass or in other situations where C.x is always set (e.g. when it is a descriptor).

@ilevkivskyi

Yes, if a type checker can always warn when a runtime isinstance() or subclass() check may hit a gray area.

Do you mean the two existing gray areas I outlined above (dynamic attributes and lack of intersection types)? If yes, then the key word here is "may": emitting a type check error on every unsafety (even a tiny one) would be way more annoying for a user than sprinkling the code with a few @runtime decorators, thus acknowledging the existence of gray areas. Btw, I think we should tweak the corresponding error message in mypy to make this more explicit, for example:

__main__.py: 42: error: Using isinstance() is not safe with protocol types, see <relevant link>.
__main__.py: 42: note: To allow this, decorate the definition of "Proto" with @runtime_checkable

This makes me worry -- does "a known type" refer to "any type anywhere in the program being checked"? So what did you intend here, and how do you propose type checkers implement this requirement?

I mean the declared or inferred type of the first argument of isinstance(). This will be implemented directly by the definition of unsafe overlap above in the specification. For example:

@runtime
class P(Protocol):
    x: int
class C:
    x: str

isinstance(C(), P)  # error because subtypes.is_subtype(C, P) returns False,
                    # while subtypes.is_subtype(C, <erased P>) returns True.

Even if it was only referring to a known possible type for the first argument that would still worry me, because the class D from the example in (3) might be anywhere in the program.

IIUC, this is part of one of the gray areas I mentioned -- the lack of intersection types (although this is probably not the best term to describe this area correctly, see below). Ideally, for this code:

@runtime
class P(Protocol):
    x: int
class C:
    pass
class D(C):
    x: int

x: C
if isinstance(x, P):
    # x is <narrowed type> here

the <narrowed type> should be something like make_fake_intersection(C, P) in terms of Jukka's proposal python/mypy#3603 (comment). This is however still not perfect, because the type of x in D may be wrong. To make it safe, we would need to track all defined subclasses everywhere in the code, which is a very bad idea w.r.t. performance. In other words, the root problem is not the impossibility of expressing intersections, but the fact that type checkers can only prove that a given class is safe with protocol isinstance(), not all its subclasses.

Finally I notice that there may be a way to reduce the gray area in cases where the type checker can determine that all instances of a class C have a certain attribute x -- e.g. when C is a dataclass or in other situations where C.x is always set (e.g. when it is a descriptor).

Yes, and we have an issue for doing this even independently of protocols, and there is another proposal from Jukka: python/mypy#4019 (comment)

About the last two comments, I don't want to add any specifications to the PEP about what type checkers should do to minimise/mitigate these two gray areas. There are three reasons for this:

  • The two mentioned proposals are relatively big projects.
  • These projects are already valuable independently of protocols.
  • Even with both proposals implemented, there is no way to completely remove these gray areas for protocol isinstance().

So the renewed specification proposal is:
Definitions:
...unchanged...

Specification:

  • Protocols can be used as a second argument to isinstance() and issubclass() only if they explicitly opt in via the @runtime decorator. This requirement exists because protocol checks are not type safe in the case of dynamically set attributes, and because type checkers can prove that an isinstance() check is safe only for a given class, not for all its subclasses.
  • isinstance() can be used with both data and non-data protocols, while issubclass() can be used only with non-data protocols. This restriction exists because some data attributes may be set on an instance in the constructor, and this information is not always available on the class object.
  • Type checkers should reject an isinstance() or issubclass() call if there is an unsafe overlap between the declared or inferred type of the variable used as the first argument and the proposed protocol.
  • Type checkers should be able to select a correct element from a union after a safe isinstance() or issubclass() call. For narrowing from non-union types, type checkers can use their best judgement (this is intentionally unspecified, since a precise specification would require intersection types).

I hope we agree that there are two relatively well defined gray areas where isinstance() can't be safe with protocols. This seems to match your requirement:

But we need to come up with a 100% unambiguous specification (which is not the same as 0% gray area -- maybe we just need to state exactly what the gray area is, and then we can debate style recommendations to cope with it).

Re style recommendations to mitigate the problems, I see three:

  • Don't use dynamically set attributes wherever protocols are used with isinstance(), ideally use dataclasses in such situations.

  • Whenever an isinstance() with protocol is used, declare the prior type as precisely as possible, especially prefer a union over a common superclass:

    class A:
        pass
    class B(A):
        x: int
    class C(A):
        x: str
    
    a: A  # Bad, type checkers may not find unsafety.
    if isinstance(a, <some protocol>): ...
    
    aa: Union[B, C]  # Good, type checkers will find the problem.
    if isinstance(aa, <some protocol>): ...
  • Only use isinstance() with unions of protocols when an "occasional" overlap between union elements is unlikely, and the protocols are intended to be mutually exclusive:

    @runtime
    class PA(Protocol):
        x: int
    class PB(Protocol):
        x: str
    
    def f(arg: Union[PA, PB]) -> None:
        if isinstance(arg, PA):  # Very bad
            ...
    @runtime
    class LegacyData(Protocol):
        x: int
        y: int
    class NewData(Protocol):
        coord: Tuple[int, int]
    
    def f(arg: Union[LegacyData, NewData]) -> None:
        if isinstance(arg, LegacyData):  # OK
            coord = arg.x, arg.y

@gvanrossum

OK, I think we're on the same page now. I was confused by the situation around isinstance(x, P) -- I had somehow imagined that x would be an instance of some specific class C, but of course that's not very useful -- it's more likely either a union of protocols (one of which is P) or a superclass like object (or Any). If it's a union of protocols all is well (since we assume that the call site is checked so it conforms to at least one union member). If it's a superclass we're in the gray area, since all its subclasses will be accepted statically, but some subclasses may define a discriminating attribute while others may not.

Also, "dynamic" (I would say "late") initialization is an issue -- and again this is not possible to guard against statically.

So I agree that (alas) we need @runtime[_checkable].

@user-name-name

Hello, any update on this issue?

@ilevkivskyi

This is done now.
