-
Notifications
You must be signed in to change notification settings - Fork 258
Moving Protocols from typing_extensions into typing #550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'm not sure that a Protocol is needed for this case. In most cases (assuming these are actual classes or named tuples, not dicts) there are specific class definitions that are used everywhere (though not named everywhere explicitly), so apart from the circular import problem, nominal type checking is fine . So the refactoring needed could just as well introduce an ABC. I'll elaborate below.
Yeah, the main reason to use a protocol in this case would be that it's an easy way to spell a bunch of Now let me explain why I think ABCs would be good enough to deal with the typical circular import case. (Though see farther down for a concern.) Suppose we have two unannotated modules like this:
from c import C
class B:
def meth(self):
return C(self)
class C:
def __init__(self, arg):
self.arg = arg # This is always a B instance and let's suppose these are in two different Bazel packages,
When we start adding annotations, we wish to add
class AC:
pass
class AB:
def meth(self) -> AC:
raise NotImplementedError We then import
from a import AB
from c import C
class B(AB):
def meth(self) -> C:
return C()
from a import AB, AC
class C(AC):
def __init__(self, arg: AB) -> 'C':
self.arg = arg # type: AB # (redundantly) The dependencies will be non-circular:
PS. Having written all this down I also see that using a protocol would require fewer changes. One could keep the original class PB(Protocol):
def meth(self) -> 'C': pass
class C:
def __init__(self, arg: PB) -> None:
self.arg = arg # type: PB # (redundantly) But, again, this introduces duck typing where probably none was used before the refactoring. (In my experience duck typing is primarily used for builtin protocols like |
I also have some comments related to the difference between how mypy interprets @runtime_checkable
class ProtoA(Protocol):
attr: int
class ProtoB(Protocol):
attr: str
x: Union[ProtoA, ProtoB]
if isinstance(x, ProtoA):
x.attr + 1 # the binder type is still a union here, so mypy will complain after the issue is fixed. Well, maybe I don't have a comment other than thinking that this is giving me more concerns about |
Sorry for a long response. The most important question is actually the last one, but previous points can give some more context.
Although I agree it is fine, I still have some feeling (can't call it a motivated opinion) that interfaces between independent "modules" (like build targets in case of Bazel) should be as much agnostic about the inner parts of these "modules" as possible. Protocols (or at least ABCs) are better in this sense than classical nominal types.
"Destructuring" is just a fancy term for selecting an single type from a union. I think
This is actually one of the motivation points of the PEP 544. Protocols don't require adding explicit base classes or adding new modules. Otherwise they are pretty similar to ABCs. ABCs may feel safer, but I think it is just a feeling; mypy can verify pretty much everything statically with protocols that ABCs can do at runtime.
Maybe this is partially because there were no way of creating simple user defined protocols? My general feeling is that protocols are good for describing argument types of some functions and methods, for return types, and (module, class, and instance) variables nominal types are definitely better. We have seen quite a lot of requests to add more def close_all(things: Iterable[SupportsClose]) -> None:
for thing in things:
thing.close() Instead of a large union of various resource types, one can have one small protocol.
I think this is the most important question in this discussion. Previous points were more about "are protocols useful?", I hope we both agree they are. I would say if an ABC has only few members, then a protocol is almost always a better solution. (This is definitely more questionable for larger ABCs, it is easier to explicitly subclass So the question is "Do we need to allow The current situation is that we disallow |
Quick: usually LegacyData (in the mind of the developer) is a specific
class, not a duck type, so it can use a nominal isinstance(); and I worry
very much about the difference between isinstance() at runtime and at
type-check time.
|
When I said that, I was merely showing my bias for protocols over abcs, I think. The same principle exists for both. What we need to untangle bazel targets is using these things to do dependency injection. I would say that if you have to use the structural nature of the protocol, in order to get dependencies wrong, then probably you set up your bazel targets wrong. I've been putting the protocols in separate bazel libs because I value the ability to explicitly state that I'm implementing a protocol. |
TBH I don't understand this. There seem to be two independent statements mixed together. Independently of what is
This part is even less clear to me. Yes, there is a bug in mypy now that it treats @JukkaL Maybe you can join the discussion? We need to finish this infinite discussion that already happened twice, and now happening the third time. A short reminder about current state (just in case):
Does this still sound reasonable to you? IIUC Guido doesn't like something of this, but I can't yet understand what exactly. @jhance My general view of this is approximately like this:
|
That's how you wrote it. But if this was a real app it would most likely have two actual classes representing the legacy and modern data formats and the code would be fine if it used nominal
But doesn't fixing this bug lead to other problems? Continuing your example, it means that after IIUC the problem becomes even more pronounced when we consider methods -- if erased types become By now I'm wondering if I am totally misunderstanding what you mean by making mypy treat
Perhaps that's because nobody writes e.g. a |
I don't see that the particular example requires any support from the class LegacyData(Protocol):
x: int
y: int
label: str
class NewData(Protocol):
coord: Tuple[int, int]
label: str
def process(items: Iterable[Union[LegacyData, NewData]]) -> None:
for item in items:
if <some check here>:
# unpack and process legacy API
else:
# other logic for new API The simplest version of |
And what is the conclusion from this? That we don't need protocols?
As it is may be clear from my example you mention #550 (comment) the erasure happens on the supertype (i.e. on the protocol), so that both elements of a union will match statically as they do at runtime. There are however indeed may be other problems if we narrow not starting from a union, but in this case don't doing the erasure is probably fine.
This is quite hypothetical. Also I don't understand why do we need to make At this point I am totally fine with just abandoning all this, if it is such a horrible problem that protocols are different statically and at runtime, then let us just not use them.
OK, great, let us indeed just use |
I think the conclusion is lets not worry about Unions of Protocols overly much. Any union (implicit or explicit) will need runtime checks anyway. A type checker can use those to narrow the |
Sorry, I think I was confused when I tried to argue against the The real question is, assuming the need to structure code like that, how should it test for one protocol vs. another. Mark's answer is to use I am still torn about this choice -- whether I also don't understand everything you've said about methods vs. data attributes. Is this purely about the case (the second one from the linked PEP section) of dynamically set attributes? Is the compromise proposed by Jukka in the PEP? If not, maybe it's time we updated the PEP. I am sorry this seems to be going around in circles. But we need to come up with a 100% unambiguous specification (which is not the same as 0% gray area -- maybe we just need to state exactly what the gray area is, and then we can debate style recommendations to cope with it). |
See last part, I think it will clarify.
I had it in my plans for this weekend, but I don't have enough energy to do this.
I think it is already almost there. Here are some observations followed by the the proposed refined specification. 1. Some problems described in the PEP are consequences of how mypy works, not something intrinsic to protools. For example: class C:
def initialize(self) -> None:
self.x = 0
C().x # Runtime error uncaught by mypy The problem with protocol @runtime
class P(Protocol):
x: int
class C:
def initialize(self) -> None:
self.x = 0
def f(x: Union[P, int]) -> None:
if isinstance(x, P):
...
else:
x + 1
f(C()) # TypeError: unsupported operand type(s) for +: 'C' and 'int' 2. Life would be easier if the structure of a class would be completely specified in the class body, for example dataclasses are perfect in this sense: @dataclass
class C:
x: int
y: int
isinstance(C(0, 0), <some protocol>) # All instances of C have same set of attributes
# available for runtime inspection on the class,
# removing any ambiguities like above. Moreover, for normal classes there is no way to correctly work with protocol @runtime
class Coordinates(Protocol):
x: int
y: int
class Point:
def __init__(self, x: int, y: int) -> None:
self.x = x
self.y = y
issubclass(Point, Coordinates) # Point implements Coordinates, but we have no way
# to know about this at runtime. Again, there would be no such problem with dataclasses. 3. It is hard to correctly narrow down type by a protocol @runtime
class P(Protocol):
x: int
class C:
pass
class D(C):
x: int
x: C
if isinstance(x, P):
# mypy infers this branch is unreachable, but it can be D This is actually not something intrinsic to protocols. There are several mypy issues like python/mypy#3603 related to the fact that don't have intersection types. This is a real grey area, the PEP currently provides some example about this, but I propose to remove this and say something like "Type checkers can use their best judgement when narrowing down from a non-union type". We can also explain that a precise specification of this part would require intersection types. 4. After some thinking, it seems to me that an error is better than type erasure if there is an unsafe overlap in @runtime
class P(Protocol):
x: int
class A:
x: str
x: A
isinstance(x, P) # fails type check, unsafe overlap detected. Type checkers would detect this by comparing the results of erased and non-erased subtype checks, if they are different, there is an unsafe overlap. 5. I think narrowing down from union types is actually easier to specify a bit more precise (since it is a relatively important use case). The specification should mention that a type checker should be at least able to select the correct term from a union, and warn if there is an unsafe overlap ("at least" is because any further possible narrowing hits the problem 3 above). For example: @runtime
class P(Protocol):
x: int
class A:
x: int
class B:
x: str
class C:
other: int
x: Union[A, B]
y: Union[A, C]
isinstance(x, P) # fails type check, B unsafely overlaps with P
if isinstance(y, P):
# type of y is A here
else:
# type of y is C here OK, so taking into account these observations I propose the following specification:
Specification:
|
Hm, but the proposed specification is exactly on par with Does this mean you are OK with dropping |
@gvanrossum Just to clarify, the specification draft above is almost exactly the compromise proposed by Jukka I mentioned earlier. The only difference is that I propose a safer fix for python/mypy#3827 than I proposed before. |
Thanks for writing this up again for me!
Yes, if a type checker can always warn when a runtime The rest of the proposal sounds good, except I worry about this phrase:
This makes me worry -- does "a known type" refer to "any type anywhere in the program being checked"? Even if it was only referring to a known possible type for the first argument that would still worry me, because the class Finally I notice that there may be a way to reduce the gray area in cases where the type checker can determine that all instances of a class |
Do you mean the two existing gray areas I outlined above (dynamic attributes and lack of intersection types)? If yes, then the key word here is "may", emitting a type check error on every (even a tiny one) unsafety would be way more annoying for a user than sprinkling the code with few
I mean the declared or inferred type of the first argument of @runtime
class P(Protocol):
x: int
class C:
x: str
isinstance(C(), P) # error because subtypes.is_subtype(C, P) returns False,
# while subtypes.is_subtype(C, <erased P>) returns True.
IIUC, this is a part of one of the gray areas I mentioned -- lack of intersection types (although this is probably not the best term to correctly describe this area, see below). Ideally, for this code: @runtime
class P(Protocol):
x: int
class C:
pass
class D:
x: int
x: C
if isinstance(x, P):
# x is <narrowed type> here the
Yes, and we have an issue to do this even independently of protocols, and there is another Jukka's proposal python/mypy#4019 (comment) About the last two comments, I don't want to add any specifications to the PEP about what type checkers should do to minimise/mitigate these two gray areas. There are three reasons for this:
So the renewed specification proposal is: Specification:
I hope we agree that there are two relatively well defined gray areas where
Re style recommendations to mitigate the problems I see three recommendations:
|
OK, I think we're on the same page now. I was confused by the situation around Also, "dynamic" (I would say "late") initialization is an issue -- and again this is not possible to guard against statically. So I agree that (alas) we need |
Hello, any update on this issue? |
This is done now. |
This is a placeholder issue so we can have the discussion from #549 outside the context of a specific PR. Relevant comments:
@ilevkivskyi, if you don't mind splitting off the debate, I will write my response to the last bullet below here.
The text was updated successfully, but these errors were encountered: