Skip to content

Repair is_dataclass definition? #9723

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
NeilGirdhar opened this issue Feb 12, 2023 · 15 comments · Fixed by #9758 or #11929
Closed

Repair is_dataclass definition? #9723

NeilGirdhar opened this issue Feb 12, 2023 · 15 comments · Fixed by #9758 or #11929

Comments

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Feb 12, 2023

Consider:

from dataclasses import is_dataclass

def f(X: type) -> None:
    if is_dataclass(X):
        reveal_type(X)  # information: Type of "X" is "type"

Ideally, X would have type type[DataclassInstance]. It doesn't because is_dataclass is defined:

@overload
def is_dataclass(obj: DataclassInstance | type[DataclassInstance]) -> Literal[True]: ...
@overload
def is_dataclass(obj: type) -> TypeGuard[type[DataclassInstance]]: ...
@overload
def is_dataclass(obj: object) -> TypeGuard[DataclassInstance | type[DataclassInstance]]: ...

And the argument of type type matches the first overload, which does not incorporate a TypeGuard. Eric Traut recommends erasing the first overload of is_dataclass.

@srittau
Copy link
Collaborator

srittau commented Feb 13, 2023

Sounds logical to me. Cc @AlexWaygood who submitted the current version in #9362.

@AlexWaygood
Copy link
Member

AlexWaygood commented Feb 14, 2023

Yup, my bad — it makes sense to get rid of this overload. Looks like mypy has the same issue as pyright here, if you use the mypy master branch.

I don't have access to a computer at the moment (and don't really want to make a PR on my iPad), but feel free to submit a PR. You'll need to update some test cases in https://github.com/python/typeshed/blob/main/test_cases/stdlib/check_dataclasses.py (and you get extra brownie points if you add some new ones to that file as well ;)

@sterliakov
Copy link

If the first overload is discarded, then the other problem arises. Here's the failing test piece:

@dc.dataclass
class Foo:
    attr: str

if dc.is_dataclass(Foo):
    # The inferred type doesn't change
    # if it's already known to be a subtype of type[_DataclassInstance]
    assert_type(Foo, Type[Foo])

It's failing, because Foo is not a type[Foo] any more, but type[DataclassInstance] instead. TypeGuard is upcasting its argument (in mypy and in pyright - PEP doesn't seem to mention this case explicitly), and the first overload was probably introduced exactly for this reason.

@AlexWaygood
Copy link
Member

and the first overload was probably introduced exactly for this reason.

Yes, I can confirm that this is the reason why I introduced this overload when I rewrote the stub for this function in #9362. But I think the problems pointed out in this issue outweigh the benefits from avoiding the upcast here.

Rather than discarding the first overload entirely, it might be better to change it like so:

 @overload
-def is_dataclass(obj: DataclassInstance | type[DataclassInstance]) -> Literal[True]: ...
+def is_dataclass(obj: DataclassInstance) -> Literal[True]: ...

That way we still avoid the undesirable upcast for dataclass instances, but avoid the problems with type[T] covariance that have been pointed out in this issue.

@sterliakov
Copy link

sterliakov commented Feb 18, 2023

TBH, I'd rather leave this as-is: the problem above is not a covariance problem, it's about omitted generics/abusing Any. Replace type with type[object], and things will work as expected:

from dataclasses import is_dataclass

def f(X: type[object]) -> None:
    if is_dataclass(X):
        reveal_type(X)  # N: Revealed type is "Type[dataclasses._DataclassInstance]"

@AlexWaygood
Copy link
Member

it's about omitted generics/abusing Any

Agreed, but I know that mypy doesn't flag a bare type in an annotation, even if you're using --disallow-any-generics, and I think pyright does similar. That means that even users who are taking pains to avoid unsafe Anys entering their annotations might encounter this behavior unexpectedly. So I'm still inclined to change the first overload.

@sterliakov
Copy link

Hm, now I'm curious, why mypy doesn't flag bare type, but still fills it with type[Any] instead of type[object] (the latter would be not PEP-conformant, but at least coherent with not considering bare type an error, as opposed to all other generics with omitted parameters). Also, this behaviour is reported as bug - perhaps it should be fixed instead of adjusting typeshed to match?

@NeilGirdhar
Copy link
Contributor Author

TBH, I'd rather leave this as-is: the problem above is not a covariance problem, it's about omitted generics/abusing Any.

I don't know that that's the problem. I think the problem is that the type guard is upcasting. Ideally, the type guard—just like any instance check—would intersect (python/typing#213). So there appears to be a bug in the type guard mechanism. I filed a bug to this effct: python/typing#1351.

If the intersection were working, then you wouldn't need the first overload, right?

@AlexWaygood
Copy link
Member

If the intersection were working, then you wouldn't need the first overload, right?

If that's how TypeGuards worked in general, then yes, the first overload would be unnecessary. I don't have an opinion on whether that's a good idea in terms of the type theory, but it's not how the two most popular type checkers currently implement the feature.

@AlexWaygood
Copy link
Member

AlexWaygood commented Feb 18, 2023

Also, this behaviour is reported as bug - perhaps it should be fixed instead of adjusting typeshed to match?

As Eric points out in the pyright issue you opened, it's sort of debatable whether that's actually a bug. type in an annotation can mean "type[Any]", but it can also just mean "an instance of the class type". (When I wrote the overload we're discussing in this issue, I intended the second meaning, but it seems both mypy and pyright interpret it according to the first meaning.) There's a lot of ambiguity and unsafety in general with using type[T] in annotations tbh, and it's all slightly unfortunate.

@NeilGirdhar
Copy link
Contributor Author

NeilGirdhar commented Feb 18, 2023

One workaround that would work today would be to add DataclassInstance.__subclasshook__. As you can see, it has the desired behaviour:

from dataclasses import dataclass, is_dataclass
from abc import ABC
from typing import Any, TypeGuard, overload
from typing_extensions import reveal_type


class DataclassInstance:
    pass

@overload
def fixed_is_dataclass(obj: type) -> TypeGuard[type[DataclassInstance]]: ...

@overload
def fixed_is_dataclass(obj: object) -> TypeGuard[DataclassInstance | type[DataclassInstance]]: ...

def fixed_is_dataclass(obj: Any) -> TypeGuard[DataclassInstance | type[DataclassInstance]]:
    return is_dataclass(obj)

class AwesomeDataclassInstance(ABC):
    def __init_subclass__(cls) -> None:
        super().__init_subclass__()
        dataclass(cls)

    @classmethod
    def __subclasshook__(cls, sub: Any) -> bool:
        return isinstance(sub, type) and is_dataclass(sub) or super().__subclasshook__(sub)

@dataclass
class Y:
    pass

def f(x: type[Y]) -> None:
    if fixed_is_dataclass(x):
        print("fixed_is_dataclass")
        reveal_type(x)  # Just type[DataclassInstance]
    if issubclass(x, AwesomeDataclassInstance):
        print("AwesomeDataclassInstance")
        reveal_type(x)  # type[Y] and type[AwesomeDataclassInstance])

f(Y)

But that doesn't repair is_dataclass, and I imagine that the ABC would have to be added to the standard library.

@sterliakov
Copy link

sterliakov commented Feb 19, 2023

but it can also just mean "an instance of the class type"

Hm, that sounds unclear to me. If you spell "an instance of class list" in the same fashion, type checkers say: "go away and think, are you sure you don't want generic parameters?" What makes type so different? Is it about specific internal treatment of type (missing __class_getitem__ and consequently some magic, AFAIC)? Is this reason really important for typecheckers, given that they anyway treat type annotation as generic alias? (Sorry for polluting closed unrelated issue, I'm not sure whether I should open new issue about this on python/typing or smth else - I never noticed this weird type handling, and now I'd really like it changed)? Also this makes a clear distinction between type and Type in annotations, which goes against PEP585 and Type deprecation.

@AlexWaygood
Copy link
Member

but it can also just mean "an instance of the class type"

Hm, that sounds unclear to me. If you spell "an instance of class list" in the same fashion, type checkers say: "go away and think, are you sure you don't want generic parameters?" What makes type so different? Is it about specific internal treatment of type (missing __class_getitem__ and consequently some magic, AFAIC)? Is this reason really important for typecheckers, given that they anyway treat type annotation as generic alias? (Sorry for polluting closed unrelated issue, I'm not sure whether I should open new issue about this on python/typing or smth else - I never noticed this weird type handling, and now I'd really like it changed)? Also this makes a clear distinction between type and Type in annotations, which goes against PEP585 and Type deprecation.

I don't really have any strong opinion or special insight here — opening an issue on python/typing sounds like a good option if you want to discuss this further :)

@NeilGirdhar
Copy link
Contributor Author

Should is_dataclass use TypeIs?

@JelleZijlstra
Copy link
Member

I think so, PR welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants