Skip to content

[ty] Implement DataClassInstance protocol for dataclasses. #18018

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 13, 2025

Conversation

abhijeetbodas2001
Copy link
Contributor

Fixes: astral-sh/ty#92

Summary

We currently get a invalid-argument-type error when using dataclass.fields on a dataclass, because we do not synthesize the __dataclass_fields__ member.

This PR fixes this diagnostic.

Note that we do not yet model the Field type correctly. After that is done, we can assign a more precise tuple[Field, ...] type to this new member.

Test Plan

New mdtest.

Copy link
Contributor

github-actions bot commented May 11, 2025

mypy_primer results

Changes were detected when running on open source projects
psycopg (https://github.com/psycopg/psycopg)
- error[invalid-argument-type] psycopg/psycopg/errors.py:233:21: Argument to function `fields` is incorrect: Expected `DataclassInstance`, found `<class 'FinishedPGconn'>`
- Found 1241 diagnostics
+ Found 1240 diagnostics

hydra-zen (https://github.com/mit-ll-responsible-ai/hydra-zen)
- error[invalid-return-type] src/hydra_zen/wrapper/_implementations.py:945:16: Return type does not match returned value: Expected `DataClass_`, found `@Todo(unsupported type[X] special form) | (((...) -> Any) & dict[Unknown, Unknown]) | (DataClass_ & dict[Unknown, Unknown]) | (list[Any] & dict[Unknown, Unknown]) | dict[Any, Any] | (((...) -> Any) & list[Unknown]) | (DataClass_ & list[Unknown]) | list[Any]`
+ error[invalid-return-type] src/hydra_zen/wrapper/_implementations.py:945:16: Return type does not match returned value: Expected `DataClass_`, found `@Todo(unsupported type[X] special form) | (((...) -> Any) & dict[Unknown, Unknown]) | (DataClass_ & dict[Unknown, Unknown]) | (list[Any] & dict[Unknown, Unknown]) | dict[Any, Any] | (((...) -> Any) & list[Unknown]) | (DataClass_ & list[Unknown]) | list[Any] | (dict[Any, Any] & list[Unknown])`
+ error[type-assertion-failure] tests/annotations/declarations.py:951:5: Actual type `FullBuilds[Unknown] | StdBuilds[Unknown]` is not the same as asserted type `FullBuilds[@Todo(Support for `typing.TypeAlias`)] | StdBuilds[@Todo(Support for `typing.TypeAlias`)]`
- Found 649 diagnostics
+ Found 650 diagnostics

rotki (https://github.com/rotki/rotki)
- error[invalid-argument-type] rotkehlchen/tests/api/test_settings.py:144:39: Argument to function `fields` is incorrect: Expected `DataclassInstance`, found `<class 'DBSettings'>`
- error[invalid-argument-type] rotkehlchen/tests/api/test_users.py:52:39: Argument to function `fields` is incorrect: Expected `DataclassInstance`, found `<class 'DBSettings'>`
- error[invalid-argument-type] rotkehlchen/tests/db/test_db.py:523:57: Argument to function `fields` is incorrect: Expected `DataclassInstance`, found `<class 'DBSettings'>`
- Found 2107 diagnostics
+ Found 2104 diagnostics

@abhijeetbodas2001
Copy link
Contributor Author

This PR does silence the false positives mentioned in the original issue, but does not fully model the return type of dataclasses.fields, because ty does not yet understand the dataclasses.Field class (astral-sh/ty#111).

@dylwil3 dylwil3 added the ty Multi-file analysis & type inference label May 11, 2025
Copy link
Contributor

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! This looks good.. just a minor question regarding the return type and the type qualifier.

@abhijeetbodas2001
Copy link
Contributor Author

Thanks for reviewing. I've changed the type as you mentioned. I've also added a reveal_type test for __dataclass_fields__.

I do have one question though. I wanted to add another assertion to the tests, something like this:

from ty_extensions import static_assert, is_subtype_of
from _typeshed import DataclassInstance

@dataclass
class Foo:
    x: int

static_assert(is_subtype_of(Foo, DataclassInstance))

This assertion however, failed. I am not sure why. I grepped for any other examples of from _typeshed import ... in the mdtests, but could not find any.

@abhijeetbodas2001
Copy link
Contributor Author

Thanks for the review Alex! I've addressed your comments.

@sharkdp
Copy link
Contributor

sharkdp commented May 13, 2025

from _typeshed import DataclassInstance

@AlexWaygood might know more about why we don't use/support this, so far.

@sharkdp
Copy link
Contributor

sharkdp commented May 13, 2025

I updated the test assertion now that #17998 has been merged.

@sharkdp sharkdp merged commit 68b0386 into astral-sh:main May 13, 2025
34 checks passed
@abhijeetbodas2001
Copy link
Contributor Author

I updated the test assertion now that #17998 has been merged.

Thank you!

@abhijeetbodas2001 abhijeetbodas2001 deleted the dataclass branch May 13, 2025 08:45
@AlexWaygood
Copy link
Member

AlexWaygood commented May 13, 2025

from _typeshed import DataclassInstance

@AlexWaygood might know more about why we don't use/support this, so far.

It seems to be because we still incorrectly emit an error if the attribute is accessed on instances of dataclasses rather than dataclass class objects themselves:

@dataclass
class Foo: ...

reveal_type(Foo.__dataclass_fields__)  # revealed: dict[str, Field[Any]]

# error: Type `Foo` has no attribute `__dataclass_fields__`
reveal_type(Foo().__dataclass_fields__)  # revealed: Unknown

https://play.ty.dev/b8c107d1-bc5c-4c7b-a71f-ab5991b9c53f

@sharkdp
Copy link
Contributor

sharkdp commented May 13, 2025

It seems to be because we still incorrectly emit an error if the attribute is accessed on instances of dataclasses rather than dataclass class objects themselves:

Oh! I thought there was a problem with the _typeshed import.

@abhijeetbodas2001
Copy link
Contributor Author

It seems to be because we still incorrectly emit an error if the attribute is accessed on instances of dataclasses rather than dataclass class objects themselves:

Oh makes sense! I will dig into the code to understand more, but conceptually, why does is_subtype_of need to inspect instances of Foo? If it finds a class-var called __dataclass_fields__ on Foo itself, can it not assume that instances will also have it (and claim the subtype relation to hold)? Maybe not inside the type checker (where we synthesize stuff), but in real code, this implication should hold?

dcreager added a commit that referenced this pull request May 13, 2025
…eep-dish

* origin/main:
  [ty] __file__ is always a string inside a Python module (#18071)
  [ty] Recognize submodules in self-referential imports (#18005)
  [ty] Add a note to the diagnostic if a new builtin is used on an old Python version (#18068)
  [ty] Add tests for `else` branches of `hasattr()` narrowing (#18067)
  [ty] Improve diagnostics for `assert_type` and `assert_never` (#18050)
  [ty] contribution guide (#18061)
  [ty] Implement `DataClassInstance` protocol for dataclasses. (#18018)
  [ruff_python_ast] Fix redundant visitation of test expressions in elif clause statements (#18064)
@carljm
Copy link
Contributor

carljm commented May 13, 2025

conceptually, why does is_subtype_of need to inspect instances of Foo? If it finds a class-var called __dataclass_fields__ on Foo itself, can it not assume that instances will also have it (and claim the subtype relation to hold)? Maybe not inside the type checker (where we synthesize stuff), but in real code, this implication should hold?

Yes, this is true at runtime, and ty should understand it too. Generally we do, but in this PR we do not, because the special case for a member lookup of __dataclass_fields__ was added only for Type::ClassLiteral type, in Type::member_lookup_with_policy. Our general fallback from an instance to its class is implemented at a deeper layer, in methods on ClassType.

So to fix this, either we also need a special case for __dataclass_fields__ in the NominalInstance arm of Type::member_lookup_with_policy, or (probably better? but I haven't worked through the details) the entire handling of __dataclass_fields__ should be moved out of Type::member_lookup_with_policy and into ClassType methods.

@carljm
Copy link
Contributor

carljm commented May 13, 2025

On second look, I also think that the assertion made in the tests here is wrong: the class object Foo should not satisfy the DataclassInstance protocol, because the __dataclass_fields__ attribute on DataclassInstance is marked as ClassVar, and the type of Foo (the builtin type) has no such attribute.

@abhijeetbodas2001
Copy link
Contributor Author

I also think that the assertion made in the tests here is wrong

I did not understand this. Which assertion specifically?

Assuming we do add a synthesized member to NominalInstance like you mentioned above, the assertion: static_assert(is_subtype_of(Foo, DataclassInstance)) should pass?

I understand that the class object Foo shouldn't satisfy the protocol, but in the above check, we only care about the type Foo isn't it? What am I missing here?

@sharkdp
Copy link
Contributor

sharkdp commented May 15, 2025

I did not understand this. Which assertion specifically?

This one: static_assert(is_subtype_of(Foo, DataclassInstance)). This should not pass.

Assuming we do add a synthesized member to NominalInstance like you mentioned above, the assertion: static_assert(is_subtype_of(Foo, DataclassInstance)) should pass?

I don't think so. The members and types match, but the type qualifiers do not:

class DataclassInstance(Protocol):
    __dataclass_fields__: ClassVar[dict[str, Field[Any]]]

I'm not sure where that is specified, but I assume the ClassVar annotation here means: __dataclass_fields__ must be accessible on the type of an instance of this protocol. For instances of a dataclass, this is true:

from dataclasses import dataclass

@dataclass
class C: ...

c = C()
type(c).__dataclass_fields__  # this exists

But for class objects, this is not true:

type(C).__dataclass_fields  # this does not exist. `type(C)` is `type`.

Note that dataclass class objects can still be passed to dataclasses.fields, because they also accept type[DataclassInstance]:

https://github.com/python/typeshed/blob/1063db7c15135c172f1f6a81d3aff6d1cb00a980/stdlib/dataclasses.pyi#L299

I was already working on a fix (#18115) when I saw your message.

@abhijeetbodas2001
Copy link
Contributor Author

abhijeetbodas2001 commented May 15, 2025

I'm not sure where that is specified, but I assume the ClassVar annotation here means: dataclass_fields must be accessible on the type of an instance of this protocol.

This makes sense. I also could not find this anywhere, but is the reasoning here that, the constraints:

(1) members annotated with ClassVar cannot be set on instances
(2) for Foo to be an instance of DataclassInstance, the member must be present

.. together imply that it must be present in the type of Foo?


But thanks for the explanation and the fix, and I'm sorry that you had to rework this :(

@sharkdp
Copy link
Contributor

sharkdp commented May 15, 2025

But thanks for the explanation and the fix, and I'm sorry that you had to rework this :(

No worries! Thank you for the initial implementation.

Glyphack pushed a commit to Glyphack/ruff that referenced this pull request May 21, 2025
…sh#18018)

Fixes: astral-sh/ty#92

## Summary

We currently get a `invalid-argument-type` error when using
`dataclass.fields` on a dataclass, because we do not synthesize the
`__dataclass_fields__` member.

This PR fixes this diagnostic.

Note that we do not yet model the `Field` type correctly. After that is
done, we can assign a more precise `tuple[Field, ...]` type to this new
member.

## Test Plan
New mdtest.

---------

Co-authored-by: David Peter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ty Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid argument to fields from dataclasses
5 participants