
Dataclass support #21910


Closed
aidiss opened this issue Jul 14, 2018 · 3 comments · Fixed by #27999
Labels
API Design Needs Discussion Requires discussion from core team before further action

Comments

@aidiss
aidiss commented Jul 14, 2018

Proposal description

Dataclasses were added in Python 3.7.

It would be nice for pandas to support dataclasses. For example, it could be possible to construct a DataFrame by calling .from_dataclasses or just .DataFrame(data=dataclass_list). There should also be a way to go the other direction with .to_dataclasses.

Expected Behaviour

from dataclasses import dataclass
import pandas as pd

@dataclass
class SimpleDataObject(object):
  field_a: int
  field_b: str

dataclass_object1 = SimpleDataObject(1, 'a')
dataclass_object2 = SimpleDataObject(2, 'b')

# Dataclasses to DataFrame
df = pd.from_dataclasses([dataclass_object1, dataclass_object2])
list(df.columns) == ['field_a', 'field_b']
>>> True
df.dtypes == ['int', 'str']
>>> True

# Dataclasses to DataFrame (via the constructor)
df = pd.DataFrame(data=[dataclass_object1, dataclass_object2])
list(df.columns) == ['field_a', 'field_b']
>>> True
df.dtypes == ['int', 'str']
>>> True

# DataFrame to Dataclasses
df = pd.DataFrame(columns=['field_a', 'field_b'], data=[[1, 'a'], [2, 'b']])
dataclass_list = df.to_dataclasses()
dataclass_list == [dataclass_object1, dataclass_object2]
>>> True
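
The DataFrame-to-dataclasses direction proposed above can be approximated without a new pandas method. This is a minimal sketch, assuming the rows are available as a list of dicts (for a real DataFrame, `df.to_dict("records")` produces that shape). `rows_to_dataclasses` is a hypothetical helper name used only for illustration, not a pandas API.

```python
from dataclasses import dataclass, fields


@dataclass
class SimpleDataObject:
    field_a: int
    field_b: str


def rows_to_dataclasses(records, cls):
    """Build one `cls` instance per record (hypothetical helper, not part of pandas)."""
    names = [f.name for f in fields(cls)]
    # Pass only the declared fields, so extra keys in a record are ignored.
    return [cls(**{name: rec[name] for name in names}) for rec in records]


# Stand-in for df.to_dict("records") on the DataFrame from the example above.
records = [{"field_a": 1, "field_b": "a"}, {"field_a": 2, "field_b": "b"}]
dataclass_list = rows_to_dataclasses(records, SimpleDataObject)
```

A record that lacks one of the declared fields raises a KeyError here; whether missing columns should instead fall back to field defaults is a design choice this sketch leaves open.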
@topper-123
Contributor

topper-123 commented Jul 15, 2018

AFAIK it is not guaranteed that you can know that a certain instance is a dataclass. E.g., classes do not inherit from dataclass.

From your example:

@dataclass
class SimpleDataObject(object):
  field_a: int
  field_b: str

x = SimpleDataObject(field_a=2, field_b='f')

I don't think you could even tell from introspection that x is a dataclass, correct? If that's the case, this isn't possible to do.

@chris-b1
Contributor

The dataclasses module has is_dataclass and fields introspection functions, so that part shouldn't be an issue.

That said, I'm not sure we should quickly commit to any specific API/support here. For now, the asdict helper from the dataclasses module can help with the ingest use case.

In [18]: from dataclasses import asdict

In [19]: pd.DataFrame([asdict(x) for x in [dataclass_object1, dataclass_object2]])
Out[19]:
   field_a field_b
0        1       a
1        2       b
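
The introspection functions mentioned above can be sketched as follows. This uses only the standard-library `dataclasses` module; the class is the one from the issue description, and the variable names are illustrative.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class SimpleDataObject:
    field_a: int
    field_b: str


x = SimpleDataObject(field_a=2, field_b="f")

# is_dataclass returns True for dataclass types and for their instances,
# so the isinstance check narrows it down to instances only.
is_instance = is_dataclass(x) and not isinstance(x, type)

# fields() returns Field objects carrying each declared name and annotation.
field_names = [f.name for f in fields(x)]
field_types = [f.type for f in fields(x)]
```

Here `field_names` gives the would-be column names, and `field_types` holds the annotations as written on the class, which are not necessarily the dtypes pandas would infer.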

@chris-b1 chris-b1 added API Design Needs Discussion Requires discussion from core team before further action labels Jul 15, 2018
@asosnovsky
Contributor

asosnovsky commented Aug 18, 2019

I put together a solution that checks the data provided during __init__, in this PR. However, it looks like the testing pipeline is set up to support multiple Python versions, so I may need a bit more time to make this happen.
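
A check of this kind might look like the sketch below. It is not the code from the PR: `is_dataclass_instance` and `maybe_convert_dataclasses` are hypothetical names, and the guarded import is one assumed way to cope with Python versions that lack the `dataclasses` module.

```python
try:
    import dataclasses
except ImportError:  # older Python without the dataclasses module
    dataclasses = None


def is_dataclass_instance(obj):
    """True only for dataclass instances, not for dataclass types."""
    return (
        dataclasses is not None
        and dataclasses.is_dataclass(obj)
        and not isinstance(obj, type)
    )


def maybe_convert_dataclasses(data):
    """Hypothetical pre-processing step for constructor input.

    A non-empty list whose first element is a dataclass instance is turned
    into a list of dicts; anything else is returned unchanged.
    """
    if isinstance(data, list) and data and is_dataclass_instance(data[0]):
        return [dataclasses.asdict(item) for item in data]
    return data


if dataclasses is not None:

    @dataclasses.dataclass
    class SimpleDataObject:
        field_a: int
        field_b: str

    converted = maybe_convert_dataclasses(
        [SimpleDataObject(1, "a"), SimpleDataObject(2, "b")]
    )
    untouched = maybe_convert_dataclasses([[1, "a"], [2, "b"]])
```

Inspecting only the first element keeps the check cheap, at the cost of not catching lists that mix dataclass instances with other objects.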

@jreback jreback added this to the Contributions Welcome milestone Jan 1, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.1 Feb 9, 2020