ENH: Add support for dataclasses in the DataFrame constructor #27999

Merged
merged 41 commits into from
Mar 15, 2020

@asosnovsky asosnovsky commented Aug 18, 2019

Added support for dataclasses when constructing a new DataFrame.

  • closes Dataclass support #21910
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Simply put, this adds support for using dataclasses in the following way:

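from dataclasses import dataclass

from pandas import DataFrame


@dataclass
class Person:
    name: str
    age: int


# Each dataclass instance becomes one row; field names become the columns.
df = DataFrame([Person("me", 25), Person("you", 35)])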

pep8speaks commented Aug 18, 2019

Hello @asosnovsky! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-14 23:03:57 UTC

@asosnovsky asosnovsky mentioned this pull request Aug 18, 2019

@TomAugspurger TomAugspurger left a comment

Needs a release note (1.0.rst).

A few edge cases we should have tests for:

  1. A mixture of dataclass types (probably takes the union of all the columns?)
  2. A mixture of dataclass and non-dataclass (this probably raises; can we ensure that the error message is reasonable?)
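
The two edge cases above can be explored at the standard-library level. The sketch below assumes the constructor converts each row with `dataclasses.asdict` before building columns, which is an assumption about the conversion path rather than something this thread confirms. The names `Point` and `HLine` are hypothetical and exist only for illustration.

```python
from dataclasses import asdict, make_dataclass

# Hypothetical dataclasses built on the fly, as make_dataclass allows in tests.
Point = make_dataclass("Point", [("x", int), ("y", int)])
HLine = make_dataclass("HLine", [("x0", int), ("x1", int), ("y", int)])

# Case 1: a mixture of dataclass types. Each row becomes a dict with its own
# keys, so a consumer of these dicts would see the union of all field names.
rows = [asdict(Point(0, 1)), asdict(HLine(0, 3, 3))]
all_fields = []
for row in rows:
    for key in row:
        if key not in all_fields:
            all_fields.append(key)

# Case 2: a mixture of dataclass and non-dataclass rows. asdict rejects
# anything that is not a dataclass instance by raising a TypeError.
try:
    asdict({"x": 1, "y": 0})
    mixed_error = None
except TypeError as exc:
    mixed_error = str(exc)
```

If the constructor does rely on `asdict`, the second case would surface that `TypeError` message directly, so a test could check that the message stays readable.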

using make_dataclass in tests
@datapythonista datapythonista changed the title Issue/21910 ENH: Add support for dataclasses in the DataFrame constructor Aug 19, 2019
@datapythonista datapythonista added the API Design and Needs Discussion labels Aug 19, 2019

@jorisvandenbossche jorisvandenbossche left a comment

Added some comments on the doc addition!

@jreback jreback left a comment

looks pretty good, some changes, ping on green.

@@ -430,6 +430,12 @@ def _get_axes(N, K, index, columns):
    return index, columns


def _dataclasses_to_dicts(data):
Contributor

can you try to type things (you can add in a TYPE_CHECKING block if need be)

Contributor Author

ok

- renamed _dataclasses_to_dicts to dataclasses_to_dicts
- added docs to dataclasses_to_dicts
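
For readers following the review request to add types and docs, here is a minimal sketch of what a typed, documented helper of this name could look like. It is not necessarily the merged implementation; the signature, the docstring, and the use of `dataclasses.asdict` are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Sequence


def dataclasses_to_dicts(data: Sequence[Any]) -> List[Dict[str, Any]]:
    """
    Convert a sequence of dataclass instances to a list of dictionaries.

    Sketch only: assumes every item is a dataclass instance; asdict raises
    TypeError for anything else.
    """
    return [asdict(item) for item in data]


@dataclass
class Person:
    name: str
    age: int


records = dataclasses_to_dicts([Person("me", 25), Person("you", 35)])
```

A list of dicts like `records` is a shape the DataFrame constructor already accepts, which is why a conversion helper of this kind is a natural fit.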
@MarcoGorelli

Hi @asosnovsky - sorry to chase you up, just wanted to ask whether you're still working on this :)

@jreback jreback added this to the 1.1 milestone Feb 9, 2020

@jreback jreback left a comment

looks good. some minor versionadded docs need updating. pls merge master and ping on green.

MarcoGorelli commented Mar 13, 2020

Hi @asosnovsky, is this still active? I'm hesitant to close this as stale since it looks almost done, but we haven't heard from you in over 2 months.

jreback commented Mar 14, 2020

this looks fine, but needs a rebase if someone wants to merge master. ping on green.

WillAyd commented Mar 14, 2020

@jreback green

@jreback jreback merged commit 6620dc6 into pandas-dev:master Mar 15, 2020

jreback commented Mar 15, 2020

thanks @asosnovsky

@simonjayhawkins

Thanks @asosnovsky. Very nice. A few follow-up comments:

  1. I think we should mention this enhancement in the whatsnew.
  2. Dataclass support #21910 also suggested a to_dataclasses method, so maybe it should not be closed, or we could open another issue to discuss this further.
  3. Would it make sense to also add dataclass support to the Series constructor for consistency? The field names would then be the index, and the dtype would be either object, to maintain the field types, or maybe coerced.

@simonjayhawkins simonjayhawkins removed the Needs Discussion label Mar 18, 2020
@asosnovsky

> Thanks @asosnovsky. Very nice. A few follow-up comments:
>
>   1. I think we should mention this enhancement in the whatsnew.
>   2. Dataclass support #21910 also suggested a to_dataclasses method, so maybe it should not be closed, or we could open another issue to discuss this further.
>   3. Would it make sense to also add dataclass support to the Series constructor for consistency? The field names would then be the index, and the dtype would be either object, to maintain the field types, or maybe coerced.

Not sure if it makes sense to add a "to_dataclasses" method. Would you have to create a dataclass on the fly or feed in one?

Not sure about Series either. But if we were to do that, doesn't a Series need to have a single dtype? What happens to a dataclass with a mixed bag of types?
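
To make the "create one on the fly or feed one in" question concrete, here is a rough standard-library sketch of both options. Nothing here is a pandas API: `records_to_dataclasses`, its parameters, and the default class name `Row` are hypothetical, and the input is assumed to be a list of plain dicts whose keys are valid Python identifiers.

```python
from dataclasses import dataclass, make_dataclass
from typing import Any, Dict, List, Optional, Type


def records_to_dataclasses(
    records: List[Dict[str, Any]],
    cls: Optional[Type[Any]] = None,
    name: str = "Row",
) -> List[Any]:
    """Hypothetical helper: turn row dicts into dataclass instances."""
    if not records:
        return []
    if cls is None:
        # Option A: create a dataclass on the fly, with one field per key of
        # the first record, typed by the type of that record's value.
        fields = [(key, type(value)) for key, value in records[0].items()]
        cls = make_dataclass(name, fields)
    # Option B (cls given): feed in an existing dataclass and instantiate it.
    return [cls(**record) for record in records]


@dataclass
class Person:
    name: str
    age: int


records = [{"name": "me", "age": 25}, {"name": "you", "age": 35}]
people = records_to_dataclasses(records, Person)  # feed one in
rows = records_to_dataclasses(records)            # create one on the fly
```

Feeding in a class keeps the caller's type, while the on-the-fly variant has to guess field types from the data, which is one reason the design question is not trivial.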

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
@mathbunnyru

I think this feature is quite cool and worth mentioning here:
https://github.com/pandas-dev/pandas/blob/master/doc/source/whatsnew/v1.1.0.rst

jreback commented Apr 9, 2020

hmm, this originally did have a whatsnew note; must have gotten lost

would take a patch to add one

@mathbunnyru

@asosnovsky please do that. I'm not a native speaker and I think it'll be easier for you to describe the change :)

Labels: API Design, Constructors