ENH: Support ExtensionArray in Groupby #20502

TomAugspurger · 2018-03-27T15:54:39Z

In [1]: import pandas as pd

In [2]: from cyberpandas import IPArray

In [3]: df = pd.DataFrame({"A": IPArray([0, 0, 1, 2, 2]), "B": [1, 5, 1, 1, 3]})

In [4]: df
Out[4]:
         A  B
0  0.0.0.0  1
1  0.0.0.0  5
2  0.0.0.1  1
3  0.0.0.2  1
4  0.0.0.2  3

In [5]: df.groupby("A").B.mean()
Out[5]:
A
0.0.0.1    1
0.0.0.2    2
Name: B, dtype: int64

Note that right now Out[5].index is just an Index with object dtype. In the future, we could tie an Index type to an ExtensionArray type and ensure that the extension type propagates through.
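For readers following along, here is a minimal pure-Python sketch of the factorize-then-aggregate pattern that a groupby on an array-like key can rely on. It is an illustration only, not the code in this PR: `factorize` and `group_mean` are hypothetical helpers, and treating address 0 as a missing key is an assumption made here so that the result lines up with Out[5], where no `0.0.0.0` group appears.

```python
from ipaddress import IPv4Address


def factorize(values, na_value=None):
    """Return (codes, uniques). Values equal to na_value get code -1."""
    codes, uniques, seen = [], [], {}
    for v in values:
        if v == na_value:
            codes.append(-1)  # missing keys are excluded from every group
            continue
        if v not in seen:
            seen[v] = len(uniques)
            uniques.append(v)
        codes.append(seen[v])
    return codes, uniques


def group_mean(keys, values, na_value=None):
    """Mean of values per unique key, skipping rows whose key is missing."""
    codes, uniques = factorize(keys, na_value)
    sums = [0.0] * len(uniques)
    counts = [0] * len(uniques)
    for code, x in zip(codes, values):
        if code == -1:
            continue
        sums[code] += x
        counts[code] += 1
    return {u: s / c for u, s, c in zip(uniques, sums, counts)}


# Same data as the session above; address 0 is assumed to mean "missing".
keys = [IPv4Address(i) for i in [0, 0, 1, 2, 2]]
result = group_mean(keys, [1, 5, 1, 1, 3], na_value=IPv4Address(0))
```

The keys of `result` are plain Python objects collected in a dict, which loosely parallels the object-dtype Index described above: the grouping works, but the container holding the unique keys no longer carries the extension type.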

TomAugspurger · 2018-03-27T15:55:25Z

What I have so far is relatively straightforward (surprisingly). But I'm probably missing things. Are there edge cases or other operations we should test?

codecov · 2018-03-27T17:23:15Z

Codecov Report

Merging #20502 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20502      +/-   ##
==========================================
+ Coverage   91.82%   91.84%   +0.02%     
==========================================
  Files         152      152              
  Lines       49249    49249              
==========================================
+ Hits        45225    45235      +10     
+ Misses       4024     4014      -10

Flag	Coverage Δ
#multiple	`90.23% <100%> (+0.02%)`	⬆️
#single	`41.89% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/groupby.py	`92.55% <100%> (ø)`	⬆️
pandas/util/testing.py	`84.52% <0%> (-0.21%)`	⬇️
pandas/plotting/_converter.py	`66.81% <0%> (+1.73%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 766a480...98a3a85.

jorisvandenbossche · 2018-03-27T22:01:43Z

If you use as_index=False, can you in that way ensure it keeps the correct extension dtype?

TomAugspurger · 2018-03-28T01:54:55Z

Not easily. By the time we're wrapping up the output, we've long since converted the input to an Index.

That said, once we have ExtensionIndexes, it should be a one-line change:

pandas/pandas/core/groupby.py

Line 3037 in cdfce2b

uniques = Index(uniques, name=self.name)
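To make the "one-line change" concrete, here is a toy sketch of why the dtype is lost at that line and what swapping the wrapper class would do. Everything in it is hypothetical: `ToyIPArray`, `ToyObjectIndex`, `ToyExtensionIndex` and `result_index` are stand-ins invented for illustration, and no ExtensionIndex exists in pandas at the time of this thread.

```python
class ToyIPArray:
    """Toy stand-in for an extension array (hypothetical, not cyberpandas)."""

    dtype = "ip"

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)


class ToyObjectIndex:
    """Toy stand-in for an Index with object dtype."""

    def __init__(self, values, name=None):
        self.values = list(values)  # the array is unpacked into plain objects
        self.dtype = "object"
        self.name = name


class ToyExtensionIndex:
    """Hypothetical index that keeps the extension array it was given."""

    def __init__(self, values, name=None):
        self.values = values
        self.dtype = values.dtype
        self.name = name


def result_index(uniques, name, index_cls=ToyObjectIndex):
    # Same shape as `uniques = Index(uniques, name=self.name)`;
    # swapping the wrapper class is the only change.
    return index_cls(uniques, name=name)


uniques = ToyIPArray([1, 2])
today = result_index(uniques, "A")
future = result_index(uniques, "A", index_cls=ToyExtensionIndex)
print(today.dtype, future.dtype)
```

In this sketch the default wrapper reports `object`, while the hypothetical extension-aware wrapper reports the array's own dtype and holds on to the original array.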

jreback · 2018-03-28T10:35:31Z

lgtm. and another reason to make EA a base class for Index.

TomAugspurger added the Groupby and ExtensionArray (Extending pandas with custom dtypes or arrays) labels Mar 27, 2018

TomAugspurger added this to the 0.23.0 milestone Mar 27, 2018

TomAugspurger added 2 commits March 27, 2018 11:06

REF: Reuse in factorize

e3fed38

Test relies on ordered dictionaries

98a3a85

TomAugspurger mentioned this pull request Mar 27, 2018

Fixed factorize for MACArray ContinuumIO/cyberpandas#13

Merged

jreback merged commit 9b4d0f1 into pandas-dev:master Mar 28, 2018

jreback added a commit to jreback/pandas that referenced this pull request Mar 29, 2018

COMPAT: 32-bit compat for testing

5368191

xref pandas-dev#20502

jreback mentioned this pull request Mar 29, 2018

COMPAT: 32-bit compat for testing #20528

Merged

javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018

ENH: Support ExtensionArray in Groupby (pandas-dev#20502)

3549eca

jreback added a commit to jreback/pandas that referenced this pull request Mar 30, 2018

COMPAT: 32-bit compat for testing

09a46f5

xref pandas-dev#20502

jreback added a commit that referenced this pull request Mar 30, 2018

COMPAT: 32-bit compat for testing (#20528)

63a662d

xref #20502

dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018

ENH: Support ExtensionArray in Groupby (pandas-dev#20502)

79dda7a

kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018

ENH: Support ExtensionArray in Groupby (pandas-dev#20502)

301306d

kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018

COMPAT: 32-bit compat for testing (pandas-dev#20528)

1fbe44c

xref pandas-dev#20502

TomAugspurger deleted the ea-groupby-3 branch May 2, 2018 13:10
