ENH: Support ExtensionArray in Groupby #20502
```python
In [1]: import pandas as pd

In [2]: from cyberpandas import IPArray

In [3]: df = pd.DataFrame({"A": IPArray([0, 0, 1, 2, 2]),
   ...:                    "B": [1, 5, 1, 1, 3]})

In [4]: df
Out[4]:
         A  B
0  0.0.0.0  1
1  0.0.0.0  5
2  0.0.0.1  1
3  0.0.0.2  1
4  0.0.0.2  3

In [5]: df.groupby("A").B.mean()
Out[5]:
A
0.0.0.0    3
0.0.0.1    1
0.0.0.2    2
Name: B, dtype: int64
```
What I have so far is relatively straightforward (surprisingly). But I'm probably missing things. Are there edge cases or other operations we should test?
Codecov Report
@@ Coverage Diff @@
## master #20502 +/- ##
==========================================
+ Coverage 91.82% 91.84% +0.02%
==========================================
Files 152 152
Lines 49249 49249
==========================================
+ Hits 45225 45235 +10
+ Misses 4024 4014 -10
If you use |
Not easily. By the time we're wrapping up the output, we've long since converted the input to an Index. That said, once we have ExtensionIndexes, it should be a one-line change: Line 3037 in cdfce2b
lgtm. and another reason to make EA a base class for |
Note that right now `Out[5].index` is just an `Index` with object dtype. In the future, we could tie an Index type to an ExtensionArray type, and ensure that the extension type propagates through.
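
For readers following the point about the grouping key being converted before the output is wrapped, here is a minimal pure-Python sketch of the general factorize-then-aggregate idea. It is not pandas' implementation: `factorize` and `group_mean` are hypothetical helpers written for illustration, and the standard-library `ipaddress.IPv4Address` stands in for the values held by `IPArray`.

```python
from ipaddress import IPv4Address


def factorize(keys):
    """Return (codes, uniques): an integer code per key and the
    distinct keys in first-seen order. Hypothetical helper."""
    positions = {}
    codes = []
    uniques = []
    for key in keys:
        if key not in positions:
            positions[key] = len(uniques)
            uniques.append(key)
        codes.append(positions[key])
    return codes, uniques


def group_mean(keys, values):
    """Group `values` by `keys` and return {unique key: mean}."""
    codes, uniques = factorize(keys)
    sums = [0.0] * len(uniques)
    counts = [0] * len(uniques)
    # The aggregation only ever sees the integer codes.
    for code, value in zip(codes, values):
        sums[code] += value
        counts[code] += 1
    means = [s / c for s, c in zip(sums, counts)]
    # The result is relabelled with the uniques, held here as a plain
    # list of Python objects rather than a typed container.
    return dict(zip(uniques, means))


# Same data as the example in the PR description.
keys = [IPv4Address(n) for n in [0, 0, 1, 2, 2]]
values = [1, 5, 1, 1, 3]
result = group_mean(keys, values)

for label, mean in result.items():
    print(label, mean)
```

In this sketch the labels come back as a generic collection of objects, which is loosely analogous to the object-dtype `Index` described above. Carrying the extension type through would mean relabelling the result with a container that knows the key type.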