DISC: pd.DataFrame methods we specifically _don't_ want included #83
Comments
Thanks @jbrockmendel, that is a very good question. My handwavy suggestion would be "anything that's not core to the data structure or methods to manipulate it or do basic computations on it". Your list makes complete sense to me, and I'd add:
The above is all my sense of "not core". For a different reason I'd add |
Thanks! Largely agree with all the suggestions so far. Just one comment:
I think windowing, and in particular, grouped-window functions, should stay. It's core to timeseries analysis and not easy for users to work around or implement themselves. |
Often these are used as groupby and window functions, where other libraries typically don't have grouped implementations and calling them per group leads to very subpar performance. I'm still a +1 on removing them at least from the v1 of the API though 😄. |
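To make the grouped-window pattern concrete, here is a minimal pure-Python sketch of a rolling mean computed independently per group. The function name `grouped_rolling_mean` and the sample data are hypothetical and only for illustration; in pandas the rough equivalent would be spelled `df.groupby("key")["value"].rolling(2).mean()`, which is not how this sketch is implemented.

```python
from collections import defaultdict, deque


def grouped_rolling_mean(keys, values, window):
    """Rolling mean of `values`, computed independently within each key.

    The result is aligned with the input order. Positions where the group
    has seen fewer than `window` values so far are None.
    """
    # One fixed-size buffer per group; deque(maxlen=...) drops the oldest value.
    buffers = defaultdict(lambda: deque(maxlen=window))
    out = []
    for key, value in zip(keys, values):
        buf = buffers[key]
        buf.append(value)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


# Hypothetical sample data: two interleaved groups "a" and "b".
keys = ["a", "a", "b", "a", "b", "b"]
values = [1.0, 3.0, 10.0, 5.0, 20.0, 30.0]
result = grouped_rolling_mean(keys, values, window=2)
print(result)
```

The point of the single pass is that every group's window is updated in one traversal of the data, rather than slicing the data per group and calling a plain window function on each slice.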
Thanks for the details on grouped window functions, interesting. So it seems like an important topic to deal with at some point. It feels like something can be made more composable there - I had a quick look at how it's implemented in Pandas. For example:

    class DataFrame:
        def corr(...):
            elif method == "spearman":
                correl = libalgos.nancorr_spearman(mat, minp=min_periods)

And:

    def nancorr_spearman(ndarray[float64_t, ndim=2] mat, Py_ssize_t minp=1) -> ndarray:

So that seems generalizable to any callable and correlation metric. Same for if it would take some object that is the result of calling

Windowing is a large topic (and has a large API surface), maybe worth splitting off into its own issue? |
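The idea of generalizing `corr` to "any callable and correlation metric" can be sketched in plain Python as a pairwise driver that takes the metric as an argument. All names here (`pairwise_corr`, `pearson`, `spearman`, `ranks`) are hypothetical helpers for illustration, not pandas internals, and missing-value handling such as `min_periods` is deliberately left out. For comparison, pandas' `DataFrame.corr` also accepts a callable for `method`.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of floats."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(x):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to x[order[i]].
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise_corr(columns, metric):
    """Apply `metric` to every pair of columns; the diagonal is set to 1.0."""
    names = list(columns)
    result = {(n, n): 1.0 for n in names}
    for a, b in combinations(names, 2):
        value = metric(columns[a], columns[b])
        result[(a, b)] = result[(b, a)] = value  # metric assumed symmetric
    return result


# Hypothetical sample columns: y is a monotone but non-linear function of x.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [1.0, 4.0, 9.0, 16.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}
spearman_matrix = pairwise_corr(columns, spearman)
pearson_matrix = pairwise_corr(columns, pearson)
```

With this shape, adding a new correlation metric means writing one two-argument function rather than adding another branch to `corr`.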
We had a little brainstorm on functions not to include in a call a few weeks ago. Here are some notes on what was discussed regarding APIs that would be good to exclude:
|
Maybe controversial, but should |
Definitely want to avoid overloading |
I think everything here has been addressed (or rather, kept out), so I think this can be closed, do let me know if I've misunderstood |
#3 has a discussion of which pd.DataFrame methods should be included in the standard based on a) measures of popularity among users and b) which are common across dataframe libraries. I'd like to try coming at it from the other direction: what existing pd.DataFrame methods can we exclude from consideration?
Throwing out some ideas here, none of these are strong opinions on my part: