PERF: efficient argmax/argmin for SparseArray #47779
Thanks for the PR @GYHHAHA! Some comments below. A general question: why can't we just dispatch to a SparseArray.argmin/argmax which directly uses np.argmin/max, instead of having to modify the general ExtensionArray path (for example, how SparseArray handles min/max)? This could also avoid materializing the data.
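For illustration only, here is a minimal pure-Python sketch of how an argmax could work on the sparse representation without building the dense array. The function name `sparse_argmax` and its parameters are hypothetical and are not the PR's code or pandas API; it assumes `sp_index` holds the sorted positions of `sp_values`, every other position holds `fill_value`, and no value is NaN.

```python
def sparse_argmax(sp_values, sp_index, fill_value, length):
    """Return the position of the first maximum of a sparse array.

    Hypothetical helper: sp_index lists the sorted positions of sp_values,
    all remaining positions hold fill_value. NaN handling is left out.
    """
    if length == 0:
        raise ValueError("attempt to get argmax of an empty sequence")

    # First position that holds the fill value: the first k where
    # sp_index[k] != k, or len(sp_index) if the stored values form a
    # leading block. It equals `length` when nothing holds the fill value.
    fill_loc = next(
        (k for k, pos in enumerate(sp_index) if pos != k), len(sp_index)
    )
    has_fill = fill_loc < length

    if not sp_values:
        return fill_loc  # every position holds the fill value

    # max() keeps the first of equal candidates, so ties among stored
    # values resolve to the earliest stored position.
    k = max(range(len(sp_values)), key=lambda i: sp_values[i])
    best_val, best_loc = sp_values[k], sp_index[k]

    if has_fill and fill_value > best_val:
        return fill_loc
    if has_fill and fill_value == best_val:
        return min(fill_loc, best_loc)
    return best_loc


# Dense equivalent: [0, 0, 3, 0, 5, 5, 0] with fill_value=0
print(sparse_argmax([3, 5, 5], [2, 4, 5], 0, 7))  # 4
```

Only the stored values and a single comparison against the fill value are needed, so the work scales with the number of stored points rather than the full length.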
@mzeitlin11 I think you are right. Originally I wanted to reuse some of the general code, but that seems to be unnecessary. I will handle this in |
I find the current implementation for
>>> arr = SparseArray([np.nan, 1, 0, 0, np.nan, 2], fill_value=1)
>>> arr._first_fill_value_loc()
5  # should be 1
I will commit a fix after merging this one.
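As a rough check of the expected answer in the example above, the following pure-Python sketch computes the first position holding the fill value from the stored positions alone. The standalone function `first_fill_value_loc`, its signature, and the `-1` return for "no fill value present" are assumptions made for this sketch, not the pandas implementation or the eventual fix.

```python
import math


def first_fill_value_loc(sp_index, length):
    """First position not listed in sp_index, or -1 if every position is stored.

    Assumes sp_index is sorted and free of duplicates (hypothetical helper).
    """
    for k, pos in enumerate(sp_index):
        if pos != k:
            # Positions 0..k-1 are stored, position k is skipped: it is a gap.
            return k
    # Stored positions form a leading block; the gap, if any, follows it.
    return len(sp_index) if len(sp_index) < length else -1


dense = [math.nan, 1, 0, 0, math.nan, 2]
fill_value = 1
# Positions whose value differs from the fill value (NaN != 1 is True).
sp_index = [i for i, v in enumerate(dense) if v != fill_value]
print(sp_index)                                    # [0, 2, 3, 4, 5]
print(first_fill_value_loc(sp_index, len(dense)))  # 1
```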
I'm not entirely sure what that function is meant to do, but the name might also just be misleading. It looks to only be used in |
The reason unique needs to find the location is to insert the fill value at the right place, so that the result matches what a normal array.unique() returns. @mzeitlin11
@GYHHAHA what's the status here? This generally lgtm - but does the |
How about I submit the fix PR for |
Sounds great, thanks!
Thanks @GYHHAHA! LGTM pending one small comment. Can you please also show the latest benchmarks for the cases you showed in the PR description?
@mzeitlin11 A new benchmark has been added to the description.
Thanks @GYHHAHA
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Thanks for the review. Currently, only argmax/argmin are implemented, since argsort has many annoying corner cases. I need to spend more time on its correctness and will take the follow-up in a separate PR. Simple benchmark: