Skip to content

REF: define nanargminmax without values_for_argsort #37815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 14, 2020

Conversation

jbrockmendel
Copy link
Member

No description provided.

@jreback jreback added ExtensionArray Extending pandas with custom dtypes or arrays. Refactor Internal refactoring of code labels Nov 14, 2020
@jreback jreback added this to the 1.2 milestone Nov 14, 2020
@jreback jreback merged commit 840c142 into pandas-dev:master Nov 14, 2020
@jbrockmendel jbrockmendel deleted the ref-nanargminmax branch November 14, 2020 17:51
@jorisvandenbossche
Copy link
Member

This was done explicitly this way in the original PR #27801 (@jbrockmendel you actually explicitly questioned that use, which led to some discussion about it: #27801 (comment))

This gives a big slowdown, as doing a full argsort (to then only use a single index) is more expensive as argmin or argmax.

@jbrockmendel
Copy link
Member Author

Thanks for linking back there.

But so, then we can simply expand this implmentation note in the docs to: you can either implement _values_for_argsort (and then argsort, argmin and argmax provide a default implementation based on those values), or implement argsort, argmin and argmax yourself. I don't think that is any problem?

I think a Technically Correct solution would look something like:

class ExtensionArray:
    def argsort(self, ...):
        raise NotImplementedError

    def argmax(self):
        return the_implementation_using_argsort

class VFAMixin:
    def _values_for_argsort(self):
        return the_implementation_in_master
    
    def argsort(self, ...):
        return the_implementation_in_master
    
    def argmax(self):
        return the_implementation_using_values_for_argsort

Getting there may be a hassle given backwards compatibility, but i think getting rid of the ambiguity as to values_for_argsort is worth it.

@jorisvandenbossche
Copy link
Member

What's the benefit of your proposed scheme compared to the current situation? (i.e. what is quoted, assuming we properly document this)

@jbrockmendel
Copy link
Member Author

I think its clearer. It would also make it much harder for us mess up and use _values_for_argsort in places we're not supposed to (even if we document it correctly, many a slip)

@jorisvandenbossche
Copy link
Member

But also in your scheme we have a the_implementation_using_values_for_argsort function that will use _values_for_argsort, which was exactly the purpose of nanargminmax (the version from before this PR).

@jbrockmendel
Copy link
Member Author

But it wouldn't be in the base class EA, which wouldn't have values_for_argsort at all

@jbrockmendel
Copy link
Member Author

@jorisvandenbossche if you want to revert this i wont argue the point

@jreback
Copy link
Contributor

jreback commented Nov 30, 2020

prob should open a new issue so don't forget about this @jbrockmendel

@jbrockmendel
Copy link
Member Author

opened #38201 to revert this

@jorisvandenbossche
Copy link
Member

Ah, I already did that yesterday -> #38177

I think Jeff was asking for an issue about a better solution (eg your sketch above at #37815 (comment)). Now, I think we can use the existing #27218 for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants