ENH: Add top-level dataframe function for single-dispatched construction of a DataFrame #34799

Open
TomAugspurger opened this issue Jun 15, 2020 · 4 comments
Labels
Closing Candidate (May be closeable, needs more eyeballs); Constructors (Series/DataFrame/Index/pd.array Constructors); Enhancement; Needs Discussion (Requires discussion from core team before further action)

Comments

@TomAugspurger
Contributor

Is your feature request related to a problem?

This might help with two things:

  1. A coordination point for 3rd-party libraries creating objects they'd like to turn into DataFrames, and users of those libraries
  2. Possibly, simplification of DataFrame.__init__

Describe the solution you'd like

A new top-level pd.dataframe function.

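from functools import singledispatch
from typing import Any

import numpy as np
from pandas import Index

@singledispatch
def dataframe(data: Any, index: Index, columns: Index, copy: bool = False):
    """
    Create a pandas DataFrame from data.
    """

# Register an implementation for NumPy arrays on the dispatcher itself.
@dataframe.register(np.ndarray)
def _(data, index, columns, copy=False):
    pass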

API breaking implications

None

Describe alternatives you've considered

xref #32844, which attempted this for DataFrame.__init__. That was a non-starter since it exposed our internal BlockManager too publicly (see #32844 (comment)), so we'd need to do this on a top-level function instead.

@TomAugspurger TomAugspurger added Enhancement Needs Discussion Requires discussion from core team before further action Constructors Series/DataFrame/Index/pd.array Constructors labels Jun 15, 2020
@simonjayhawkins
Member

xref #32908 for alternative

@jbrockmendel
Member

Just checking if I understand the idea:

Downstream library Foo has a class ModelData with something like a to_frame() method, and by writing

@pd.dataframe.register(ModelData)
def _(model_data, *args, **kwargs):
    return model_data.to_frame(*args, **kwargs)

they make pd.dataframe Just Work on ModelData objects?

@TomAugspurger
Contributor Author

Yep. functools.singledispatch looks at the type of the first argument and dispatches off that (with a fallback default if desired). So when pd.dataframe encounters a ModelData instance, it would call the function registered for that type (which would be expected to return an initialized pandas DataFrame).
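
As a rough illustration of that dispatch behaviour, here is a minimal sketch. It rests on some assumptions: `dataframe` is a local stand-in for the proposed `pd.dataframe` (which does not exist in pandas), `ModelData` and its `to_frame()` method are hypothetical, and the fallback simply defers to the regular `pd.DataFrame` constructor.

```python
from functools import singledispatch

import pandas as pd


@singledispatch
def dataframe(data, index=None, columns=None, copy=False):
    """Fallback default: defer to the regular DataFrame constructor."""
    return pd.DataFrame(data, index=index, columns=columns, copy=copy)


class ModelData:
    """Hypothetical downstream class that knows how to become a DataFrame."""

    def __init__(self, values):
        self.values = values

    def to_frame(self):
        return pd.DataFrame({"value": self.values})


@dataframe.register(ModelData)
def _(data, index=None, columns=None, copy=False):
    # Selected when the first argument is a ModelData (or a subclass of it).
    return data.to_frame()


# The registered implementation handles ModelData ...
from_model = dataframe(ModelData([1, 2, 3]))

# ... while unregistered types such as dict fall through to the default.
from_dict = dataframe({"a": [1, 2]})

# dispatch() reports which implementation a given type resolves to.
assert dataframe.dispatch(dict) is dataframe.registry[object]
```

Only the type of the first positional argument is consulted, so keyword arguments such as index or columns play no part in choosing the implementation.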

@jbrockmendel
Member

The experience with _constructor has soured me on the cost/benefit tradeoff of adding customization hooks for downstream libraries.

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label May 3, 2023
3 participants