Split pandas package into pandas and pandas-core #57550

Closed
@datapythonista

Description

Maybe worth a PDEP, but opening as an issue first to see what other people think, and see if it's needed or worth the time.

The status quo for dependencies in pandas has been to depend on numpy, pytz, and dateutil, and to make everything else optional. This has been working reasonably ok, but I don't think it's ideal, and the discussion on whether PyArrow should be required or optional is one example of it.

In my opinion, there are two main things to consider. The first one is about users, and I see two broad groups:

  1. The average user who will pip/conda install pandas and wants things to work without much hassle
  2. The advanced user who wants more control over what is installed

I think the current approach favors group 2, and causes users in group 1 to hit many exceptions about missing dependencies if they want to use key functionality like .read_excel() or .plot(), or to get suboptimal performance if they use .read_csv() and others without PyArrow. Of course this is avoided if they install pandas through a distribution that includes the dependencies, which I think is common.
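To make the group 1 experience concrete, here is a minimal sketch of checking up front which optional packages are missing, instead of finding out through an exception. The feature-to-module mapping and the function name are assumptions made up for this example, not pandas' own registry of optional dependencies.

```python
from importlib.util import find_spec

# Illustrative mapping only (an assumption for this sketch):
# feature a user might call -> top-level module it needs at runtime.
OPTIONAL_FEATURES = {
    "read_excel (.xlsx)": "openpyxl",
    "DataFrame.plot": "matplotlib",
    'read_csv(engine="pyarrow")': "pyarrow",
}


def missing_optional_dependencies(features=OPTIONAL_FEATURES):
    """Return {feature: module} for every module that cannot be found."""
    return {
        feature: module
        for feature, module in features.items()
        if find_spec(module) is None  # None means the module is not installed
    }


if __name__ == "__main__":
    for feature, module in missing_optional_dependencies().items():
        print(f"{feature} needs the optional package {module!r}")
```

On a minimal install this would likely list all three features; on a distribution that bundles the dependencies it would print nothing.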

The second thing is how code is structured for soft dependencies, but I will leave it out of this discussion, as it's another complex but somewhat independent topic.

What I propose regarding the packaging is what many other projects do, for example how R is packaged on Linux distributions: distribute two different packages, pandas and pandas-core.

The existing package pandas would be renamed to pandas-core, and users who want a minimal installation would be able to use it. A new metapackage would be created with the existing name pandas. It'd be a metapackage / "empty" package with just dependencies on pandas-core, pyarrow, matplotlib... and any other package we consider important to have by default.
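As a rough sketch of what the split would mean for an install, the snippet below models the two packages as a small dependency table and computes what each one pulls in. The table is hypothetical: it only uses the packages named in this issue (with dateutil under its PyPI name python-dateutil), ignores their own transitive dependencies, and the final list for the metapackage would be whatever we agree on.

```python
# Hypothetical dependency table for the proposed split (an assumption
# for illustration; the real lists would be decided in the discussion).
PACKAGES = {
    # the current package, renamed: only today's required dependencies
    "pandas-core": ["numpy", "pytz", "python-dateutil"],
    # metapackage: ships no code of its own, only dependencies
    "pandas": ["pandas-core", "pyarrow", "matplotlib"],
}


def install_set(name, packages=PACKAGES):
    """Return the sorted list of distributions installed along with `name`."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # packages missing from the table are treated as leaves
        stack.extend(packages.get(current, []))
    return sorted(seen)


if __name__ == "__main__":
    print("pandas-core:", install_set("pandas-core"))
    print("pandas:     ", install_set("pandas"))
```

In this model `pip install pandas-core` stays minimal, while `pip install pandas` adds pyarrow and matplotlib on top of it, which is the behaviour described above.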

I think this would solve in a reasonable way the discussion on whether to make PyArrow required, and in general improve the experience of most pandas users.

@pandas-dev/pandas-core thoughts?

Labels: Build, Dependencies, Ideas, Needs Discussion, Release