Split pandas package into pandas and pandas-core

Maybe worth a PDEP, but opening as an issue first to see what other people think, and see if it's needed or worth the time.

The status quo for dependencies in pandas has been to depend on numpy, pytz, and dateutil, and to make everything else optional. This has been working reasonably ok, but I don't think it's ideal, and [the discussion](https://github.com/pandas-dev/pandas/issues/57424) on whether PyArrow should be required or optional is one example of why.

In my opinion, there are two main things to consider. The first one is about users, and I see two broad groups:
1. The average user who will `pip/conda install pandas` and wants things to work without much hassle
2. The advanced user who wants more control over what is installed

I think the current approach favors group 2. Users in group 1 run into many exceptions about missing dependencies when they use key functionality like `.read_excel()` or `.plot()`, or get suboptimal performance when they use `.read_csv()` and similar functions without PyArrow installed. Of course, this is avoided if they install pandas through a distribution that includes the dependencies, which I think is common.
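To make the group 1 experience concrete, here is a minimal sketch that reports which of the optional packages behind those features are missing from the current environment. The helper names and the feature-to-package mapping are illustrative assumptions (for example, `openpyxl` is only one of the engines `.read_excel()` can use); this is not how pandas itself checks for them.

```python
from importlib.util import find_spec

# Illustrative mapping (an assumption, not an exhaustive list):
# feature mentioned above -> optional package that provides it.
OPTIONAL_DEPENDENCIES = {
    "read_excel (.xlsx files)": "openpyxl",
    "plot": "matplotlib",
    "PyArrow-backed I/O": "pyarrow",
}


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for unusual entries in sys.modules; treat as missing.
        return False


def missing_optional_dependencies(dependencies=OPTIONAL_DEPENDENCIES):
    """Return the subset of {feature: package} whose package is not installed."""
    return {
        feature: module
        for feature, module in dependencies.items()
        if not is_installed(module)
    }


if __name__ == "__main__":
    for feature, module in missing_optional_dependencies().items():
        print(f"{feature}: requires '{module}', which is not installed")
```

In a minimal pandas installation this would likely list all three entries, which is the friction described above.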

The second thing is how code is structured for soft dependencies, but I will leave it out of this discussion, as it's another complex but somewhat independent topic.

What I propose regarding the packaging is what many other packages do, for example R in its packaging for Linux distributions: distribute two different packages, `pandas` and `pandas-core`.

The existing package `pandas` would be renamed to `pandas-core`, and users who want a minimal installation would be able to use it. A new package would be created with the existing name `pandas`. It'd be a metapackage / "empty" package with just dependencies on `pandas-core`, `pyarrow`, `matplotlib`... and any other package we consider important to have by default.
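As a rough sketch of what such a metapackage could look like, the function below builds the keyword arguments one might pass to `setuptools.setup()`. Everything here is hypothetical: `pandas-core` does not exist as a published package, the function name and placeholder version are made up, and pinning `pandas-core` to the metapackage's own version is just one possible choice.

```python
def metapackage_setup_kwargs(version="0.0.0"):
    """Build hypothetical setuptools.setup() arguments for an 'empty' pandas package.

    The version default is a placeholder, not a real release number.
    """
    return {
        "name": "pandas",
        "version": version,
        # No Python code is shipped; the package only pulls in dependencies.
        "packages": [],
        "install_requires": [
            # Assumption: pin the core package to the same version.
            f"pandas-core=={version}",
            "pyarrow",
            "matplotlib",
        ],
    }


if __name__ == "__main__":
    # A real setup.py would call setuptools.setup(**metapackage_setup_kwargs()).
    for requirement in metapackage_setup_kwargs()["install_requires"]:
        print(requirement)
```

With this layout, `pip install pandas` would bring in the defaults, while `pip install pandas-core` would remain the minimal option for advanced users.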

I think this would solve in a reasonable way the discussion on whether to make PyArrow required, and in general improve the experience of most pandas users.

@pandas-dev/pandas-core thoughts?

Split pandas package into pandas and pandas-core #57550