Skip to content

API: concatting DataFrames does not skip empty objects #39035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Jan 8, 2021

See discussion in #38843

To repeat the main argument: when not skipping empty objects, the resulting dtype of a concat-operation only depends on the input dtypes, and not on the exact content (the exact values, how many values (shape)). In general we want to get rid of value-dependent behaviour. In the past we discussed this in the context of the certain values (eg presence of NaNs or not), but I think also the shape should not matter (eg when slicing dataframes before contatting, you can get empties or not depending on values).

This starts with reverting the change for DataFrame. If we opt for this behaviour, in addition we should also discuss how to change the Series behaviour.

cc @jbrockmendel

@jorisvandenbossche jorisvandenbossche added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 8, 2021
@jorisvandenbossche jorisvandenbossche added this to the 1.3 milestone Jan 8, 2021
@jreback
Copy link
Contributor

jreback commented Jan 8, 2021

so if we revert this, then we actually need to change real logic here and use the combined dtypes of the empties, which is fine.

@jbrockmendel
Copy link
Member

I'm generally OK with this.

If we opt for this behaviour, in addition we should also discuss how to change the Series behaviour.

If we do, we should do the same for Index setops, which has special-casing for empty indexes.

@jreback
Copy link
Contributor

jreback commented Jan 8, 2021

I'm generally OK with this.

If we opt for this behaviour, in addition we should also discuss how to change the Series behaviour.

If we do, we should do the same for Index setops, which has special-casing for empty indexes.

ok fair enought. let's revert then i guess try to unify and see what breaks.

@jreback jreback merged commit 47f5fdf into pandas-dev:master Jan 8, 2021
@jorisvandenbossche jorisvandenbossche deleted the concat-dataframe-empties branch January 12, 2021 08:05
@jorisvandenbossche
Copy link
Member Author

Opened a follow-up issue for this at #39122

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
)

* Revert "BUG: casting on concat with empties (pandas-dev#38907)"

This reverts commit 04282c7.

* Revert "BUG: inconsistent concat casting EA vs non-EA (pandas-dev#38843)"

This reverts commit 2362df9.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants