REGR: concat of unaligned empty DataFrames failing #39037

Closed
jorisvandenbossche opened this issue Jan 8, 2021 · 5 comments · Fixed by #41677

Labels: Bug · Needs Tests (unit test(s) needed to prevent regressions) · Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@jorisvandenbossche
Member

While trying to get dask's CI passing (dask/dask#6996), I noticed another error related to concat. Dask concatenates the empty "meta" dataframe to know the shape/dtypes of the resulting dataframe, and something is failing in there now.

Small reproducer without dask:

>>> import pandas as pd
>>> df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df2 = pd.DataFrame({'b': [1, 2, 3], 'c': [4, 5, 6]})

>>> pd.concat([df1[0:0], df2[0:0], df1[0:0]])
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-151-b1206da866d9> in <module>
----> 1 pd.concat([df1[0:0], df2[0:0], df1[0:0]])

~/scipy/pandas/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    297     )
    298 
--> 299     return op.get_result()
    300 
    301 

~/scipy/pandas/pandas/core/reshape/concat.py in get_result(self)
    518 
    519             new_data = concatenate_block_managers(
--> 520                 mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy
    521             )
    522             if not self.copy:

~/scipy/pandas/pandas/core/internals/concat.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
     89         else:
     90             b = make_block(
---> 91                 _concatenate_join_units(join_units, concat_axis, copy=copy),
     92                 placement=placement,
     93                 ndim=len(axes),

~/scipy/pandas/pandas/core/internals/concat.py in _concatenate_join_units(join_units, concat_axis, copy)
    325         join_units = nonempties
    326 
--> 327     empty_dtype, upcasted_na = _get_empty_dtype_and_na(join_units)
    328 
    329     to_concat = [

~/scipy/pandas/pandas/core/internals/concat.py in _get_empty_dtype_and_na(join_units)
    436 
    437     msg = "invalid dtype determination in get_concat_dtype"
--> 438     raise AssertionError(msg)
    439 
    440 

AssertionError: invalid dtype determination in get_concat_dtype

It occurs when reindexing happens (not fully aligned dataframes), and apparently at least 3 dataframes are needed to trigger it (the same example with only 2 dataframes passed to concat doesn't fail).

I suppose this might be related to #38843 (that change is only on master, so this is not a target for 1.2.1). cc @jbrockmendel

@jorisvandenbossche added the Bug and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels on Jan 8, 2021
@jorisvandenbossche added this to the 1.3 milestone on Jan 8, 2021

@jorisvandenbossche
Member Author

Another failing case from the dask tests:

>>> df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
>>> df2 = pd.DataFrame({'a': [1, 2, 3]})

>>> pd.concat([df1[:0], df2[:0]]).dtypes
a      int64
b    float64
dtype: object

This now gives a "float64" dtype for column "b" in the result, while the original "b" column had object dtype (it holds strings).

@jorisvandenbossche
Member Author

> I suppose this might be related to #38843

Yes, I can confirm that this is fixed again on the branch I opened earlier today that reverts those changes (#39035).

@simonjayhawkins
Member

@jorisvandenbossche fixed in #39035?

@simonjayhawkins added the Closing Candidate (may be closeable, needs more eyeballs) label on Jan 22, 2021

@jorisvandenbossche
Member Author

Yes, but I still need to add tests for the cases shown above.
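
A rough sketch of what such tests could look like, built only from the two reproducers above. The function names are hypothetical (not the ones in the pandas test suite), and the assertions assume a pandas version in which the regression is fixed. The dtypes of the reindexed columns in the three-frame case are deliberately left unchecked here.

```python
import pandas as pd


def test_concat_three_unaligned_empty_frames():
    # Hypothetical test name; mirrors the first reproducer.
    df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    df2 = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})

    # Three empty, only partially aligned frames: this call raised
    # AssertionError while the regression was present.
    result = pd.concat([df1[0:0], df2[0:0], df1[0:0]])

    assert len(result) == 0
    assert list(result.columns) == ["a", "b", "c"]


def test_concat_empty_frames_keeps_dtypes():
    # Hypothetical test name; mirrors the second reproducer.
    df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
    df2 = pd.DataFrame({"a": [1, 2, 3]})

    result = pd.concat([df1[:0], df2[:0]])

    # Compare against the input dtypes rather than hard-coding names,
    # so the check does not depend on the default integer/string dtype.
    assert result["a"].dtype == df1["a"].dtype
    assert result["b"].dtype == df1["b"].dtype
```

Both functions can be collected by pytest as-is or called directly.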

@jorisvandenbossche added the Needs Tests (unit test(s) needed to prevent regressions) label and removed the Closing Candidate (may be closeable, needs more eyeballs) label on Jan 22, 2021

@simonjayhawkins
Member

> Yes, but I still need to add tests for the cases shown above.

Sure. Good first issue?
