Concatenating Single-element dense series with SparseArray Series Raises Error #30668

Closed
henighan opened this issue Jan 3, 2020 · 6 comments · Fixed by #34355
@henighan
henighan commented Jan 3, 2020

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np


a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
b = pd.Series([1], dtype=np.float)
pd.concat([a, b])

With interpreter output

>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
>>> b = pd.Series([1], dtype=np.float)
>>> pd.concat([a, b], axis=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/reshape/concat.py", line 246, in concat
    return op.get_result()
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/reshape/concat.py", line 426, in get_result
    [x._data for x in self.objs], self.new_axes
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/internals/managers.py", line 1629, in concat
    values = concat_compat(values)
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/dtypes/concat.py", line 117, in concat_compat
    return _concat_sparse(to_concat, axis=axis, typs=typs)
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/dtypes/concat.py", line 478, in _concat_sparse
    for x in to_concat
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/dtypes/concat.py", line 478, in <listcomp>
    for x in to_concat
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/arrays/sparse/array.py", line 361, in __init__
    data, kind=kind, fill_value=fill_value, dtype=dtype
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/arrays/sparse/array.py", line 1504, in make_sparse
    if arr.ndim > 1:
AttributeError: 'float' object has no attribute 'ndim'

Problem description

When concatenating a Series with a sparse dtype (e.g. Sparse[float64, nan] above) with a Series with a dense dtype (e.g. float64 above) that has only a single element, the above exception is raised.

I believe this may be the result of the squeeze happening here:
https://github.com/henighan/pandas/blob/45d8d77f27cf0dbc8cefe932f8fb64f6982b9527/pandas/core/dtypes/concat.py#L477

If I remove this squeeze, it resolves the issue for me (yielding the output below), and all tests still pass on my mac when running ./test_fast.sh. If this seems right to others, I'd be happy to open a pull request.
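To illustrate why a squeeze can be hazardous for one-element inputs, here is a minimal NumPy-only sketch. It does not trace the exact pandas code path at the linked line; it only shows the general effect, on the assumption that something similar happens to the single-element dense values before they reach the SparseArray constructor.

```python
import numpy as np

single = np.array([1.0])      # one element, like Series `b` above
pair = np.array([1.0, 2.0])   # two elements, for comparison

# squeeze() drops every axis of length 1, so the one-element array
# collapses to a zero-dimensional array, while the pair is unchanged.
squeezed_single = single.squeeze()
squeezed_pair = pair.squeeze()
print(squeezed_single.shape, squeezed_pair.shape)

# Unwrapping the zero-dimensional result yields a plain Python float,
# which has no `ndim` attribute (compare the AttributeError above).
as_float = squeezed_single.item()
print(type(as_float).__name__, hasattr(as_float, "ndim"))
```

Inputs with two or more elements keep their shape, which would be consistent with the failure only appearing for single-element series.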

Expected Output

>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
>>> b = pd.Series([1], dtype=np.float)
>>> pd.concat([a, b])
0    1.0
1    NaN
0    1.0
dtype: Sparse[float64, nan]

Output of pd.show_versions()

pd.show_versions()
/Users/henighan/Documents/henighan-pandas/pandas/core/index.py:29: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
FutureWarning,

INSTALLED VERSIONS

commit : 45d8d77
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.26.0.dev0+1586.g45d8d77f2
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200102
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 5.1.0
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.46.0

@simonjayhawkins
Member

Thanks @henighan for the report.

This looks to be fixed on master.

>>> import numpy as np
>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1631.g42a5c1c1aa'
>>>
>>> a = pd.Series(pd.arrays.SparseArray([1, None]), dtype=np.float)
>>> a
0    1.0
1    NaN
dtype: Sparse[float64, nan]
>>>
>>> b = pd.Series([1], dtype=np.float)
>>> b
0    1.0
dtype: float64
>>>
>>> pd.concat([a, b])
0    1.0
1    NaN
0    1.0
dtype: Sparse[float64, nan]
>>>
>>> pd.concat([a, b], axis=0)
0    1.0
1    NaN
0    1.0
dtype: Sparse[float64, nan]
>>>

We just need a test to prevent regression, and then this issue can be closed.

PRs or further investigation welcome.
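A regression test for this case might look roughly like the sketch below. The test name is hypothetical, and this is not the test that was eventually merged. It builds the sparse series directly from floats with pd.arrays.SparseArray instead of passing dtype=np.float, because the np.float alias was removed in NumPy 1.24.

```python
import numpy as np
import pandas as pd


def test_concat_sparse_with_single_element_dense():
    # Hypothetical test name; mirrors the example from the report.
    a = pd.Series(pd.arrays.SparseArray([1.0, np.nan]))
    b = pd.Series([1.0])  # dense, single element

    result = pd.concat([a, b])

    # The result should stay sparse and keep all three values in order.
    assert isinstance(result.dtype, pd.SparseDtype)
    assert list(result.index) == [0, 1, 0]
    dense_values = np.asarray(result.sparse.to_dense())
    # assert_array_equal treats NaN entries in matching positions as equal.
    np.testing.assert_array_equal(dense_values, np.array([1.0, np.nan, 1.0]))
```

Run under pytest, this fails with the AttributeError from the report on an affected version and passes once the concatenation works.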

@simonjayhawkins simonjayhawkins added good first issue Needs Tests Unit test(s) needed to prevent regressions labels May 22, 2020
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone May 22, 2020
@ricalanis
Contributor

take

@ricalanis
Contributor

Added test with exactly the case you proposed, @simonjayhawkins

I wonder if we should cover any other combination of sparse/dense cases

@simonjayhawkins simonjayhawkins modified the milestones: Contributions Welcome, 1.1 May 25, 2020
@simonjayhawkins
Member

I wonder if we should cover any other combination of sparse/dense cases

It may be worth bisecting to find which commit fixed this and what tests were added at the time.

@hadisaadat

This error still exists and is not solved yet. I just upgraded pandas, but it had no effect!

@simonjayhawkins
Member

The code sample in the OP gives the expected output from the OP on the latest pandas release.

$ python
Python 3.8.5 (default, Aug  5 2020, 09:44:06) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
<stdin>:1: FutureWarning: The pandas.SparseArray class is deprecated and will be removed from pandas in a future version. Use pandas.arrays.SparseArray instead.
>>> b = pd.Series([1], dtype=np.float)
>>> pd.concat([a, b])
0    1.0
1    NaN
0    1.0
dtype: Sparse[float64, nan]
>>> pd.__version__
'1.1.1'
>>>

@hadisaadat You may want to open a new issue (using the bug template) including an MRE of the failing case and the versions you are using.
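As a starting point for such a report, a small script along these lines prints the installed versions next to the outcome of the original example. The run_repro helper is a hypothetical name, and the snippet would need to be adapted to whatever case is actually failing.

```python
import numpy as np
import pandas as pd


def run_repro():
    """Run the sparse/dense concat from this issue and summarize the outcome."""
    sparse = pd.Series(pd.arrays.SparseArray([1.0, np.nan]))
    dense = pd.Series([1.0])  # single-element dense series
    try:
        result = pd.concat([sparse, dense])
    except Exception as exc:  # report any failure instead of crashing
        return f"FAILED: {type(exc).__name__}: {exc}"
    return f"OK: dtype={result.dtype}"


if __name__ == "__main__":
    # Versions first, so the report is tied to a specific environment.
    print("pandas", pd.__version__, "| numpy", np.__version__)
    print(run_repro())
```

For a full bug report, the output of pd.show_versions() can be pasted alongside this.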
