Concatenating Single-element dense series with SparseArray Series Raises Error #30668

Closed
@henighan

Description

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np


a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
b = pd.Series([1], dtype=np.float)
pd.concat([a, b])

With interpreter output

>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
>>> b = pd.Series([1], dtype=np.float)
>>> pd.concat([a, b], axis=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/reshape/concat.py", line 246, in concat
    return op.get_result()
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/reshape/concat.py", line 426, in get_result
    [x._data for x in self.objs], self.new_axes
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/internals/managers.py", line 1629, in concat
    values = concat_compat(values)
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/dtypes/concat.py", line 117, in concat_compat
    return _concat_sparse(to_concat, axis=axis, typs=typs)
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/dtypes/concat.py", line 478, in _concat_sparse
    for x in to_concat
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/dtypes/concat.py", line 478, in <listcomp>
    for x in to_concat
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/arrays/sparse/array.py", line 361, in __init__
    data, kind=kind, fill_value=fill_value, dtype=dtype
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/arrays/sparse/array.py", line 1504, in make_sparse
    if arr.ndim > 1:
AttributeError: 'float' object has no attribute 'ndim'

Problem description

Concatenating a Series of a sparse dtype (e.g. Sparse[float64, nan] above) with a Series of a dense dtype (e.g. float64 above) that has only a single element raises the exception above.

I believe this may be the result of the squeeze happening here:
https://github.com/henighan/pandas/blob/45d8d77f27cf0dbc8cefe932f8fb64f6982b9527/pandas/core/dtypes/concat.py#L477

If I remove this squeeze, the issue is resolved for me (yielding the expected output below), and all tests still pass on my Mac when running ./test_fast.sh. If this seems right to others, I'd be happy to open a pull request.
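
To illustrate why only single-element input would be affected, here is a minimal NumPy-only sketch. It does not exercise the pandas code path itself; the variable names are made up for illustration, and how pandas then turns the zero-dimensional result into the plain float seen in the traceback is not shown here.

```python
import numpy as np

# Hypothetical stand-ins for the dense values handed to the sparse concat helper.
single = np.array([1.0])       # one element, shape (1,)
pair = np.array([1.0, 2.0])    # two elements, shape (2,)

# squeeze() drops every length-1 axis, so a one-element array loses its only axis.
squeezed_single = single.squeeze()
squeezed_pair = pair.squeeze()

print(squeezed_single.shape)   # ()
print(squeezed_pair.shape)     # (2,)

# A zero-dimensional array behaves like a scalar; .item() yields a Python float.
scalar = squeezed_single.item()
print(type(scalar))            # <class 'float'>
```

With two or more elements the squeeze is a no-op on a 1-D array, which would explain why the error only shows up for a single-element dense Series.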

Expected Output

>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
>>> b = pd.Series([1], dtype=np.float)
>>> pd.concat([a, b])
0    1.0
1    NaN
0    1.0
dtype: Sparse[float64, nan]

Output of pd.show_versions()

pd.show_versions()
/Users/henighan/Documents/henighan-pandas/pandas/core/index.py:29: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
FutureWarning,

INSTALLED VERSIONS

commit : 45d8d77
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.26.0.dev0+1586.g45d8d77f2
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200102
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 5.1.0
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.46.0
