Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
b = pd.Series([1], dtype=np.float)
pd.concat([a, b])
With interpreter output
>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
>>> b = pd.Series([1], dtype=np.float)
>>> pd.concat([a, b], axis=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/reshape/concat.py", line 246, in concat
    return op.get_result()
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/reshape/concat.py", line 426, in get_result
    [x._data for x in self.objs], self.new_axes
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/internals/managers.py", line 1629, in concat
    values = concat_compat(values)
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/dtypes/concat.py", line 117, in concat_compat
    return _concat_sparse(to_concat, axis=axis, typs=typs)
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/dtypes/concat.py", line 478, in _concat_sparse
    for x in to_concat
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/dtypes/concat.py", line 478, in <listcomp>
    for x in to_concat
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/arrays/sparse/array.py", line 361, in __init__
    data, kind=kind, fill_value=fill_value, dtype=dtype
  File "/Users/henighan/Documents/henighan-pandas/pandas/core/arrays/sparse/array.py", line 1504, in make_sparse
    if arr.ndim > 1:
AttributeError: 'float' object has no attribute 'ndim'
Problem description
When concatenating a Series with a sparse dtype (e.g. Sparse[float64, nan] above) and a Series with a dense dtype (e.g. float64 above) that has only a single element, the above exception is raised.
I believe this may be the result of the squeeze happening here:
https://github.com/henighan/pandas/blob/45d8d77f27cf0dbc8cefe932f8fb64f6982b9527/pandas/core/dtypes/concat.py#L477
If I remove this squeeze, it resolves the issue for me (yielding the output below), and all tests still pass on my Mac when running ./test_fast.sh. If this seems right to others, I'd be happy to open a pull request.
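To illustrate why a squeeze could matter only in the single-element case, here is a minimal NumPy-only sketch. It does not call into the pandas code path from the traceback; the variable names are made up for illustration, and the link to the failure is my assumption rather than something I have traced line by line.

```python
import numpy as np

# Hypothetical stand-ins for the dense values handed to the concat helper.
single = np.array([1.0])        # shape (1,), like the values of Series `b`
pair = np.array([1.0, 2.0])     # shape (2,), a dense Series with two elements
row = np.array([[1.0, 2.0]])    # shape (1, 2), a 2-D block with one row

# squeeze() drops every axis of length 1.
single_squeezed = single.squeeze()  # the only axis has length 1, so it is dropped
pair_squeezed = pair.squeeze()      # nothing to drop, stays 1-D
row_squeezed = row.squeeze()        # leading axis dropped, becomes 1-D

print(single_squeezed.shape, single_squeezed.ndim)
print(pair_squeezed.shape, pair_squeezed.ndim)
print(row_squeezed.shape, row_squeezed.ndim)
```

The single-element array ends up zero-dimensional, so whatever consumes it downstream sees a scalar-like value instead of a 1-D array, while inputs with two or more elements keep their 1-D shape. That would be consistent with the error appearing only when the dense Series has exactly one element.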
Expected Output
>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> a = pd.Series(pd.SparseArray([1, None]), dtype=np.float)
>>> b = pd.Series([1], dtype=np.float)
>>> pd.concat([a, b])
0    1.0
1    NaN
0    1.0
dtype: Sparse[float64, nan]
Output of pd.show_versions()
pd.show_versions()
/Users/henighan/Documents/henighan-pandas/pandas/core/index.py:29: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
FutureWarning,
INSTALLED VERSIONS
commit : 45d8d77
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.26.0.dev0+1586.g45d8d77f2
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200102
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 5.1.0
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.46.0