KeyError for crosstab on Series with same name - Still an issue in v0.23.1 #21765

kasuteru · 2018-07-06T15:05:37Z

Code Sample

import pandas as pd

s1 = pd.Series([1, 1, 2, 2, 3, 3], name='s')
s2 = pd.Series([1, 1, 1, 2, 2, 2], name='s')

pd.crosstab(s1, s2)

Return:

Short version:

ValueError: Duplicated level name: "s", assigned to level 1, is already used for level 0.

Long version:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-66-b5484aa990df> in <module>()
      2 s2 = pd.Series([1,1,1,2,2,2], name='s')
      3 
----> 4 pd.crosstab(s1, s2)

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
    490     table = df.pivot_table('__dummy__', index=rownames, columns=colnames,
    491                            margins=margins, margins_name=margins_name,
--> 492                            dropna=dropna, **kwargs)
    493 
    494     # Post-process

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
   5301                            aggfunc=aggfunc, fill_value=fill_value,
   5302                            margins=margins, dropna=dropna,
-> 5303                            margins_name=margins_name)
   5304 
   5305     def stack(self, level=-1, dropna=True):

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
     85     # if we have a categorical
     86     grouped = data.groupby(keys, observed=False)
---> 87     agged = grouped.agg(aggfunc)
     88     if dropna and isinstance(agged, ABCDataFrame) and len(agged.columns):
     89         agged = agged.dropna(how='all')

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in aggregate(self, arg, *args, **kwargs)
   4656         axis=''))
   4657     def aggregate(self, arg, *args, **kwargs):
-> 4658         return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
   4659 
   4660     agg = aggregate

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in aggregate(self, arg, *args, **kwargs)
   4095             # grouper specific aggregations
   4096             if self.grouper.nkeys > 1:
-> 4097                 return self._python_agg_general(arg, *args, **kwargs)
   4098             else:
   4099 

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_agg_general(self, func, *args, **kwargs)
   1086                 output[name] = self._try_cast(values[mask], result)
   1087 
-> 1088         return self._wrap_aggregated_output(output)
   1089 
   1090     def _wrap_applied_output(self, *args, **kwargs):

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _wrap_aggregated_output(self, output, names)
   4728             result = result._consolidate()
   4729         else:
-> 4730             index = self.grouper.result_index
   4731             result = DataFrame(output, index=index, columns=output_keys)
   4732 

pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in result_index(self)
   2379                             labels=labels,
   2380                             verify_integrity=False,
-> 2381                             names=self.names)
   2382         return result
   2383 

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\indexes\multi.py in __new__(cls, levels, labels, sortorder, names, dtype, copy, name, verify_integrity, _set_identity)
    230         if names is not None:
    231             # handles name validation
--> 232             result._set_names(names)
    233 
    234         if sortorder is not None:

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\indexes\multi.py in _set_names(self, names, level, validate)
    693                         'Duplicated level name: "{}", assigned to '
    694                         'level {}, is already used for level '
--> 695                         '{}.'.format(name, l, used[name]))
    696 
    697             self.levels[l].rename(name, inplace=True)

ValueError: Duplicated level name: "s", assigned to level 1, is already used for level 0.

Problem description

#6319 supposedly fixed this issue, yet it still persists in my configuration.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


jschendel · 2018-07-06T16:29:53Z

I think this was reintroduced when we disallowed duplicate index level names. Duplicate level names have since been re-allowed by #21423.

This fix is included in 0.23.2, which was just released:

In [2]: pd.__version__
Out[2]: '0.23.2'

In [3]: s1 = pd.Series([1,1,2,2,3,3], name='s')

In [4]: s2 = pd.Series([1,1,1,2,2,2], name='s')

In [5]: pd.crosstab(s1, s2)
Out[5]: 
s  1  2
s      
1  3  0
2  0  3
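
For anyone who cannot upgrade right away, one possible workaround is to give the two axes distinct labels through the `rownames` and `colnames` parameters of `pd.crosstab`, so that no two index levels share a name. This is a minimal sketch, not something proposed in this thread: the labels `s_rows` and `s_cols` are made up for illustration, and I have not confirmed the behavior on 0.23.1 specifically. Counting the pairs by hand for these inputs gives a 3-by-2 table, which differs from the 2-by-2 output shown above, so it may be worth checking the result on your own version.

```python
import pandas as pd

s1 = pd.Series([1, 1, 2, 2, 3, 3], name="s")
s2 = pd.Series([1, 1, 1, 2, 2, 2], name="s")

# Hypothetical labels: any two distinct names should avoid the clash
# between the row level and the column level.
table = pd.crosstab(s1, s2, rownames=["s_rows"], colnames=["s_cols"])

# Hand count of the (s1, s2) pairs:
# (1,1) x2, (2,1) x1, (2,2) x1, (3,2) x2
print(table)
```

Renaming one of the Series before the call (for example `s2.rename("s2")`) would be another way to reach the same result.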

kasuteru mentioned this issue Jul 6, 2018

KeyError for crosstab on Series with same name. #6319

Closed

jschendel closed this as completed Jul 6, 2018

jschendel added this to the No action milestone Jul 6, 2018
