KeyError for crosstab on Series with same name - Still an issue in v0.23.1 #21765

Closed
kasuteru opened this issue Jul 6, 2018 · 1 comment

Comments

@kasuteru
Contributor

kasuteru commented Jul 6, 2018

Code Sample

import pandas as pd

s1 = pd.Series([1,1,2,2,3,3], name='s')
s2 = pd.Series([1,1,1,2,2,2], name='s')

pd.crosstab(s1, s2)

Output:

Short version:

ValueError: Duplicated level name: "s", assigned to level 1, is already used for level 0.

Long version:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-66-b5484aa990df> in <module>()
      2 s2 = pd.Series([1,1,1,2,2,2], name='s')
      3 
----> 4 pd.crosstab(s1, s2)

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
    490     table = df.pivot_table('__dummy__', index=rownames, columns=colnames,
    491                            margins=margins, margins_name=margins_name,
--> 492                            dropna=dropna, **kwargs)
    493 
    494     # Post-process

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
   5301                            aggfunc=aggfunc, fill_value=fill_value,
   5302                            margins=margins, dropna=dropna,
-> 5303                            margins_name=margins_name)
   5304 
   5305     def stack(self, level=-1, dropna=True):

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
     85     # if we have a categorical
     86     grouped = data.groupby(keys, observed=False)
---> 87     agged = grouped.agg(aggfunc)
     88     if dropna and isinstance(agged, ABCDataFrame) and len(agged.columns):
     89         agged = agged.dropna(how='all')

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in aggregate(self, arg, *args, **kwargs)
   4656         axis=''))
   4657     def aggregate(self, arg, *args, **kwargs):
-> 4658         return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
   4659 
   4660     agg = aggregate

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in aggregate(self, arg, *args, **kwargs)
   4095             # grouper specific aggregations
   4096             if self.grouper.nkeys > 1:
-> 4097                 return self._python_agg_general(arg, *args, **kwargs)
   4098             else:
   4099 

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_agg_general(self, func, *args, **kwargs)
   1086                 output[name] = self._try_cast(values[mask], result)
   1087 
-> 1088         return self._wrap_aggregated_output(output)
   1089 
   1090     def _wrap_applied_output(self, *args, **kwargs):

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _wrap_aggregated_output(self, output, names)
   4728             result = result._consolidate()
   4729         else:
-> 4730             index = self.grouper.result_index
   4731             result = DataFrame(output, index=index, columns=output_keys)
   4732 

pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in result_index(self)
   2379                             labels=labels,
   2380                             verify_integrity=False,
-> 2381                             names=self.names)
   2382         return result
   2383 

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\indexes\multi.py in __new__(cls, levels, labels, sortorder, names, dtype, copy, name, verify_integrity, _set_identity)
    230         if names is not None:
    231             # handles name validation
--> 232             result._set_names(names)
    233 
    234         if sortorder is not None:

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\indexes\multi.py in _set_names(self, names, level, validate)
    693                         'Duplicated level name: "{}", assigned to '
    694                         'level {}, is already used for level '
--> 695                         '{}.'.format(name, l, used[name]))
    696 
    697             self.levels[l].rename(name, inplace=True)

ValueError: Duplicated level name: "s", assigned to level 1, is already used for level 0.

Problem description

#6319 supposedly fixed this issue, but it still occurs in my setup (pandas 0.23.1, see the versions below).
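Until an upgrade is possible, one workaround is to give the two Series distinct names before calling crosstab, so the row key and the column key no longer collide. This is a sketch, not an official recommendation. The names `s1` and `s2` used for the renamed Series are arbitrary choices for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 1, 2, 2, 3, 3], name='s')
s2 = pd.Series([1, 1, 1, 2, 2, 2], name='s')

# Series.rename with a scalar returns a copy with a new name;
# the original Series keep the shared name 's'
table = pd.crosstab(s1.rename('s1'), s2.rename('s2'))
print(table)
```

The result has one row per distinct value of `s1` and one column per distinct value of `s2`. Passing `rownames=['s1'], colnames=['s2']` to crosstab should have the same effect, because those arguments replace the Series names as the row and column labels.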

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jschendel
Member

I think this regression was introduced when we disallowed duplicate index level names; duplicate level names have since been re-allowed by #21423.
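The check that raised in 0.23.1 lives in the MultiIndex constructor, as the last frames of the traceback show. The grouping inside crosstab builds a MultiIndex whose level names come from the group keys, so two Series both named 's' produce two levels named 's'. Below is a minimal sketch, assuming pandas 0.23.2 or later: a MultiIndex with duplicate level names is accepted, but selecting a level by that shared name is ambiguous, so a level position has to be used instead.

```python
import pandas as pd

# Two levels that share the name 's'; this raised
# "Duplicated level name" in 0.23.1
mi = pd.MultiIndex.from_arrays([[1, 1, 2], [1, 2, 2]], names=['s', 's'])
print(list(mi.names))

# Select the second level by position, which is unambiguous
second = mi.get_level_values(1)
print(second.tolist())

# Selecting by the shared name cannot tell the two levels apart
try:
    mi.get_level_values('s')
except ValueError as err:
    print('ambiguous level name:', err)
```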

This fix is included in 0.23.2, which was just released:

In [2]: pd.__version__
Out[2]: '0.23.2'

In [3]: s1 = pd.Series([1,1,2,2,3,3], name='s')

In [4]: s2 = pd.Series([1,1,1,2,2,2], name='s')

In [5]: pd.crosstab(s1, s2)
Out[5]: 
s  1  2
s      
1  3  0
2  0  3
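The output above deserves a closer look. It has only two rows, although `s1` takes three distinct values (1, 2 and 3). The sketch below counts the (s1, s2) pairs without crosstab, using a groupby on a DataFrame whose columns have distinct names (`s1`, `s2`, `a` and `b` are names chosen here for illustration). The true table is 3×2. The 2×2 diagonal table shown above instead matches `s2` counted against itself, which suggests that one of the two same-named Series was dropped along the way.

```python
import pandas as pd

s1 = pd.Series([1, 1, 2, 2, 3, 3], name='s')
s2 = pd.Series([1, 1, 1, 2, 2, 2], name='s')

# Put both Series in one frame under distinct column names
df = pd.DataFrame({'s1': s1, 's2': s2})

# Count each (s1, s2) pair, then move s2 into the columns,
# filling pairs that never occur with 0
counts = df.groupby(['s1', 's2']).size().unstack(fill_value=0)
print(counts)

# For comparison: s2 counted against itself gives the 2x2 diagonal table
self_counts = (
    pd.DataFrame({'a': s2, 'b': s2})
    .groupby(['a', 'b'])
    .size()
    .unstack(fill_value=0)
)
print(self_counts)
```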

jschendel added this to the No action milestone on Jul 6, 2018