
BUG: read_pickle and categorical data #8518


Closed
fkaufer opened this issue Oct 9, 2014 · 1 comment · Fixed by #8519
Labels
Categorical (Categorical Data Type), IO Data (IO issues that don't fit into a more specific label)
Milestone
0.15.0

Comments

@fkaufer

fkaufer commented Oct 9, 2014

I can pickle DataFrames that contain categorical columns, but I cannot read them back. Since HDF does not officially support categoricals yet, is there any working alternative for persisting categorical data?

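```python
import pandas as pd

# A plain object-dtype frame round-trips through pickle without problems
df = pd.DataFrame({'a': ['x', 'y', 'x']})
df.to_pickle('/tmp/df.pkl')
pd.read_pickle('/tmp/df.pkl')

# After converting the column to a categorical, to_pickle still succeeds...
df['a'] = df['a'].astype('category')
df.to_pickle('/tmp/dfcat.pkl')
# ...but reading the file back raises the TypeError below
pd.read_pickle('/tmp/dfcat.pkl')
```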
TypeError                                 Traceback (most recent call last)
<ipython-input-21-57b10ac045a6> in <module>()
      7 
      8 df.to_pickle('/tmp/dfcat.pkl')
----> 9 pd.read_pickle('/tmp/dfcat.pkl')

//anaconda/envs/pd15/lib/python2.7/site-packages/pandas-0.15.0rc1_10_g215569a-py2.7-macosx-10.5-x86_64.egg/pandas/io/pickle.pyc in read_pickle(path)
     58 
     59     try:
---> 60         return try_read(path)
     61     except:
     62         if PY3:

//anaconda/envs/pd15/lib/python2.7/site-packages/pandas-0.15.0rc1_10_g215569a-py2.7-macosx-10.5-x86_64.egg/pandas/io/pickle.pyc in try_read(path, encoding)
     55             except:
     56                 with open(path, 'rb') as fh:
---> 57                     return pc.load(fh, encoding=encoding, compat=True)
     58 
     59     try:

//anaconda/envs/pd15/lib/python2.7/site-packages/pandas-0.15.0rc1_10_g215569a-py2.7-macosx-10.5-x86_64.egg/pandas/compat/pickle_compat.pyc in load(fh, encoding, compat, is_verbose)
    114         up.is_verbose = is_verbose
    115 
--> 116         return up.load()
    117     except:
    118         raise

//anaconda/envs/pd15/lib/python2.7/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

//anaconda/envs/pd15/lib/python2.7/pickle.pyc in load_build(self)
   1215         setstate = getattr(inst, "__setstate__", None)
   1216         if setstate:
-> 1217             setstate(state)
   1218             return
   1219         slotstate = None

//anaconda/envs/pd15/lib/python2.7/site-packages/pandas-0.15.0rc1_10_g215569a-py2.7-macosx-10.5-x86_64.egg/pandas/core/internals.pyc in __setstate__(self, state)
   2311             self.blocks = tuple(
   2312                 unpickle_block(b['values'], b['mgr_locs'])
-> 2313                 for b in state['blocks'])
   2314         else:
   2315             # discard anything after 3rd, support beta pickling format for a

//anaconda/envs/pd15/lib/python2.7/site-packages/pandas-0.15.0rc1_10_g215569a-py2.7-macosx-10.5-x86_64.egg/pandas/core/internals.pyc in <genexpr>((b,))
   2311             self.blocks = tuple(
   2312                 unpickle_block(b['values'], b['mgr_locs'])
-> 2313                 for b in state['blocks'])
   2314         else:
   2315             # discard anything after 3rd, support beta pickling format for a

//anaconda/envs/pd15/lib/python2.7/site-packages/pandas-0.15.0rc1_10_g215569a-py2.7-macosx-10.5-x86_64.egg/pandas/core/internals.pyc in unpickle_block(values, mgr_locs)
   2303             if values.dtype == 'M8[us]':
   2304                 values = values.astype('M8[ns]')
-> 2305             return make_block(values, placement=mgr_locs)
   2306 
   2307         if (isinstance(state, tuple) and len(state) >= 4

//anaconda/envs/pd15/lib/python2.7/site-packages/pandas-0.15.0rc1_10_g215569a-py2.7-macosx-10.5-x86_64.egg/pandas/core/internals.pyc in make_block(values, placement, klass, ndim, dtype, fastpath)
   2075 
   2076     return klass(values, ndim=ndim, fastpath=fastpath,
-> 2077                  placement=placement)
   2078 
   2079 

//anaconda/envs/pd15/lib/python2.7/site-packages/pandas-0.15.0rc1_10_g215569a-py2.7-macosx-10.5-x86_64.egg/pandas/core/internals.pyc in __init__(self, values, placement, fastpath, **kwargs)
   1583         super(CategoricalBlock, self).__init__(_maybe_to_categorical(values),
   1584                                                fastpath=True, placement=placement,
-> 1585                                                **kwargs)
   1586 
   1587     @property

//anaconda/envs/pd15/lib/python2.7/site-packages/pandas-0.15.0rc1_10_g215569a-py2.7-macosx-10.5-x86_64.egg/pandas/core/internals.pyc in __init__(self, values, placement, ndim, fastpath)
   1073         # kludgetastic
   1074         if ndim is None:
-> 1075             if len(placement) != 1:
   1076                 ndim = 1
   1077             else:

TypeError: object of type 'slice' has no len()
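Reading the traceback bottom-up: `unpickle_block` passes the pickled `mgr_locs` straight through as `placement`, and `Block.__init__` then calls `len(placement)`. The error message indicates that this placement arrived as a plain Python `slice`, which has no length of its own. The snippet below reproduces that TypeError in isolation and shows how a slice's length can be resolved against a known sequence size with `slice.indices`. `slice_length` is a hypothetical helper for illustration, and this is only a reading of the traceback, not a description of how #8519 fixed it.

```python
# The last frame of the traceback calls len(placement) while placement is a slice.
# A bare slice has no length of its own, so len() raises the same TypeError.
try:
    len(slice(0, 1))
except TypeError as exc:
    error_message = str(exc)


def slice_length(s, n):
    """Hypothetical helper: number of positions slice `s` selects from a
    sequence of length `n`."""
    # slice.indices(n) turns None/negative bounds into a concrete (start, stop, step)
    return len(range(*s.indices(n)))


print(error_message)
print(slice_length(slice(0, 1), 1))
```

A slice only acquires a length once it is paired with the size of the sequence it indexes, which is why code that needs `len()` has to either resolve the slice first or receive an explicit array of positions.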

INSTALLED VERSIONS

commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Darwin
OS-release: 13.4.0
machine: x86_64
processor: i386
byteorder: little
pandas: 0.15.0rc1-10-g215569a
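For the persistence question in the report, one possible workaround until the fix is available is to avoid pickling the categorical block at all: store each categorical column as its integer codes plus its list of categories, and rebuild the column with `pd.Categorical.from_codes` after loading. The sketch below does this with the standard `pickle` module. `split_categoricals`, `join_categoricals`, `dumps_frame` and `loads_frame` are hypothetical helper names, and the code was written against a current pandas API; it has not been checked on 0.15.0rc1.

```python
import pickle

import pandas as pd


def split_categoricals(df):
    """Hypothetical helper: swap each categorical column for its integer codes
    and record (categories, ordered) so the column can be rebuilt later."""
    plain = df.copy()
    meta = {}
    for col in df.columns:
        if df[col].dtype.name == 'category':
            meta[col] = (list(df[col].cat.categories), bool(df[col].cat.ordered))
            plain[col] = df[col].cat.codes  # -1 marks missing values
    return plain, meta


def join_categoricals(plain, meta):
    """Hypothetical helper: inverse of split_categoricals."""
    df = plain.copy()
    for col, (categories, ordered) in meta.items():
        df[col] = pd.Categorical.from_codes(plain[col].values, categories,
                                            ordered=ordered)
    return df


def dumps_frame(df):
    # No categorical block reaches pickle: categoricals travel as codes plus a list
    return pickle.dumps(split_categoricals(df))


def loads_frame(data):
    plain, meta = pickle.loads(data)
    return join_categoricals(plain, meta)


# The frame from the report survives the round trip as a categorical column
df = pd.DataFrame({'a': ['x', 'y', 'x']})
df['a'] = df['a'].astype('category')
restored = loads_frame(dumps_frame(df))
print(restored['a'].dtype)
```

The sketch serializes to bytes in memory for brevity; `pickle.dump` and `pickle.load` on a file opened in binary mode work the same way for on-disk storage.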

@jorisvandenbossche added this to the 0.15.0 milestone Oct 9, 2014
@jorisvandenbossche added the Categorical label Oct 9, 2014
@immerrr
Contributor

immerrr commented Oct 9, 2014

Looking into this.

@jorisvandenbossche added the IO Data label Oct 9, 2014
3 participants