Dataframe rename issue. #4403

Closed
halleygithub opened this issue Jul 30, 2013 · 31 comments · Fixed by #4410
Labels
Internals Related to non-user accessible pandas implementation Regression Functionality that used to work in a prior pandas version

@halleygithub

I just upgraded from 0.11 to 0.12 and hit a DataFrame rename error caused by the upgrade. (It worked well in 0.11.)

>>> df4
                 TClose      RT    TExg
STK_ID RPT_Date                        
600809 20130331   22.02  0.0454  0.0422

>>> df5
                 STK_ID  RPT_Date STK_Name  TClose
STK_ID RPT_Date                                   
600809 20120930  600809  20120930     山西汾酒   38.05
       20121231  600809  20121231     山西汾酒   41.66
       20130331  600809  20130331     山西汾酒   30.01

>>> k=pd.merge(df4, df5, how='inner', left_index=True, right_index=True)
>>> k
                 TClose_x      RT    TExg  STK_ID  RPT_Date STK_Name  TClose_y
STK_ID RPT_Date                                                               
600809 20130331     22.02  0.0454  0.0422  600809  20130331     山西汾酒     30.01

>>> k.rename(columns={'TClose_x':'TClose', 'TClose_y':'QT_Close'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\Python27\lib\site-packages\pandas\core\base.py", line 40, in __repr__
    return str(self)
  File "d:\Python27\lib\site-packages\pandas\core\base.py", line 20, in __str__
    return self.__bytes__()
  File "d:\Python27\lib\site-packages\pandas\core\base.py", line 32, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 668, in __unicode__
    self.to_string(buf=buf)
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 1556, in to_string
    formatter.to_string()
  File "d:\Python27\lib\site-packages\pandas\core\format.py", line 294, in to_string
    strcols = self._to_str_columns()
  File "d:\Python27\lib\site-packages\pandas\core\format.py", line 239, in _to_str_columns
    str_columns = self._get_formatted_column_labels()
  File "d:\Python27\lib\site-packages\pandas\core\format.py", line 435, in _get_formatted_column_labels
    dtypes = self.frame.dtypes
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 1696, in dtypes
    return self.apply(lambda x: x.dtype)
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 4416, in apply
    return self._apply_standard(f, axis)
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 4491, in _apply_standard
    raise e
TypeError: ("'NoneType' object is not iterable", u'occurred at index TExg')

>>> df4.dtypes
TClose    float64
RT        float64
TExg      float64
dtype: object

>>> df5.dtypes
STK_ID       object
RPT_Date     object
STK_Name     object
TClose      float64
dtype: object
>>> 
@jreback
Contributor

jreback commented Jul 30, 2013

can you supply a reproducible example for these initial frames (e.g. a function which creates them exactly)

e.g. something that can be evaled to create them, because we need to reproduce the unicode characters
(this is a unicode error), it just happens to show up in the dtype printing

DataFrame([['foo',1.0....])

@cpcloud
Member

cpcloud commented Jul 30, 2013

i think that's a possibly spurious raise there...it should probably be a bare raise since NoneType not being iterable is not informative

@cpcloud
Member

cpcloud commented Jul 30, 2013

i can repro this using the above frames

@halleygithub please supply some code to create the above frames.

there's a bug in icol or BlockManager.iget

@cpcloud
Member

cpcloud commented Jul 30, 2013

ahh duplicate TExg block somehow...

@cpcloud
Member

cpcloud commented Jul 30, 2013

we really need to remove that raise e there; that's the only way i was able to figure out this was in internals

@jreback
Contributor

jreback commented Jul 30, 2013

no that raise is correct

just str(df)

@cpcloud
Member

cpcloud commented Jul 30, 2013

huh? the raise doesn't show the correct location of the exception because it catches everything

here's part of the traceback

/home/phillip/Documents/code/py/pandas/pandas/core/frame.pyc in dtypes(self)
   1685     @property
   1686     def dtypes(self):
-> 1687         return self.apply(lambda x: x.dtype)
   1688
   1689     def convert_objects(self, convert_dates=True, convert_numeric=False, copy=True):

/home/phillip/Documents/code/py/pandas/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds)
   4397                     return self._apply_raw(f, axis)
   4398                 else:
-> 4399                     return self._apply_standard(f, axis)
   4400             else:
   4401                 return self._apply_broadcast(f, axis)

/home/phillip/Documents/code/py/pandas/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4472                     # no k defined yet
   4473                     pass
-> 4474                 raise e
   4475
   4476

TypeError: ("'NoneType' object is not iterable", u'occurred at index TExg')

this doesn't tell me anything about the location of the raise except that it was somewhere in looping thru series_gen

only when i removed e did the full traceback show up

maybe there's a way to show that without removing the e...

how would it be different anyway? would the possibly caught NameError / UnboundLocalError be raised instead?
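A minimal sketch of the pattern under discussion, not the actual pandas code: the function names and the loop are hypothetical stand-ins. It attaches the "occurred at index" context to the exception and re-raises with a bare `raise`, so the frame where the failure really happened stays in the traceback. On Python 2.7 (used in this thread) `raise e` restarts the traceback at the `raise` statement, which matches the truncated output above; the sketch itself assumes Python 3, where the traceback is stored on the exception as `__traceback__`.

```python
import traceback


def failing_step(value):
    # Stand-in for a failure deep inside the library:
    # iterating over None raises TypeError on the next line.
    for _ in value:
        pass


def apply_with_context(values):
    # Hypothetical loop, loosely modelled on the pattern discussed above.
    for key, value in values.items():
        try:
            failing_step(value)
        except Exception as exc:
            # Add context to the exception, then use a bare `raise` so the
            # original traceback (including failing_step) is preserved.
            exc.args = exc.args + ("occurred at index %s" % key,)
            raise


def frames_in_traceback(values):
    # Run the loop and report the exception args plus the function names
    # of every frame recorded in the traceback.
    try:
        apply_with_context(values)
    except TypeError as exc:
        tb = traceback.extract_tb(exc.__traceback__)
        return exc.args, [frame.name for frame in tb]
    return (), []


args, names = frames_in_traceback({"TExg": None})
print(args)
print(names)
```

The printed frame list includes `failing_step`, which is the information that was missing from the traceback quoted above.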

@cpcloud
Member

cpcloud commented Jul 30, 2013

In [4]: df4 = DataFrame({'TClose': [22.02], 'RT': [0.0454], 'TExg': [0.0422]}, index=MultiIndex.from_tuples([(600809, 20130331)], names=['STK_ID', 'RPT_Date']))

In [5]: df5 = DataFrame({'STK_ID': [600809] * 3, 'RPT_Date': [20120930,20121231,20130331], 'STK_Name': [u'饡驦', u'饡驦', u'饡驦'], 'TClose': [38.05, 41.66, 30.01]},index=MultiIndex.from_tuples([(600809, 20120930), (600809, 20121231),(600809,20130331)], names=['STK_ID', 'RPT_Date']))

In [6]: k = merge(df4,df5,how='inner',left_index=True,right_index=True)

different characters but same error results.
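For anyone re-running this outside IPython, here is a self-contained script version of the same steps, including the `rename` call from the original report. It is only a sketch: it assumes pandas is imported as `pd` and uses the `STK_Name` value from the original report. On a release that contains the fix the rename should go through and the dtypes should print normally; on 0.12.0 it is expected to hit the error above.

```python
import pandas as pd

names = ['STK_ID', 'RPT_Date']

# Left frame: a single row indexed by (STK_ID, RPT_Date).
df4 = pd.DataFrame(
    {'TClose': [22.02], 'RT': [0.0454], 'TExg': [0.0422]},
    index=pd.MultiIndex.from_tuples([(600809, 20130331)], names=names),
)

# Right frame: three rows, with STK_ID/RPT_Date also present as columns.
df5 = pd.DataFrame(
    {
        'STK_ID': [600809] * 3,
        'RPT_Date': [20120930, 20121231, 20130331],
        'STK_Name': [u'山西汾酒'] * 3,
        'TClose': [38.05, 41.66, 30.01],
    },
    index=pd.MultiIndex.from_tuples(
        [(600809, 20120930), (600809, 20121231), (600809, 20130331)],
        names=names,
    ),
)

# Index-on-index inner merge; the shared 'TClose' column gets _x/_y suffixes.
k = pd.merge(df4, df5, how='inner', left_index=True, right_index=True)

# The rename that triggered the failure when the result was printed.
renamed = k.rename(columns={'TClose_x': 'TClose', 'TClose_y': 'QT_Close'})

# Accessing dtypes is the step that blew up in the reported traceback.
print(renamed.dtypes)
print(renamed)
```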

@cpcloud
Member

cpcloud commented Jul 30, 2013

curiously if you type store k then restart ipython, type store -r k and then

k.rename(columns={'TClose_x':'TClose'})

the error does not show up 😠

@jreback
Contributor

jreback commented Jul 30, 2013

I think there is a pr out there to take out the e

but regardless, the apply hits the error, but it's really in the construction

can u post your creation example?

@cpcloud
Member

cpcloud commented Jul 30, 2013

it's there

@cpcloud
Member

cpcloud commented Jul 30, 2013

this seems fishy

ipdb> self.items
Index([u'RT', u'TClose', u'TExg', u'RPT_Date', u'STK_ID', u'STK_Name', u'TClose_y'], dtype=object)
ipdb> self.blocks
[ObjectBlock: [TExg], 1 x 1, dtype object, IntBlock: [RT, TClose], 2 x 1, dtype int64, FloatBlock: [RT, TClose, TExg, TClose_y], 4 x 1, dtype float64]

where is RPT_Date in the blocks?
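To make the mismatch concrete, here is a small plain-Python check of the invariant that every item should belong to exactly one block. It is a sketch only: `check_blocks` is a hypothetical helper, not a pandas API, and the data is copied by hand from the ipdb output above.

```python
from collections import Counter


def check_blocks(items, blocks):
    # Count how many blocks claim each item, then report items that no
    # block holds and items that more than one block holds.
    counts = Counter(name for block_items in blocks.values()
                     for name in block_items)
    missing = [name for name in items if counts[name] == 0]
    duplicated = [name for name in items if counts[name] > 1]
    return missing, duplicated


# Copied from the ipdb session above.
items = ['RT', 'TClose', 'TExg', 'RPT_Date', 'STK_ID', 'STK_Name', 'TClose_y']
blocks = {
    'ObjectBlock': ['TExg'],
    'IntBlock': ['RT', 'TClose'],
    'FloatBlock': ['RT', 'TClose', 'TExg', 'TClose_y'],
}

missing, duplicated = check_blocks(items, blocks)
print('missing:', missing)        # ['RPT_Date', 'STK_ID', 'STK_Name']
print('duplicated:', duplicated)  # ['RT', 'TClose', 'TExg']
```

So three items have no block at all, and three others are claimed by two blocks each.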

@jreback
Contributor

jreback commented Jul 31, 2013

@halleygithub thanks for the report
turned out to be a very subtle issue

@halleygithub
Author

I've attached the cPickle dump file of (df4, df5) here: http://ajqznkugcw.l25.yunpan.cn/lk/QnPqhJRCMdspq

So if you want, you can download it and take a look.

It seems that the issue is solved. So how can I resolve my problem? Can I get the latest daily development builds of the pandas Windows binaries from http://pandas.pydata.org/pandas-build/dev/ ?

My application ran into several issues after upgrading, and I need to test them one by one. Thanks.

@halleygithub
Author

OK. I manually revised merge.py and got things running. Still looking forward to the binary builds. Thanks.

@jreback
Contributor

jreback commented Jul 31, 2013

Great
periodically check back for the dev builds

@smcinerney

Can you please add this as a known issue in the 0.12 whatsnew, along with the DeprecationWarnings?

@cpcloud
Member

cpcloud commented Aug 15, 2013

We can add it in the dev docs, but i'm pretty sure things are "frozen" for 0.12 stuff

@cpcloud
Member

cpcloud commented Aug 15, 2013

Would you like to submit a pull request?

@jtratner
Contributor

@cpcloud Could pandas do a point release? (maybe before @jreback's Series refactor).

@cpcloud
Member

cpcloud commented Aug 15, 2013

possibly, although i'm not sure if we ever came to a consensus there

cc @y-p since he suggested it on the dev mailing list a little before 0.12 came out

i'm 👍 on doing a point release

@wesm ?

@wesm
Member

wesm commented Aug 15, 2013

What's the status of master? Do we need to create a maintenance branch and start backporting bug fixes?

@jreback
Contributor

jreback commented Aug 15, 2013

this is fixed in 0.12 IIRC

@jreback
Contributor

jreback commented Aug 15, 2013

actually master at this point is ok if u really wanted to release

@smcinerney

merge() is broken in the 0.12 macports release I got yesterday

@cpcloud
Member

cpcloud commented Aug 15, 2013

Can you be a bit more specific than just "broken"? Please open an issue if you can.

@jtratner
Contributor

@jreback this is not fixed in 0.12. Checking out v0.12.0 and running this (on OSX) still causes the failure described above.

@smcinerney

I'm saying that this issue 4403 (merge breaks on indexing) is still in the 0.12 release on macports. People will hit this, and at minimum it needs to be documented as a known issue in the whatsnew, or some such. I had to manually apply the changes from pull request 4410.

@jreback
Contributor

jreback commented Aug 16, 2013

@jtratner I stand corrected, this was fixed early in 0.13
but it remains that this is actually pretty hard to reproduce
you have to do very specific things to create it

IMHO this is not worth a 0.12.1 at this point
let's figure out a timeline for 0.13

@smcinerney

@jreback, does it not occur on (any?) df merge with a non-unique index?

Separately from the timeline for merging the fix, I'm suggesting this be noted in the 0.12 whatsnew.

@jreback
Contributor

jreback commented Aug 16, 2013

@smcinerney no, this only occurs when the index is non-unique after the merge and u then rename

I am not averse to posting something in the docs, though I have found that people usually just ask on SO, the mailing list, or post an issue

since everyone is now aware I think we can respond pretty easily

(even issues that have really big and bold warnings are often ignored in the docs :)
