
[BUG] fixed DateOffset pickle bug when months >= 12 #35258


Merged: 9 commits merged into pandas-dev:master on Aug 14, 2020

Conversation

fujiaxiang
Member

@fujiaxiang
Member Author

This issue is caused by the __setstate__ method defined in class RelativeDeltaOffset. The method updates the attributes of the loaded object using the attributes of its underlying relativedelta object (i.e. its ._offset attribute). In my opinion (correct me if I'm wrong) this is unnecessary and may be a leftover from earlier versions. It also breaks round-tripping when months >= 12, because relativedelta normalizes months=12 to years=+1, so the loaded object picks up a years attribute that the original never had.
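The mechanism can be sketched without pandas or dateutil. Everything below is hypothetical and for illustration only: ToyDelta stands in for relativedelta, ToyOffset and LegacyToyOffset stand in for RelativeDeltaOffset after and before the change, and restore() imitates the step where unpickling hands the saved state to __setstate__. It is not the pandas implementation.

```python
class ToyDelta:
    """Hypothetical stand-in for relativedelta: folds every 12 months into a year."""

    def __init__(self, years=0, months=0):
        carried_years, months = divmod(months, 12)
        self.years = years + carried_years
        self.months = months


class ToyOffset:
    """Hypothetical offset: keeps the caller's keywords plus a normalized delta."""

    def __init__(self, **kwds):
        self.__dict__.update(kwds)       # months=12 stays exactly as passed in
        self._offset = ToyDelta(**kwds)  # normalized to years=1, months=0

    def params(self):
        # Public attributes only, i.e. what equality would compare.
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    def __setstate__(self, state):
        self.__dict__.update(state)


class LegacyToyOffset(ToyOffset):
    def __setstate__(self, state):
        # Legacy behaviour: copy every non-zero field of the delta into the state.
        state = dict(state)
        state.update({k: v for k, v in vars(state["_offset"]).items() if v})
        self.__dict__.update(state)


def restore(obj):
    """Imitate unpickling: build a blank instance, then pass it the saved state."""
    clone = type(obj).__new__(type(obj))
    clone.__setstate__(dict(vars(obj)))
    return clone


plain = restore(ToyOffset(months=12)).params()
legacy = restore(LegacyToyOffset(months=12)).params()
print(plain)   # {'months': 12}
print(legacy)  # {'months': 12, 'years': 1}
```

With the legacy __setstate__, the restored object carries both months=12 and years=1, so it no longer matches the object that was saved.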

However, this change does cause one existing test (test_pickle_v0_15_2) to fail. Again, this appears to be a legacy issue, and I have updated and renamed the test.

Reason why the test failed:
The test compares reconstructed objects with objects loaded from a stored pickle file, pandas/tests/tseries/offsets/data/dateoffset_0_15_2.pickle. This file was produced with an older version of the code. When __setstate__ is called on it, the state argument is:

{'normalize': False, '_offset': relativedelta(years=+1), '_use_relativedelta': True, 'kwds': {'years': 1}, 'n': 1}

which differs from the state produced by the current version:

{'_offset': relativedelta(years=+1), '_use_relativedelta': True, 'years': 1, 'n': 1, 'normalize': False}

Note the difference in how years is represented in state.
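To make the difference concrete, here is a small sketch that flattens the older layout into the newer one. The helper flatten_legacy_state is hypothetical and is not part of this PR or of pandas; the relativedelta value is replaced by a plain string so that the snippet needs no third-party import.

```python
def flatten_legacy_state(state):
    """Return a copy of `state` with any nested 'kwds' dict merged to the top level."""
    flat = dict(state)
    flat.update(flat.pop("kwds", {}))
    return flat


# State as stored in the 0.15.2 pickle (relativedelta shown as a placeholder string).
legacy_state = {
    "normalize": False,
    "_offset": "relativedelta(years=+1)",
    "_use_relativedelta": True,
    "kwds": {"years": 1},
    "n": 1,
}

# State as produced by the current version.
current_state = {
    "_offset": "relativedelta(years=+1)",
    "_use_relativedelta": True,
    "years": 1,
    "n": 1,
    "normalize": False,
}

print(flatten_legacy_state(legacy_state) == current_state)  # True
```

The only difference between the two layouts is whether years sits under kwds or at the top level of the state.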

Before this PR, the test still passed because __setstate__ updated the years attribute (with setattr) using values from the underlying ._offset object. With this PR it no longer does so, and therefore the test failed.

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Frequency DateOffsets labels Jul 13, 2020
@simonjayhawkins
Member

@fujiaxiang can you move release note to 1.2

@fujiaxiang
Member Author

@simonjayhawkins have moved whatsnew to 1.2. Thanks!

@fujiaxiang
Member Author

@jreback I have used pandas/tests/io/generate_legacy_storage_files.py to generate pandas/tests/io/data/legacy_pickle/1.1.0/1.1.0_x86_64_darwin_3.8.5.pickle, as suggested in an earlier comment.

There is an issue with pickle protocol compatibility. I am using Python 3.8.5 in my local environment. When generating the pickle file, the script automatically selects the highest protocol, which is 5 in Python 3.8. However, this protocol is not supported in Python 3.7 and below, which causes some of the tests to fail in CI. In my local environment these tests pass.

What's your suggestion here? Should I use a Python 3.7 environment to generate the file? Another option is to update the generate_legacy_storage_files.py script so that it uses pickle.DEFAULT_PROTOCOL (4 in Python 3.8), which gives better compatibility for all future cases. What do you think?
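A minimal sketch of the two options, using only the standard pickle module. The payload and the helper protocol_of are made up for illustration; the helper relies on the fact that pickles written with protocol 2 or higher begin with the PROTO opcode (byte 0x80) followed by the protocol number.

```python
import pickle


def protocol_of(data):
    """Read the protocol number from a pickle written with protocol 2 or higher."""
    if data[:1] != b"\x80":
        raise ValueError("no PROTO opcode; protocol is 0 or 1")
    return data[1]


payload = {"n": 1, "normalize": False, "months": 12}  # illustrative data only

highest = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
default = pickle.dumps(payload, protocol=pickle.DEFAULT_PROTOCOL)

# The concrete numbers depend on the interpreter that runs this snippet.
print("highest:", protocol_of(highest))
print("default:", protocol_of(default))
```

A file written with the highest protocol can only be read by interpreters that support that protocol, whereas the default protocol is chosen by Python with backward compatibility in mind.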

@jreback
Contributor

jreback commented Aug 8, 2020

@fujiaxiang you have to run with older versions of pandas (e.g. 0.25.3 or 1.0) and then commit the results to master; not averse to running with 1.1 as well

ok on defaulting the pickle protocol argument as well (but that's separate)

@fujiaxiang
Member Author

@jreback Thanks, I have updated the script to use the default protocol. I am also running it with the released pandas 1.1 to generate the file.

By the way, I updated the comment on how to run that script, because there is now a folder pandas/tests/io/json in the project. Running the script the old way makes import json pick up this folder instead of the standard library module, which causes an error.
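The shadowing can be reproduced in isolation. The sketch below assumes the usual CPython behaviour of putting the script's directory first on the import search path; it creates a throwaway directory containing a package named json and asks the import machinery which file import json would resolve to with and without that directory in front. The helper resolve_json and the temporary layout are hypothetical.

```python
import importlib.machinery
import sys
import tempfile
from pathlib import Path


def resolve_json(search_path):
    """Return the file that `import json` would load for the given search path."""
    spec = importlib.machinery.PathFinder.find_spec("json", search_path)
    return Path(spec.origin).resolve()


with tempfile.TemporaryDirectory() as tmp:
    script_dir = Path(tmp).resolve()
    # A local package called json, like pandas/tests/io/json next to the script.
    (script_dir / "json").mkdir()
    (script_dir / "json" / "__init__.py").write_text("")

    stdlib_json = resolve_json(sys.path)
    shadowed_json = resolve_json([str(script_dir), *sys.path])

print("normal lookup:  ", stdlib_json)
print("shadowed lookup:", shadowed_json)
```

Running the script from a location where its own directory does not contain a json folder avoids the clash.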

@jreback jreback added this to the 1.2 milestone Aug 13, 2020
@jreback jreback requested a review from jbrockmendel August 13, 2020 18:14
@jreback
Contributor

jreback commented Aug 13, 2020

@jbrockmendel if any comments.

@jbrockmendel
Member

LGTM

@jreback jreback merged commit 3989493 into pandas-dev:master Aug 14, 2020
@jreback
Contributor

jreback commented Aug 14, 2020

thanks @fujiaxiang very nice!

@fujiaxiang fujiaxiang deleted the dateoffset_pickle_bug branch March 20, 2021 02:35
Labels
Compat pandas objects compatability with Numpy or Python functions Frequency DateOffsets

Successfully merging this pull request may close these issues.

BUG: DateOffset pickle bug when months=12
4 participants