URGENT!!! options data garbled and unusable #193

Closed
aisthesis opened this issue Apr 4, 2016 · 11 comments

Comments

@aisthesis

When using pandas_datareader.data.Options to get options data from yahoo, I'm only getting back a small selection even though the Yahoo! charts on the website contain the data I need. For example:

>>> from pandas_datareader.data import Options
>>> tsla = Options('tsla', 'yahoo')
>>> data = tsla.get_all_data()
>>> data.shape
(1064, 13)
>>> data.index.levels[1]
DatetimeIndex(['2008-04-16', '2015-04-16', '2016-09-16', '2017-06-16',
           '2019-01-18', '2020-01-17', '2020-05-16', '2022-04-16',
           '2029-04-16'],
          dtype='datetime64[ns]', name='Expiry', freq=None)
>>> data.index.levels[1].shape
(9,)

Yahoo! Finance, however, shows far more expiration dates, such as '2016-06-17'. Maybe it's something Yahoo! is doing, but it's disconcerting, since I started seeing this right after upgrading pandas_datareader to the latest version.

@aisthesis changed the title from "Retrieving partial options data" to "options data retrieved only partially" on Apr 4, 2016
@aisthesis
Author

I've figured out what is happening. It's a de-serialization issue. The day of the month is being interpreted as the year, and the year as the day of the month. Maybe Yahoo! changed something. Continuing from above example:

>>> from pandas_datareader.data import Options
>>> tsla = Options('tsla', 'yahoo')
>>> data = tsla.get_all_data()
>>> data.index.levels[1]
DatetimeIndex(['2008-04-16', '2015-04-16', '2016-09-16', '2017-06-16',
           '2019-01-18', '2020-01-17', '2020-05-16', '2022-04-16',
           '2029-04-16'],
          dtype='datetime64[ns]', name='Expiry', freq=None)
>>> data.index.levels[1][0]
Timestamp('2008-04-16 00:00:00')
>>> data.index.levels[1][0].to_datetime()
datetime.datetime(2008, 4, 16, 0, 0)

Note that 2008-04-16 is not a valid expiry at the moment. All expiries for currently available options are in the future. But if we interpret the last 2 digits of the year as the day and the day given above as the last 2 digits of the year, we get a list of currently valid expiries. The days shown above are always 16, 17 or 18, which are really the years 2016, 2017 and 2018. The weeklies are provided for April, and otherwise we just get the 3rd Friday of the month.
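
A minimal sketch of that reinterpretation in plain Python, assuming every real expiry falls in the 2000s. `swap_year_and_day` is a hypothetical helper for illustration only and is not part of pandas-datareader:

```python
from datetime import date


def swap_year_and_day(garbled):
    """Undo the swap: the garbled day holds the 2-digit year,
    and the last 2 digits of the garbled year hold the day."""
    real_year = 2000 + garbled.day   # assumption: all expiries are in 20xx
    real_day = garbled.year % 100
    return date(real_year, garbled.month, real_day)


# The nine garbled expiries from the index shown above.
garbled = [
    date(2008, 4, 16), date(2015, 4, 16), date(2016, 9, 16),
    date(2017, 6, 16), date(2019, 1, 18), date(2020, 1, 17),
    date(2020, 5, 16), date(2022, 4, 16), date(2029, 4, 16),
]

recovered = sorted(swap_year_and_day(d) for d in garbled)
for d in recovered:
    print(d.isoformat())
```

With the swap undone, the earliest date is 2016-04-08 and none of the nine lies in the past.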

This is an urgent issue, as it makes the options data completely unusable in the current release!!!!

@aisthesis changed the title from "options data retrieved only partially" to "URGENT!!! options data garbled and unusable" on Apr 5, 2016
@aisthesis
Author

Additional suggestion: Put in an integration test that verifies that all expiries are later than current date. That would pick up this error.
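
A rough sketch of the check such a test could make, using only the standard library. `past_expiries` is a hypothetical helper, and the sample dates are hard-coded from the output above. A real integration test would instead take them from the `Expiry` level of the frame returned by `get_all_data()` and compare against the current date:

```python
from datetime import date


def past_expiries(expiries, today):
    """Return the expiries that fall before `today`, sorted ascending."""
    return sorted(d for d in expiries if d < today)


# Assumption: the date this issue was opened stands in for "today".
today = date(2016, 4, 4)

garbled = [
    date(2008, 4, 16), date(2015, 4, 16), date(2016, 9, 16),
    date(2017, 6, 16), date(2019, 1, 18), date(2020, 1, 17),
    date(2020, 5, 16), date(2022, 4, 16), date(2029, 4, 16),
]

stale = past_expiries(garbled, today)
# A test would fail whenever this list is non-empty.
print(stale)
```

A check like this only flags the expiries that land in the past, so it would detect the error without confirming that the remaining dates are right.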

@aisthesis
Author

The expiry_dates attribute of the Options object is correct, but those dates are missing from the index of the returned data:

>>> tsla.expiry_dates
[datetime.date(2016, 4, 8), datetime.date(2016, 4, 15), datetime.date(2016, 4, 22), datetime.date(2016, 4, 29), datetime.date(2016, 5, 20), datetime.date(2016, 6, 17), datetime.date(2016, 9, 16), datetime.date(2017, 1, 20), datetime.date(2018, 1, 19)]
>>> import datetime as dt
>>> data.loc[(slice(None), dt.datetime(2008, 4, 16)), :].iloc[0]
Last                                  0
Bid                                98.1
Ask                              101.85
...
>>> data.loc[(slice(None), dt.datetime(2016, 4, 8)), :].iloc[0]
Traceback (most recent call last):
...
KeyError: Timestamp('2016-04-08 00:00:00')

@aisthesis
Author

I don't have this problem on a server running pandas 0.17.1 and pandas-datareader 0.2.1. So, this is actually a problem with pandas 0.18.0 and not with pandas-datareader 0.2.1. I'm running the latter on both machines.

@aisthesis
Author

As I eventually discovered, the problem arises when upgrading python-dateutil from 2.4.2 to 2.5.2. I've also filed the issue with python-dateutil. @jreback is of the opinion that it needs to be fixed here, by enforcing correct conversion of the retrieved string into a date. I haven't looked at the details closely enough to see what parameters need to be passed, but that makes sense to me.
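
For illustration only, one way to enforce an unambiguous conversion is to parse with an explicit format instead of letting the parser guess the field order. The two-digit-year `YY-MM-DD` layout below is an assumption, since this thread does not show the raw string retrieved from Yahoo!, and `parse_expiry` is a hypothetical helper rather than anything in pandas-datareader:

```python
from datetime import datetime


def parse_expiry(text, fmt="%y-%m-%d"):
    """Parse an expiry string with a fixed field order.

    Assumption: the input looks like '16-04-08' (year, month, day).
    strptime raises ValueError if the text does not match `fmt`,
    so a format change upstream fails loudly instead of silently
    producing swapped fields.
    """
    return datetime.strptime(text, fmt).date()


expiry = parse_expiry("16-04-08")
print(expiry.isoformat())
```

Because the field order is stated in the format string, the result does not depend on which version of a guessing parser is installed.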

@pganssle

pganssle commented Apr 6, 2016

@aisthesis FYI, the launchpad is the old dateutil location. The new location is on github. I suspect this is due to dateutil/dateutil#233, which was fixed with dateutil/dateutil#234. The mitigation is to use python-dateutil==2.5.1. The only difference between 2.5.1 and 2.5.2 is fixing a bug with the dayfirst option, but it inadvertently introduced a new bug.

There will be a 2.5.3 release soon. I am anticipating a new tzdata release shortly and was hoping to get the two to coincide.

@femtotrader
Contributor

Another project seems to have problems with the updated version of python-dateutil: man-group/arctic#118

@pganssle

pganssle commented Apr 6, 2016

Maybe I'm misunderstanding the problem, then. In the linked PR, they were evidently counting on dayfirst having no effect on ambiguous dates, or misunderstanding what the dayfirst parameter is intended to do. The change in that regard is permanent, since the old behavior was a bug.

The changed behavior concerns dates like "2011-01-05", which can be parsed as either YYYY-MM-DD or YYYY-DD-MM. Both of those formats are actually in use, so the dayfirst parameter is used to specify that you want to prefer the YYYY-DD-MM format.

The issue I referred to above is for dates like 2011-01-15, which is completely unambiguous. In that case, the dayfirst and yearfirst parameters should be ignored, and it should parse to YYYY-MM-DD, but there is a bug that forces it to be interpreted as YYYY-DD-MM when dayfirst is specified, which raises an error.

Just remove the dayfirst parameter if you don't want to specify that the day is first.
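
To make the intended rule concrete, here is a small sketch in plain Python. It is not dateutil's implementation: `resolve_year_first` is a hypothetical helper that only mirrors the behavior described above for a year followed by two numeric fields.

```python
from datetime import date


def resolve_year_first(year, a, b, dayfirst=False):
    """Resolve the fields of a 'YEAR-a-b' date string.

    If only one ordering is possible, use it and ignore `dayfirst`.
    If both are possible, `dayfirst` picks YYYY-DD-MM.
    """
    if a > 12:
        # `a` cannot be a month, so it must be the day.
        return date(year, b, a)
    if b > 12:
        # `b` cannot be a month, so it must be the day.
        return date(year, a, b)
    # Ambiguous: both orderings are valid calendar dates.
    return date(year, b, a) if dayfirst else date(year, a, b)


# Ambiguous input: the flag decides.
print(resolve_year_first(2011, 1, 5))
print(resolve_year_first(2011, 1, 5, dayfirst=True))
# Unambiguous input: the flag is ignored.
print(resolve_year_first(2011, 1, 15, dayfirst=True))
```

In this sketch, "2011-01-05" resolves to January 5 by default and to May 1 with `dayfirst=True`, while "2011-01-15" resolves to January 15 either way.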

@aisthesis
Author

@pganssle python-dateutil==2.5.1 is still giving me garbled dates. Going back to python-dateutil==2.4.2 fixes it.

@pganssle

pganssle commented Apr 6, 2016

Yes, I believe I misunderstood your issue. The issue I mentioned would be raising ValueError. If that's not what's happening, it's possible your application is relying on the behavior of a bug regarding yearfirst. I'm not particularly familiar with the internals of pandas or pandas-datareader, so I don't think I can be more help.

My understanding is that pandas is already compatible with dateutil, though I guess that's in the forthcoming 0.18.1 release.

@davidastephens
Member

I believe this is fixed with the new options parser in #244. Please reopen if not.
