timezone aware columns #9242
Comments
Your analysis is generally correct. The problem is, as you pointed out, that numpy does not have support for time zones, and the data in columns are stored as numpy arrays. The DatetimeIndex provides some workarounds to handle time zones at the level of the full index, and these workarounds are not (yet) available for columns ('blocks'). As far as I understand, it would be possible to do something similar for the DatetimeBlock, but @jreback can shed more light on this. I think it is a rather big enhancement, but if you are interested in working on this, you are certainly welcome! Or, for improving timezone support in numpy itself, they are certainly also looking for help.
Thank you @jorisvandenbossche, very helpful. Handling it at the DatetimeBlock level seems a bit messier, e.g. different columns could have different timezone or precision information. I was thinking of a solution more like Categorical. As far as I can see, Categorical seems to have all the right basic infrastructure in place, which could be duplicated to create a DatetimeColumn, but naturally this is a very superficial viewpoint. Sure, I am interested in at least looking at what would be required. The timezone support on the numpy side I actually view as adequate, in the sense that I think datetime64[ns] is perfectly fine as a low-level data type, and has the enormous advantage of being exactly a 64-bit integer and nothing else. The timezone handling, I think, should be done on the pandas side.
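To make the idea concrete, here is a minimal sketch of what I mean by a Categorical-like column: plain 64-bit integers (UTC nanoseconds since the epoch) plus a single timezone stored once as metadata. It uses only the Python standard library. `TzColumn` and its methods are hypothetical names made up for illustration, not pandas or numpy API, and the fixed-offset timezone is just an assumption to keep the example self-contained.

```python
from array import array
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class TzColumn:
    """Hypothetical column: int64 UTC nanoseconds plus one shared tzinfo."""

    def __init__(self, datetimes, tz):
        self.tz = tz  # timezone metadata, stored once for the whole column
        # 'q' is a signed 64-bit integer, the same width as datetime64[ns].
        self.values = array("q", (self._to_ns(dt) for dt in datetimes))

    @staticmethod
    def _to_ns(dt):
        if dt.tzinfo is None:
            raise ValueError("expected timezone-aware datetimes")
        delta = dt - EPOCH
        # Pure integer arithmetic; datetime itself only resolves microseconds.
        micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
        return micros * 1000

    def tz_convert(self, tz):
        # Only the metadata changes; the integer buffer is shared, not copied.
        new = TzColumn([], tz)
        new.values = self.values
        return new

    def __getitem__(self, i):
        # Localize lazily, on element access.
        utc = EPOCH + timedelta(microseconds=self.values[i] // 1000)
        return utc.astimezone(self.tz)


tokyo = timezone(timedelta(hours=9))  # fixed offset, assumed for the example
col = TzColumn([datetime(2015, 1, 1, 9, 0, tzinfo=tokyo)], tokyo)
as_utc = col.tz_convert(timezone.utc)
```

The point of the design is that converting the timezone never touches the data: `col` and `as_utc` share the same integers and differ only in the metadata used when an element is read back.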
dupe of #8260. @quicknir you are welcome to give this a go, it's actually not that tricky, just inherit from
support on numpy is non-existent, though there are some proposals. Just look through the pandas codebase and you will appreciate the enormity of what @wesm did with timezones.
At the moment, I don't see any way to have performant, timezone aware columns. This seems rather surprising, as generally when you have a column with timestamps, they will all likely be of the same timezone. Right now, as far as I can see you have two options:
Am I missing something? Is there a really good reason why a DatetimeIndex can't just be used, as is, in a column, without the dtype=object conversion?