timezone aware columns #9242

Closed
quicknir opened this issue Jan 13, 2015 · 3 comments
Labels
Duplicate Report, Internals, Timezones

Comments

@quicknir

At the moment, I don't see any way to have performant, timezone aware columns. This seems rather surprising, as generally when you have a column with timestamps, they will all likely be of the same timezone. Right now, as far as I can see you have two options:

  1. Columns that are basically just numpy datetime64[ns] types. These seem to be timezone unaware. If you make such a column timezone aware (e.g., by dataframe.time_column.dt.tz_localize('UTC')), it becomes a column of dtype object.
  2. A DatetimeIndex, which keeps track of timezone information at seemingly the column level (which is laudable). However, this only seems to really work when used as an index. If I assign it to a column, it again gets converted to dtype object, and things get slow.

Am I missing something? Is there a really good reason why a DatetimeIndex can't just be used, as is, in a column, without the dtype=object conversion?
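
To check the two options above against whatever pandas version is installed, a minimal sketch follows. The frame, the column names, and the sample timestamps are made up for illustration. The printed dtypes depend on the version: the thread describes behavior as of January 2015, and later pandas releases added a dedicated timezone-aware datetime dtype, so the output may not be object.

```python
import pandas as pd

# Hypothetical frame with a naive datetime column (names are illustrative).
df = pd.DataFrame(
    {"time_column": pd.to_datetime(["2015-01-13 09:00", "2015-01-13 10:00"])}
)
naive_dtype = df["time_column"].dtype

# Option 1: localize the column through the .dt accessor.
localized = df["time_column"].dt.tz_localize("UTC")

# Option 2: build a timezone-aware DatetimeIndex, then assign it as a column.
index = pd.DatetimeIndex(["2015-01-13 09:00", "2015-01-13 10:00"], tz="UTC")
df["from_index"] = index

# Inspect the dtypes instead of assuming them; they vary by pandas version.
print("naive column:    ", naive_dtype)
print("localized column:", localized.dtype)
print("index:           ", index.dtype)
print("index as column: ", df["from_index"].dtype)
```

If either of the last two column dtypes prints as object, the values are stored as individual Python objects, which is the slow path described above.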

@jorisvandenbossche
Member

Your analysis is generally correct. The problem is, as you pointed out, that numpy does not have support for time zones, and the data in columns are stored as numpy arrays.

The DatetimeIndex provides some workarounds to handle time zones at the level of the full index, and these workarounds are not (yet) available for columns ('blocks'). As far as I understand, it would be possible to do something similar for the DatetimeBlock, but @jreback can shed more light on this.
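
As a rough illustration of what "handling time zones at the level of the full index" means, the sketch below shows that a DatetimeIndex carries one tz attribute for all its values, and that converting to another timezone leaves the underlying integers (exposed by asi8) unchanged. The sample timestamps and the fixed UTC-05:00 offset are assumptions chosen for illustration.

```python
from datetime import timedelta, timezone

import pandas as pd

# A timezone-aware index: one tz for the whole index, not one per element.
idx = pd.DatetimeIndex(["2015-01-13 09:00", "2015-01-13 10:00"], tz="UTC")

# Illustrative fixed offset (UTC-05:00); any tzinfo object would do here.
fixed_offset = timezone(timedelta(hours=-5))
converted = idx.tz_convert(fixed_offset)

# The timezone differs, but the stored integer timestamps are identical.
same_storage = (idx.asi8 == converted.asi8).all()
print(idx.tz, "->", converted.tz)
print("same underlying integers:", same_storage)
print("wall-clock hour before/after:", idx[0].hour, converted[0].hour)
```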

I think it is a rather big enhancement, but if you are interested in working on this, you are certainly welcome! Or, if you would rather improve timezone support in numpy itself, they are certainly also looking for help.

@quicknir
Author

Thank you @jorisvandenbossche, very helpful. Handling it at the DatetimeBlock level seems a bit messier, e.g. different columns could have different timezone or precision information. I was thinking of a solution more like Categorical. As far as I can see, Categorical seems to have all the right basic infrastructure in place, which could be duplicated to create a DatetimeColumn, but naturally this is a very superficial viewpoint.

Sure, I am interested in at least looking at what would be required. The timezone support on the numpy side I actually view as adequate, in the sense that I think the datetime64[ns] is perfectly adequate as a low-level data type, and has the enormous advantage of being just exactly a 64 bit integer and nothing else. This, I think, should be handled on the pandas side.
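
The idea of keeping plain 64-bit integers as storage and attaching the timezone once per column can be sketched with the standard library alone. This is not pandas code and not how pandas implements anything: TzAwareColumn, its methods, and the sample value are hypothetical names chosen for illustration.

```python
from array import array
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class TzAwareColumn:
    """Hypothetical column: int64 UTC nanoseconds plus one timezone for all values."""

    def __init__(self, utc_nanos, tz):
        self._values = array("q", utc_nanos)  # 'q' = signed 64-bit integers
        self.tz = tz  # stored once for the column, not per element

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        # Convert on access. datetime has microsecond resolution, so the
        # sub-microsecond part of the stored value is dropped here.
        micros = self._values[i] // 1_000
        return (EPOCH + timedelta(microseconds=micros)).astimezone(self.tz)

    def tz_convert(self, tz):
        # Only the timezone metadata changes; the integer values carry over untouched.
        return TzAwareColumn(self._values, tz)


# 2015-01-13 09:00:00 UTC expressed as nanoseconds since the epoch.
col = TzAwareColumn([1_421_139_600_000_000_000], timezone.utc)
eastern = col.tz_convert(timezone(timedelta(hours=-5)))

print(col[0].isoformat())
print(eastern[0].isoformat())
```

Both columns describe the same instant; only the presentation differs, which is why a timezone change never needs to touch the integer data.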

@jreback
Contributor

jreback commented Jan 13, 2015

dupe of #8260

@quicknir you are welcome to give this a go. It's actually not that tricky: just inherit from DatetimeBlock and the NonConsolidatingBlock mixin (so these blocks are not combined with one another). This is by far the cleanest solution.

> Sure, I am interested in at least looking at what would be required. The timezone support on the numpy side I actually view as adequate, in the sense that I think the datetime64[ns] is perfectly adequate as a low-level data type, and has the enormous advantage of being just exactly a 64 bit integer and nothing else. This, I think, should be handled on the pandas side.

Timezone support in numpy is non-existent, though there are some proposals. Just look through the pandas codebase and you will appreciate the enormity of what @wesm did with timezones.

@jreback closed this as completed Jan 13, 2015
@jreback added the Internals and Timezones labels Jan 13, 2015
@jorisvandenbossche added the Duplicate Report label Feb 27, 2015
@jorisvandenbossche added this to the No action milestone Feb 27, 2015