I get timeseries data through an API and I use Pandas to parse the CSV input into a dataframe.
I wouldn't mind users sending mixed timezones as long as the datetimes are TZ-aware. However, I don't want TZ-naive datetimes to be silently interpreted as UTC, because in practice the most likely case, which I have already faced, is users sending local time data as naive datetimes.
I wish there was an option in to_datetime to raise on naive datetimes.
Currently, I'm doing something like this:
```python
import datetime as dt

import pandas as pd

# TimeseriesDataIODatetimeError is an exception class defined in my application.


def to_utc_index(index):
    """Create UTC datetime index from timezone aware datetime list"""
    # First parse without forcing UTC
    try:
        # Cast to series so that output is a series with an "apply" method we use during next step
        index = pd.to_datetime(pd.Series(index), format="ISO8601")
    except (
        ValueError,
        pd.errors.OutOfBoundsDatetime,
        pd._libs.tslibs.parsing.DateParseError,
    ) as exc:
        raise TimeseriesDataIODatetimeError("Invalid timestamp") from exc
    # Then convert to UTC manually
    # We can't just use tz_convert because it would also silently treat naive datetimes as UTC
    try:
        index = index.apply(lambda x: x.astimezone(dt.timezone.utc))
    except TypeError as exc:
        raise TimeseriesDataIODatetimeError(
            "Invalid or TZ-naive timestamp"
        ) from exc
    return pd.DatetimeIndex(index, name="timestamp")


data_df.index = to_utc_index(data_df.index)
```
This might not be efficient, but it's been working for a while.
Now to_datetime complains about mixed timezones and advocates the use of utc=True.
The problem with using utc=True is that the conversion treats naive datetimes as UTC. After the conversion, it is too late to check timezones. Before the conversion, the data is still a string, so checking the timezone requires either working on the strings (with a regex, or with checks that depend on the format) or instantiating datetimes and checking the TZ on each of them, which is costly performance-wise and a step I'd be happy to avoid.
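To illustrate the point, here is a minimal sketch with made-up sample values (not taken from my real data). It assumes pandas 2.0 or later, since it relies on format="ISO8601":

```python
import pandas as pd

# Made-up sample input: the first value is TZ-naive, the second is TZ-aware.
raw = ["2023-06-01T12:00:00", "2023-06-01T12:00:00+02:00"]

parsed = pd.to_datetime(raw, format="ISO8601", utc=True)

# Both values come back TZ-aware in UTC. The naive one was taken as UTC as-is,
# so nothing in the result tells it apart from a value sent with an offset.
print(parsed)
```

Once the index is built, there is no way to tell which of the two inputs was naive.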
So for now, my only option is to keep the current code and silence the warning, which is unsatisfying.
Having an option to raise on naive datetimes would help me get rid of most of the code I pasted above.
I didn't inspect the code to evaluate if it would be possible to achieve this in to_datetime in a more efficient way than the check I'm performing on each value, but even if it was not more efficient, it would already be an improvement not having to do it in user code.
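For reference, the string-based pre-check mentioned above could look roughly like the sketch below. It is format dependent, which is exactly the drawback: the pattern and the sample values are assumptions made for illustration. It only handles ISO 8601 strings with a time part and a trailing "Z" or numeric offset.

```python
import pandas as pd

# Assumption: every value is an ISO 8601 string with a time part, and an offset,
# when present, is a trailing "Z" or a signed "HH:MM" / "HHMM" / "HH".
OFFSET_PATTERN = r"[T ]\d{2}:\d{2}.*(?:Z|[+-]\d{2}(?::?\d{2})?)$"

# Made-up sample input: two TZ-aware values and one TZ-naive value.
raw = pd.Series(
    ["2023-06-01T12:00:00+02:00", "2023-06-01T12:00:00Z", "2023-06-01T12:00:00"]
)

# Vectorized check on the raw strings, done before any datetime is instantiated.
has_offset = raw.str.contains(OFFSET_PATTERN, regex=True)
if not has_offset.all():
    print("TZ-naive values:", raw[~has_offset].tolist())
```

This avoids instantiating a datetime per value, but it has to be kept in sync with whatever formats the parser accepts.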
Feature Description
This could be achieved by adding a raise_if_naive (better names welcome) argument.
Or perhaps an awareness argument (either None, "naive" or "aware") that would let the user specify what is expected in their input, None to disable the feature. This would allow users to accept only naive datetimes, which they may want to combine with utc=False to get a naive index.
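To make the intended semantics concrete, here is a rough user-level sketch of what the awareness option could do. The function name to_datetime_checked and the awareness parameter are hypothetical and do not exist in pandas. The check is done naively, value by value, which is precisely the cost I would like to_datetime to spare me.

```python
import pandas as pd


def to_datetime_checked(values, awareness=None, **kwargs):
    """Hypothetical stand-in for the proposed option (not a pandas API).

    awareness: None (no check), "aware" (reject TZ-naive values)
    or "naive" (reject TZ-aware values).
    Extra keyword arguments are passed through to pd.to_datetime.
    """
    if awareness not in (None, "aware", "naive"):
        raise ValueError(f"Invalid awareness: {awareness!r}")
    if awareness is not None:
        for value in values:
            # Parse each value on its own to see whether it carries an offset.
            is_aware = pd.Timestamp(value).tz is not None
            if awareness == "aware" and not is_aware:
                raise ValueError(f"TZ-naive datetime: {value!r}")
            if awareness == "naive" and is_aware:
                raise ValueError(f"TZ-aware datetime: {value!r}")
    return pd.to_datetime(values, **kwargs)
```

With something like this, to_datetime_checked(values, awareness="aware", format="ISO8601", utc=True) would raise on a naive value instead of treating it as UTC, and awareness="naive" with the default utc=False would give a naive index.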
Alternative Solutions
I could use a validation layer in front of pandas. In fact, I used to validate input data with marshmallow, but it was not efficient, so I dropped that part and now I just catch errors in pandas when building the dataframe.
This request might be a bit specific, and I understand a lot of users don't care (users working on their own data, not with user-supplied data). But I still think it is relevant.
I'm sorry I didn't find the time to look into an implementation myself. Before anyone does, I think it would be nice to get some feedback on the idea.
Am I missing any other way to achieve what I'm trying to?
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Additional Context
Mixed TZ warning was added in #50887.