PERF: datetime index getters functions are 10 times slower with ZoneInfo vs pytz timezone#64379

Open
kjmin622 wants to merge 8 commits into pandas-dev:main from kjmin622:issue64363

Conversation

@kjmin622
Contributor

@kjmin622 kjmin622 commented Mar 3, 2026

@kjmin622 kjmin622 marked this pull request as draft March 3, 2026 08:26
self.deltas = deltas

- if typ != "pytz" and typ != "dateutil":
+ if typ not in ("pytz", "dateutil", "zoneinfo"):
Member

nitpick: is this going to create an unnecessary python tuple object?
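
A minimal sketch of how this question can be checked for plain CPython. The helper name `is_unsupported` is hypothetical, and the file under review is compiled by Cython, whose generated C may handle the membership test differently; that is not verified here. In CPython, a tuple made only of constants is stored once among the function's constants rather than rebuilt on each call:

```python
def is_unsupported(typ):
    # Hypothetical stand-in for the check in the diff above.
    return typ not in ("pytz", "dateutil", "zoneinfo")


# In CPython the literal tuple is folded into a single constant,
# so no new tuple object is allocated per call.
folded = ("pytz", "dateutil", "zoneinfo") in is_unsupported.__code__.co_consts
print(folded)
```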

# Daylight Savings


cdef object _get_zoneinfo_trans_and_deltas(tzinfo tz):
Member

can we re-use some of this in the treat_tz_as_dateutil path?

@jbrockmendel
Member

Couple of test failures to figure out, but this looks like the right approach.

@kjmin622 kjmin622 marked this pull request as ready for review March 4, 2026 08:38
@kjmin622
Contributor Author

kjmin622 commented Mar 4, 2026

@jbrockmendel Thank you for your review! I've applied your feedback.
The tests failed because ZoneInfo and dateutil interpret offsets differently before the first transition (LMT era). For example, Africa/Lusaka shows a 2-second difference between the two libraries.
To address this, I changed the logic to use the original approach for timestamps before the first transition and the optimized approach after it. I tried several alternatives, but it seems impossible to fully resolve the performance issue for historical dates (before ~1900), where ZoneInfo and dateutil disagree.
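
The fallback described above can be sketched in pure Python. This is an illustration under stated assumptions, not the code in the PR: the function `local_from_utc`, the `slow_offset` callable, the `SENTINEL` value, and the sample data are all hypothetical, and slot 0 of the transition table is assumed to cover everything before the first real transition.

```python
from bisect import bisect_right

# Hypothetical marker for "before any recorded transition" (slot 0).
SENTINEL = -(2**63) + 1


def local_from_utc(utc_val, trans, deltas, slow_offset, use_fallback=True):
    """Convert a UTC value to local time via a transition table.

    trans       -- sorted transition instants, with SENTINEL in slot 0
    deltas      -- UTC offset that applies from each transition onward
    slow_offset -- callable standing in for the original, slower lookup
    """
    pos = bisect_right(trans, utc_val) - 1
    if use_fallback and pos == 0:
        # Before the first real transition the table may disagree with
        # the tz library, so defer to the slow but authoritative path.
        return utc_val + slow_offset(utc_val)
    # Fast path: a binary search plus one addition.
    return utc_val + deltas[pos]


# Made-up table: the pre-transition slot is off by 2 seconds.
trans = [SENTINEL, 0, 1000]
deltas = [7820, 7200, 3600]
print(local_from_utc(-5, trans, deltas, lambda v: 7818))
print(local_from_utc(10, trans, deltas, lambda v: 7818))
```

Only values that land in slot 0 pay the cost of the slow path, which matches the trade-off described in the comment: historical dates stay slow, everything after the first transition uses the table.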

kjmin622 and others added 2 commits March 4, 2026 21:07
Co-authored-by: mv-python <matusvalo@users.noreply.github.com>
Co-authored-by: mv-python <matusvalo@users.noreply.github.com>
@jbrockmendel
Member

> ZoneInfo and dateutil interpret offsets differently before the first transition (LMT era)

This surprises me. Is there a minimal example?

dateutil_tz : tzinfo
A dateutil timezone object with _trans_list and _trans_idx attributes.
first_offset_seconds : int64_t
The UTC offset in seconds for the period before the first transition.
Member

This is needed to address the dateutil-vs-zoneinfo discrepancy?

Contributor Author

No, this is just extracting common logic into a single function. The discrepancy is addressed by the fallback logic in tzconversion.pyx (the part you asked me to add comments to).

return utc_val + self.delta
else:
pos[0] = bisect_right_i8(self.tdata, utc_val, self.ntrans) - 1
if self.use_zoneinfo_fallback and pos[0] == 0:
Member

can you add comments explaining why this is necessary

@kjmin622
Contributor Author

kjmin622 commented Mar 5, 2026

> > ZoneInfo and dateutil interpret offsets differently before the first transition (LMT era)
>
> This surprises me. Is there a minimal example?

@jbrockmendel
Here's a minimal example from my environment:

from zoneinfo import ZoneInfo
from dateutil.tz import gettz
from datetime import datetime

dt = datetime(1900, 1, 1)

# Africa/Lusaka: 2-second difference
print(ZoneInfo("Africa/Lusaka").utcoffset(dt))  # -> 2:10:18
print(gettz("Africa/Lusaka").utcoffset(dt))     # -> 2:10:20

# Africa/Sao_Tome: much larger difference
print(ZoneInfo("Africa/Sao_Tome").utcoffset(dt))  # -> -1 day, 23:23:15
print(gettz("Africa/Sao_Tome").utcoffset(dt))     # -> 0:26:56

Note that this discrepancy may not occur in all environments. Looking at the CI failure results, macOS passes while Ubuntu and Windows fail.
The tests pass when using the original logic (ZoneInfo.utcoffset()) for timestamps before the first transition.
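
Since the discrepancy depends on the installed time zone data, a small helper can make the comparison repeatable across environments. The sketch below is an assumption-laden illustration: `offset_gap` is a hypothetical helper, and the two fixed-offset `timezone` objects are stand-ins that only mirror the Africa/Lusaka values quoted above. In practice one would pass `ZoneInfo(...)` and `gettz(...)` objects instead.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def offset_gap(tz_a: tzinfo, tz_b: tzinfo, when: datetime) -> timedelta:
    """Return tz_a's UTC offset minus tz_b's at the naive wall time `when`."""
    return tz_a.utcoffset(when) - tz_b.utcoffset(when)


# Fixed-offset stand-ins for the two libraries' answers (not real zone data).
zoneinfo_like = timezone(timedelta(hours=2, minutes=10, seconds=18))
dateutil_like = timezone(timedelta(hours=2, minutes=10, seconds=20))

gap = offset_gap(zoneinfo_like, dateutil_like, datetime(1900, 1, 1))
print(gap.total_seconds())
```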

@jbrockmendel
Member

@pganssle thoughts here on how to explain/handle the discrepancy?

@jbrockmendel jbrockmendel added the Performance (memory or execution speed performance) and Timezones (timezone data dtype) labels Mar 6, 2026
@jbrockmendel
Member

I suspect this would also close #58962

Labels

Performance (memory or execution speed performance), Timezones (timezone data dtype)

Projects

None yet

3 participants