PERF: Speed up DatetimeIndex field accessors with ZoneInfo timezones #64377

Open
shubhamgoel27 wants to merge 4 commits into pandas-dev:main from shubhamgoel27:perf-zoneinfo-utc-convert-64363

@shubhamgoel27 shubhamgoel27 commented Mar 3, 2026

Summary

  • Routes ZoneInfo timezones through the fast C-level binary search path for UTC-to-local conversion, instead of the slow per-timestamp Python tz.utcoffset() API path
  • Reuses dateutil's existing transition data extraction via dateutil_gettz(tz.key) — no custom TZif parsing needed
  • Fixes tz_cache_key() to return tz.key for ZoneInfo, enabling DST transition data caching

Root cause

In Localizer.__cinit__ (tzconversion.pyx), ZoneInfo was grouped with tzlocal:

elif is_tzlocal(tz) or is_zoneinfo(tz):
    self.use_tzlocal = True

This forced every timestamp through _tz_localize_using_tzinfo_api(), which makes a Python-level tz.utcoffset() call per element. For n timestamps that is n Python API calls, instead of a single C-level loop over the n elements that does an O(log k) binary search per element on the cached transition arrays (k being the number of cached transitions).

Fix

  1. tz_cache_key(): Return tz.key for ZoneInfo (was returning None, preventing caching)
  2. get_dst_info(): Add ZoneInfo branch that converts to the equivalent dateutil timezone via dateutil_gettz(tz.key) and reuses dateutil's transition data extraction (dateutil reads the same underlying TZif files)
  3. Localizer.__cinit__: Remove is_zoneinfo(tz) from use_tzlocal check so ZoneInfo flows through get_dst_info → binary search path

Benchmark

DatetimeIndex.month with 1M elements, tz=ZoneInfo('US/Eastern'):
  Before: ~3,000ms  (per-timestamp Python API calls)
  After:     ~29ms  (C-level binary search on cached transition arrays)
  Speedup: ~100x

Correctness verified against both dateutil and pytz across multiple timezones including DST spring-forward and fall-back transitions.

Closes #64363

Test plan

  • All existing timezone tests pass (6,574 tests across tslibs, datetime indexes, timestamps, extensions, frame/series tz methods)
  • Verified correctness for DST-aware (US/Eastern, Europe/London, Australia/Sydney), non-DST (Asia/Tokyo), and fixed-offset (Etc/GMT+5) ZoneInfo timezones
  • Verified DST transition edge cases (spring forward, fall back)
  • Results match both dateutil and pytz for .year, .month, .day, .hour accessors

🤖 Generated with Claude Code

Route ZoneInfo timezones through the fast C-level binary search path
for UTC-to-local conversion instead of the slow per-timestamp Python
tzinfo API path. This is done by:

1. Fixing tz_cache_key() to return tz.key for ZoneInfo, enabling
   DST transition data caching.
2. Adding a ZoneInfo branch in get_dst_info() that reuses dateutil's
   transition data via dateutil_gettz(tz.key), avoiding the need to
   re-implement TZif parsing.
3. Updating Localizer.__cinit__ to route ZoneInfo through the DST
   binary search path (use_dst=True) instead of the slow tzlocal
   path (use_tzlocal=True).

This provides ~100x speedup for DatetimeIndex.month, .year, .day,
.hour and other field accessors when using ZoneInfo timezones.

Closes pandas-dev#64363

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shubhamgoel27 shubhamgoel27 force-pushed the perf-zoneinfo-utc-convert-64363 branch from c2a0729 to 34d8a0e Compare March 3, 2026 19:12
shubhamgoel27 and others added 2 commits March 3, 2026 12:04
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The tz_cache_key function was returning tz.key for ZoneInfo objects,
which could collide with pytz's tz.zone for the same IANA key (e.g.
both return "US/Pacific"). This caused get_dst_info to return
dateutil-derived transition data for pytz timezones when ZoneInfo was
cached first, leading to incorrect LMT offset handling.

Prefix ZoneInfo cache keys with "zoneinfo/" to match the namespacing
pattern used by dateutil ("dateutil" + filename).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
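The cache-key collision described in the commit message above can be reproduced with a small sketch. The stand-in classes, the `lookup` helper, and the string payloads are hypothetical. They only mimic the `.key` and `.zone` attributes mentioned above and are not the pandas code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeZoneInfo:
    """Stand-in for zoneinfo.ZoneInfo: exposes the IANA name as .key."""
    key: str


@dataclass(frozen=True)
class FakePytz:
    """Stand-in for a pytz timezone: exposes the IANA name as .zone."""
    zone: str


def cache_key_unprefixed(tz):
    # First version: ZoneInfo and pytz objects both map to the bare IANA name.
    return tz.key if isinstance(tz, FakeZoneInfo) else tz.zone


def cache_key_prefixed(tz):
    # Follow-up version: ZoneInfo keys live in their own "zoneinfo/" namespace.
    return "zoneinfo/" + tz.key if isinstance(tz, FakeZoneInfo) else tz.zone


def lookup(cache, key_func, tz):
    """Return cached transition data for tz, computing it on a cache miss."""
    key = key_func(tz)
    if key not in cache:
        # Hypothetical payload: records which library produced the data.
        is_zoneinfo = isinstance(tz, FakeZoneInfo)
        cache[key] = "dateutil-derived" if is_zoneinfo else "pytz-derived"
    return cache[key]


# With bare keys, caching ZoneInfo first poisons the pytz lookup.
cache = {}
lookup(cache, cache_key_unprefixed, FakeZoneInfo("US/Pacific"))
print(lookup(cache, cache_key_unprefixed, FakePytz("US/Pacific")))

# With the "zoneinfo/" prefix, each object gets its own entry.
cache = {}
lookup(cache, cache_key_prefixed, FakeZoneInfo("US/Pacific"))
print(lookup(cache, cache_key_prefixed, FakePytz("US/Pacific")))
```

The first print shows the pytz lookup receiving dateutil-derived data. The second shows it receiving its own data once the keys are namespaced.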
@shubhamgoel27 shubhamgoel27 force-pushed the perf-zoneinfo-utc-convert-64363 branch from bb17170 to b91c65a Compare March 4, 2026 22:11
@jbrockmendel added the Performance (memory or execution speed performance) and Timezones (timezone data dtype) labels Mar 6, 2026
@jbrockmendel (Member)

I suspect this would also close #58962

Development

Successfully merging this pull request may close these issues.

PERF: datetime index getters functions are 10 times slower with ZoneInfo vs pytz timezone