PERF: avoid materializing values[indexer] in Block.setitem #64251

Merged
jbrockmendel merged 10 commits into pandas-dev:main from hyoj0942:perf-block-setitem-datetimelike-numset
Apr 10, 2026

Conversation

@hyoj0942 commented Feb 20, 2026

Contributor

This avoids materializing values[indexer] in the object-dtype datetimelike compatibility path of Block.setitem.

Changes:

  • gate the optimization on non-scalar indexers
  • use length_of_indexer for common indexer types instead of materializing values[indexer]
  • keep the existing behavior for uncommon indexers by falling back to len(values[indexer])
  • add a whatsnew entry for the performance improvement
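The idea behind the first three bullets can be sketched outside pandas. The helper below is a hypothetical stand-in written for illustration (`count_selected` is not a pandas function, and this is not the code merged in `blocks.py`); it assumes only NumPy and shows how the number of selected positions can be derived from the indexer itself rather than from `len(values[indexer])`.

```python
import numpy as np


def count_selected(indexer, target_len):
    """Hypothetical helper: how many positions would values[indexer] select?

    Only the indexer is inspected, so no copy of the selected values is made.
    """
    if isinstance(indexer, slice):
        # slice.indices() resolves None/negative bounds against target_len
        return len(range(*indexer.indices(target_len)))
    if isinstance(indexer, range):
        return len(indexer)
    if isinstance(indexer, (list, np.ndarray)):
        arr = np.asarray(indexer)
        if arr.dtype == bool:
            # a boolean mask selects one element per True entry
            return int(np.count_nonzero(arr))
        # positional (integer) indexers select one element per entry
        return len(arr)
    # scalars and uncommon indexer types are outside this sketch
    raise TypeError(f"unsupported indexer type: {type(indexer).__name__}")


values = np.arange(10).astype(object)
mask = np.array([True, False] * 5)

# Same answer as materializing the selection, without building values[mask]
print(count_selected(mask, len(values)), len(values[mask]))
```

Scalar indexers raise here, which loosely mirrors gating the optimization on non-scalar indexers; the real change may handle them differently.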

Validation run locally:

  • python -m pytest pandas/tests/indexing/test_iloc.py -k "test_iloc_setitem_custom_object"
  • python -m pre_commit run --files pandas/core/internals/blocks.py
  • python -m pre_commit run --files doc/source/whatsnew/v3.1.0.rst

@hyoj0942

Contributor Author

ASV results for this change (same machine, same env):

asv run -e -E existing -b SetitemObjectDtypeDatetimelike --record-samples --dry-run

Note: for apples-to-apples comparison, baseline was measured by reverting only the optimization in blocks.py while keeping the benchmark class constant.

| nrows | frac | bool baseline | bool this PR | bool speedup | ndarray baseline | ndarray this PR | ndarray speedup |
|---|---|---|---|---|---|---|---|
| 10000 | 0.01 | 14.4μs | 14.4μs | 1.000x | 8.03μs | 8.14μs | 0.986x |
| 10000 | 0.50 | 120μs | 102μs | 1.176x | 117μs | 99.8μs | 1.172x |
| 10000 | 0.99 | 213μs | 188μs | 1.133x | 224μs | 194μs | 1.155x |
| 1000000 | 0.01 | 957μs | 715μs | 1.338x | 253μs | 201μs | 1.259x |
| 1000000 | 0.50 | 15.2ms | 11.7ms | 1.299x | 12.9ms | 11.2ms | 1.152x |
| 1000000 | 0.99 | 18.8ms | 17.1ms | 1.099x | 24.4ms | 20.6ms | 1.184x |
| 10000000 | 0.01 | 15.9ms | 10.6ms | 1.500x | 3.18ms | 2.85ms | 1.116x |
| 10000000 | 0.50 | 151ms | 115ms | 1.313x | 221ms | 161ms | 1.373x |
| 10000000 | 0.99 | 177ms | 166ms | 1.066x | 437ms | 316ms | 1.383x |
| 20000000 | 0.01 | 43.2ms | 25.5ms | 1.694x | 10.5ms | 7.78ms | 1.350x |
| 20000000 | 0.50 | 302ms | 233ms | 1.296x | 458ms | 333ms | 1.375x |
| 20000000 | 0.99 | 371ms | 335ms | 1.107x | 936ms | 668ms | 1.401x |

Geometric mean speedup:

  • bool: ~1.24x
  • ndarray: ~1.24x
  • overall: ~1.24x
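For readers who want to reproduce the summary figures, the snippet below recomputes the geometric means from the per-row speedup columns reported in the ASV results. It uses only the Python standard library; the variable names are illustrative.

```python
from statistics import geometric_mean

# Per-row speedups copied from the ASV results (baseline time / PR time)
bool_speedups = [
    1.000, 1.176, 1.133, 1.338, 1.299, 1.099,
    1.500, 1.313, 1.066, 1.694, 1.296, 1.107,
]
ndarray_speedups = [
    0.986, 1.172, 1.155, 1.259, 1.152, 1.184,
    1.116, 1.373, 1.383, 1.350, 1.375, 1.401,
]

bool_gm = geometric_mean(bool_speedups)
ndarray_gm = geometric_mean(ndarray_speedups)
overall_gm = geometric_mean(bool_speedups + ndarray_speedups)

print(f"bool: {bool_gm:.2f}x  ndarray: {ndarray_gm:.2f}x  overall: {overall_gm:.2f}x")
```

The geometric mean is the usual choice for averaging ratios, since it treats a 2x speedup and a 2x slowdown as cancelling out.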

@rhshadrach left a comment

Member

Thanks for the PR!

Comment thread asv_bench/benchmarks/indexing.py Outdated
Comment thread pandas/core/internals/blocks.py Outdated
Comment thread pandas/tests/internals/test_internals.py Outdated
@hyoj0942

Contributor Author

Pushed follow-up commit 5a09be1244 addressing review feedback:

  • reduced ASV parameterization to indexer_kind only
  • removed broad try/except fallback in _datetimelike_compat_num_set
  • removed newly added internals tests and rely on existing public API tests for coverage
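The benchmark body is not shown in this thread, so the following is only a rough sketch of what an ASV benchmark parameterized by `indexer_kind` alone might look like. The class name, array size, mask pattern, and assigned value are all assumptions made for illustration; only the `params` / `param_names` / `setup` / `time_*` layout follows the standard ASV conventions.

```python
import numpy as np
import pandas as pd


class SetitemObjectDtypeDatetimelikeSketch:
    # Hypothetical benchmark, not the class added in asv_bench/benchmarks/indexing.py
    params = ["bool", "ndarray"]
    param_names = ["indexer_kind"]

    def setup(self, indexer_kind):
        n = 10_000  # assumed size, kept small for illustration
        self.ser = pd.Series([None] * n, dtype=object)
        mask = np.zeros(n, dtype=bool)
        mask[::2] = True  # select every other row
        if indexer_kind == "bool":
            self.indexer = mask
        else:
            # positional integer indexer selecting the same rows
            self.indexer = np.flatnonzero(mask)
        self.value = pd.Timestamp("2020-01-01")

    def time_iloc_setitem(self, indexer_kind):
        # object-dtype setitem with a datetimelike value and a list-like indexer
        self.ser.iloc[self.indexer] = self.value
```

ASV calls `setup` before timing each parameter combination, so the two indexer kinds are measured against identically constructed Series.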

@jbrockmendel added the "Performance" label (Memory or execution speed performance) Mar 6, 2026
Comment thread pandas/core/internals/blocks.py
Comment thread pandas/tests/indexing/test_iloc.py Outdated
Comment thread asv_bench/benchmarks/indexing.py Outdated
@jbrockmendel

Member

Can you add a whatsnew for the perf improvement? otherwise LGTM

@hyoj0942

Contributor Author

Added a whatsnew entry in doc/source/whatsnew/v3.1.0.rst and updated the checklist in the PR description.

@hyoj0942 requested a review from rhshadrach March 30, 2026 09:20
@jbrockmendel merged commit 32b7892 into pandas-dev:main Apr 10, 2026
45 checks passed
@jbrockmendel

Member

thanks @hyoj0942

@mroeschke added this to the 3.1 milestone Apr 10, 2026
Sharl0tteIsTaken added a commit to Sharl0tteIsTaken/pandas that referenced this pull request Apr 12, 2026
…-comparison

* upstream/main:
  PERF: use lookup instead of hash_inner_join for merge with unique right keys (pandas-dev#64691)
  BUG : update `SeriesGroupBy.ohlc()` to honor `as_index=False` (pandas-dev#65141)
  PERF: Use DataFrame-level reductions in DataFrame.agg with list of funcs (pandas-dev#65031)
  DOC: document required external libraries in read_* I/O docstrings (pandas-dev#65143)
  DOC: improve MultiIndex.is_monotonic_increasing/decreasing docstrings (pandas-dev#65154)
  BUG: Raise ValueError for non-boolean numeric_only in DataFrame/Series reductions (GH#53098) (pandas-dev#65131)
  BUG: Timedelta.round() raises ZeroDivisionError when internal unit is 's' and target frequency is sub-second (pandas-dev#64836)
  ENH: Add replace method to Index (closes pandas-dev#19495) (pandas-dev#65099)
  PERF: improve StringArray.isna (pandas-dev#57733)
  BUG: read parquet files with older pytz (DEP: keep lower pytz minimum version) (pandas-dev#65133)
  DEPR: deprecate dates-with-datetime64 in _maybe_downcast_for_indexing (pandas-dev#64871)
  DOC: note that DataFrame.values is not writeable (pandas-dev#65142)
  CLN: Update groupby observed defaults (pandas-dev#65148)
  PERF: avoid materializing values[indexer] in Block.setitem (pandas-dev#64251)
  DOC: update GroupBy.sum/min/max See Also sections (pandas-dev#65144)

Labels

Performance (Memory or execution speed performance)

Development

Successfully merging this pull request may close these issues.

PERF: object-dtype iloc setitem with datetimelike list-like indexers is slow on large arrays

4 participants