Improve performance of Collection.removeFirst(_:) where Self == SubSequence #32451

stephencelis · 2020-06-18T19:32:38Z

This PR updates Collection.removeFirst(_:) where Self == SubSequence to do a bounds check using index(_:offsetBy:limitedBy:) instead of count to avoid an O(N) complexity where N is the size of the collection (which is incorrectly documented as O(K) where K is the value passed to removeFirst).

A benchmark working on a large substring before the change:

name                                        time          std        iterations
-------------------------------------------------------------------------------
Substring.removeFirst()                          132.0 ns ± 274.18 %    1000000
Substring.removeFirst(1)                    93850715.0 ns ±   2.25 %         15
Substring.UnicodeScalarView.removeFirst()        109.0 ns ± 241.26 %    1000000
Substring.UnicodeScalarView.removeFirst(1)  18106959.0 ns ±   5.49 %         66
Substring.UTF8View.removeFirst()                 117.0 ns ± 313.89 %    1000000
Substring.UTF8View.removeFirst(1)                118.0 ns ± 287.76 %    1000000
Substring.UTF16View.removeFirst()                108.0 ns ± 323.84 %    1000000
Substring.UTF16View.removeFirst(1)               126.0 ns ± 330.52 %    1000000

After:

name                                       time     std        iterations
-------------------------------------------------------------------------
Substring.removeFirst()                    107.0 ns ± 407.56 %    1000000
Substring.removeFirst(1)                    90.0 ns ± 185.21 %    1000000
Substring.UnicodeScalarView.removeFirst()  102.0 ns ± 426.49 %    1000000
Substring.UnicodeScalarView.removeFirst(1)  95.0 ns ± 282.41 %    1000000
Substring.UTF8View.removeFirst()           105.0 ns ± 180.93 %    1000000
Substring.UTF8View.removeFirst(1)           89.0 ns ± 351.59 %    1000000
Substring.UTF16View.removeFirst()          105.0 ns ± 367.93 %    1000000
Substring.UTF16View.removeFirst(1)         102.0 ns ± 177.51 %    1000000

Context: https://forums.swift.org/t/performance-of-removefirst-vs-removefirst-1/37712

…quence

stdlib/public/core/Collection.swift

Co-authored-by: Xiaodi Wu <[email protected]>

xwu

Sorry, should have figured out how to highlight the subsequent lines so that they're not duplicated.

stdlib/public/core/Collection.swift

xwu · 2020-06-18T19:46:51Z

@swift-ci Please smoke test

xwu · 2020-06-18T20:02:00Z

@swift-ci Please smoke test

theblixguy · 2020-06-18T21:23:17Z

Windows failure is unrelated (needs #32454 & #32457 to unblock)

@swift-ci please smoke test Linux

xwu · 2020-06-18T22:33:32Z

I wonder, @stephencelis, would it be possible for you to incorporate this benchmark into the test suite in another PR? Then we can merge that first to show the nice shiny improvement with this PR while also ensuring no regressions on other benchmarks.

stephencelis · 2020-06-18T22:43:11Z

@xwu I'm not familiar with that process. Do you have any docs/examples handy?

theblixguy · 2020-06-18T22:44:21Z

Here's a doc you might find handy: https://github.com/apple/swift/blob/master/benchmark/README.md.

xwu · 2020-06-18T23:39:49Z

There are a few files in the single-source directory that seem to be good examples. For instance, ReduceInto.swift seems pretty straightforward and might be a good starting point.

natecook1000 · 2020-06-19T03:01:06Z

@stephencelis Thanks so much for identifying this issue and working on a fix! There are two more overloads in RangeReplaceableCollection.swift that could use this treatment, the second of which looks like the one that's actually getting used for Substring, since it's both range-replaceable and self-slicing. Could you look at those as well?

stephencelis · 2020-06-19T12:10:19Z

@natecook1000 Done!

@xwu Unfortunately I'm having issues getting things to run locally and I'm not sure I have a ton of free time to diagnose, so I'm afraid I won't be able to submit those benchmarks 😞

natecook1000 · 2020-06-19T17:39:34Z

@swift-ci Please smoke benchmark

swift-ci · 2020-06-19T18:44:37Z

Performance: -O

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	2462	1636	-33.5%	1.50x (?)
FlattenListFlatMap	6633	4430	-33.2%	1.50x (?)
ObjectiveCBridgeStubToNSDate2	620	550	-11.3%	1.13x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	6192	6822	+10.2%	0.91x (?)

Improvement	OLD	NEW	DELTA	RATIO
NSStringConversion.MutableCopy.Rebridge.UTF8	870	789	-9.3%	1.10x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
String.data.LargeUnicode	124	137	+10.5%	0.91x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

natecook1000 · 2020-06-25T15:54:32Z

@swift-ci Please benchmark

natecook1000 · 2020-06-25T16:21:04Z

@swift-ci Please benchmark

swift-ci · 2020-06-25T17:03:31Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListLoop	957	1388	+45.0%	0.69x (?)
IterateData	810	945	+16.7%	0.86x
ObjectiveCBridgeStubNSDateRefAccess	512	559	+9.2%	0.92x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
IterateData	803	966	+20.3%	0.83x
DataSetCountSmall	87	95	+9.2%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
StringBuilderLong	1060	990	-6.6%	1.07x (?)

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

stephencelis · 2020-06-25T17:17:56Z

Strange that SubstringDropFirst1 doesn't show up. Any idea why?

natecook1000 · 2020-06-25T17:27:45Z

It's likely because I wrote that benchmark to test dropFirst instead of removeFirst… 🤦

natecook1000 · 2020-06-26T19:30:29Z

@swift-ci Please benchmark

swift-ci · 2020-06-26T19:56:14Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
Array2D	4352	4768	+9.6%	0.91x (?)
RandomShuffleLCG2	336	368	+9.5%	0.91x (?)

Improvement	OLD	NEW	DELTA	RATIO
SubstringRemoveFirst1	102	0	-100.0%	102001.00x
LessSubstringSubstring	29	21	-27.6%	1.38x
LessSubstringSubstringGenericComparable	29	21	-27.6%	1.38x
EqualSubstringSubstring	28	21	-25.0%	1.33x
EqualSubstringSubstringGenericEquatable	28	21	-25.0%	1.33x
EqualSubstringString	28	21	-25.0%	1.33x
EqualStringSubstring	29	22	-24.1%	1.32x
UTF8Decode_InitDecoding	168	143	-14.9%	1.17x
UTF8Decode_InitFromCustom_contiguous	167	143	-14.4%	1.17x
Set.isSuperset.Seq.Empty.Int	49	42	-14.3%	1.17x (?)
ArrayLiteral2	76	67	-11.8%	1.13x (?)
Set.subtracting.Empty.Box	9	8	-11.1%	1.12x
StringComparison_longSharedPrefix	373	334	-10.5%	1.12x (?)
Set.isSubset.Int.Empty	51	46	-9.8%	1.11x (?)
Set.isDisjoint.Seq.Int.Empty	52	47	-9.6%	1.11x (?)
StringToDataSmall	600	550	-8.3%	1.09x (?)
Set.isDisjoint.Seq.Box.Empty	89	82	-7.9%	1.09x (?)
CStringLongAscii	189	175	-7.4%	1.08x (?)
Set.isDisjoint.Int.Empty	55	51	-7.3%	1.08x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
SubstringComparable	9	10	+11.1%	0.90x (?)
ObjectiveCBridgeStringHash	72	79	+9.7%	0.91x (?)
RemoveWhereFilterString	222	241	+8.6%	0.92x (?)
StringWalk	1520	1640	+7.9%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
SubstringRemoveFirst1	104	0	-100.0%	104001.00x
EqualSubstringSubstring	28	21	-25.0%	1.33x (?)
LessSubstringSubstring	28	21	-25.0%	1.33x
EqualStringSubstring	28	21	-25.0%	1.33x (?)
EqualSubstringSubstringGenericEquatable	28	21	-25.0%	1.33x
EqualSubstringString	28	21	-25.0%	1.33x
LessSubstringSubstringGenericComparable	28	21	-25.0%	1.33x
UTF8Decode_InitFromCustom_contiguous	164	140	-14.6%	1.17x
UTF8Decode_InitDecoding	163	143	-12.3%	1.14x
Set.isSuperset.Seq.Empty.Int	53	47	-11.3%	1.13x (?)
Set.isStrictSubset.Int.Empty	53	47	-11.3%	1.13x (?)
DataCreateEmptyArray	900	800	-11.1%	1.12x (?)
Set.isDisjoint.Seq.Int.Empty	51	46	-9.8%	1.11x (?)
Set.isSubset.Seq.Int.Empty	124	112	-9.7%	1.11x (?)
StringHasPrefixAscii	1080	980	-9.3%	1.10x
Set.isStrictSubset.Seq.Int.Empty	125	115	-8.0%	1.09x

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStringHash	71	79	+11.3%	0.90x (?)
StringFromLongWholeSubstring	9	10	+11.1%	0.90x (?)
Data.append.Sequence.64kB.Count	2965	3195	+7.8%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
SubstringRemoveFirst1	100	0	-100.0%	100001.00x
LessSubstringSubstring	32	25	-21.9%	1.28x
EqualSubstringSubstringGenericEquatable	32	25	-21.9%	1.28x
LessSubstringSubstringGenericComparable	32	25	-21.9%	1.28x
EqualSubstringSubstring	33	26	-21.2%	1.27x
EqualStringSubstring	33	26	-21.2%	1.27x
EqualSubstringString	33	26	-21.2%	1.27x
UTF8Decode_InitDecoding	173	147	-15.0%	1.18x
UTF8Decode_InitFromCustom_contiguous	176	151	-14.2%	1.17x
StringWalk	3320	3080	-7.2%	1.08x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

natecook1000 · 2020-06-28T15:20:53Z

@swift-ci Please smoke test and merge

Improve performance of Collection.removeFirst(_:) where Self == SubSe…

099ecbe

…quence

theblixguy requested a review from natecook1000 June 18, 2020 19:39

xwu reviewed Jun 18, 2020

View reviewed changes

stdlib/public/core/Collection.swift Outdated Show resolved Hide resolved

stephencelis and others added 2 commits June 18, 2020 15:41

Update stdlib/public/core/Collection.swift

ce78396

Co-authored-by: Xiaodi Wu <[email protected]>

Update Collection.swift

0ce01d5

xwu reviewed Jun 18, 2020

View reviewed changes

stdlib/public/core/Collection.swift Outdated Show resolved Hide resolved

stdlib/public/core/Collection.swift Outdated Show resolved Hide resolved

Update Collection.swift

3b8d8b7

Others

e150b37

natecook1000 mentioned this pull request Jun 19, 2020

Add benchmarks for Substring.dropFirst(_:) and dropLast(_:) #32473

Merged

swift-ci merged commit e25934f into swiftlang:master Jun 28, 2020

natecook1000 mentioned this pull request Jun 29, 2020

[stdlib] Fix removeLast(_:) performance for non-random-access collections #32599

Merged

natecook1000 pushed a commit to natecook1000/swift that referenced this pull request Jul 1, 2020

Merge pull request swiftlang#32451 from stephencelis/patch-1

d31e012

natecook1000 mentioned this pull request Jul 1, 2020

[5.3] [stdlib] Performance fixes for removeFirst and removeLast #32647

Merged

jpsim mentioned this pull request Jul 2, 2020

[163] Issue #163 - July 2, 2020 SwiftWeekly/swiftweekly.github.io#528

Closed

Improve performance of Collection.removeFirst(_:) where Self == SubSequence #32451

Improve performance of Collection.removeFirst(_:) where Self == SubSequence #32451

Uh oh!

Conversation

stephencelis commented Jun 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

xwu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

xwu commented Jun 18, 2020

Uh oh!

xwu commented Jun 18, 2020

Uh oh!

theblixguy commented Jun 18, 2020

Uh oh!

xwu commented Jun 18, 2020

Uh oh!

stephencelis commented Jun 18, 2020

Uh oh!

theblixguy commented Jun 18, 2020

Uh oh!

xwu commented Jun 18, 2020

Uh oh!

natecook1000 commented Jun 19, 2020

Uh oh!

stephencelis commented Jun 19, 2020

Uh oh!

natecook1000 commented Jun 19, 2020

Uh oh!

swift-ci commented Jun 19, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

natecook1000 commented Jun 25, 2020

Uh oh!

natecook1000 commented Jun 25, 2020

Uh oh!

swift-ci commented Jun 25, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

stephencelis commented Jun 25, 2020

Uh oh!

natecook1000 commented Jun 25, 2020

Uh oh!

natecook1000 commented Jun 26, 2020

Uh oh!

swift-ci commented Jun 26, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

natecook1000 commented Jun 28, 2020

Uh oh!

Uh oh!

stephencelis commented Jun 18, 2020 •

edited

Loading