Skip to content

Improve performance of Collection.removeFirst(_:) where Self == SubSequence #32451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 28, 2020

Conversation

stephencelis
Copy link
Contributor

@stephencelis stephencelis commented Jun 18, 2020

This PR updates Collection.removeFirst(_:) where Self == SubSequence to do a bounds check using index(_:offsetBy:limitedBy:) instead of count to avoid an O(N) complexity where N is the size of the collection (which is incorrectly documented as O(K) where K is the value passed to removeFirst).

A benchmark working on a large substring before the change:

name                                        time          std        iterations
-------------------------------------------------------------------------------
Substring.removeFirst()                          132.0 ns ± 274.18 %    1000000
Substring.removeFirst(1)                    93850715.0 ns ±   2.25 %         15
Substring.UnicodeScalarView.removeFirst()        109.0 ns ± 241.26 %    1000000
Substring.UnicodeScalarView.removeFirst(1)  18106959.0 ns ±   5.49 %         66
Substring.UTF8View.removeFirst()                 117.0 ns ± 313.89 %    1000000
Substring.UTF8View.removeFirst(1)                118.0 ns ± 287.76 %    1000000
Substring.UTF16View.removeFirst()                108.0 ns ± 323.84 %    1000000
Substring.UTF16View.removeFirst(1)               126.0 ns ± 330.52 %    1000000

After:

name                                       time     std        iterations
-------------------------------------------------------------------------
Substring.removeFirst()                    107.0 ns ± 407.56 %    1000000
Substring.removeFirst(1)                    90.0 ns ± 185.21 %    1000000
Substring.UnicodeScalarView.removeFirst()  102.0 ns ± 426.49 %    1000000
Substring.UnicodeScalarView.removeFirst(1)  95.0 ns ± 282.41 %    1000000
Substring.UTF8View.removeFirst()           105.0 ns ± 180.93 %    1000000
Substring.UTF8View.removeFirst(1)           89.0 ns ± 351.59 %    1000000
Substring.UTF16View.removeFirst()          105.0 ns ± 367.93 %    1000000
Substring.UTF16View.removeFirst(1)         102.0 ns ± 177.51 %    1000000

Context: https://forums.swift.org/t/performance-of-removefirst-vs-removefirst-1/37712

@theblixguy theblixguy requested a review from natecook1000 June 18, 2020 19:39
Copy link
Collaborator

@xwu xwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, should have figured out how to highlight the subsequent lines so that they're not duplicated.

@xwu
Copy link
Collaborator

xwu commented Jun 18, 2020

@swift-ci Please smoke test

1 similar comment
@xwu
Copy link
Collaborator

xwu commented Jun 18, 2020

@swift-ci Please smoke test

@theblixguy
Copy link
Collaborator

Windows failure is unrelated (needs #32454 & #32457 to unblock)

@swift-ci please smoke test Linux

@xwu
Copy link
Collaborator

xwu commented Jun 18, 2020

I wonder, @stephencelis, would it be possible for you to incorporate this benchmark into the test suite in another PR? Then we can merge that first to show the nice shiny improvement with this PR while also ensuring no regressions on other benchmarks.

@stephencelis
Copy link
Contributor Author

@xwu I'm not familiar with that process. Do you have any docs/examples handy?

@theblixguy
Copy link
Collaborator

@xwu
Copy link
Collaborator

xwu commented Jun 18, 2020

There are a few files in the single-source directory that seem to be good examples. For instance, ReduceInto.swift seems pretty straightforward and might be a good starting point.

@natecook1000
Copy link
Member

@stephencelis Thanks so much for identifying this issue and working on a fix! There are two more overloads in RangeReplaceableCollection.swift that could use this treatment, the second of which looks like the one that's actually getting used for Substring, since it's both range-replaceable and self-slicing. Could you look at those as well?

@stephencelis
Copy link
Contributor Author

@natecook1000 Done!

@xwu Unfortunately I'm having issues getting things to run locally and I'm not sure I have a ton of free time to diagnose, so I'm afraid I won't be able to submit those benchmarks 😞

@natecook1000
Copy link
Member

@swift-ci Please smoke benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Improvement OLD NEW DELTA RATIO
FlattenListLoop 2462 1636 -33.5% 1.50x (?)
FlattenListFlatMap 6633 4430 -33.2% 1.50x (?)
ObjectiveCBridgeStubToNSDate2 620 550 -11.3% 1.13x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 6192 6822 +10.2% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
NSStringConversion.MutableCopy.Rebridge.UTF8 870 789 -9.3% 1.10x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
String.data.LargeUnicode 124 137 +10.5% 0.91x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@natecook1000
Copy link
Member

@swift-ci Please benchmark

1 similar comment
@natecook1000
Copy link
Member

@swift-ci Please benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
FlattenListLoop 957 1388 +45.0% 0.69x (?)
IterateData 810 945 +16.7% 0.86x
ObjectiveCBridgeStubNSDateRefAccess 512 559 +9.2% 0.92x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
IterateData 803 966 +20.3% 0.83x
DataSetCountSmall 87 95 +9.2% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
StringBuilderLong 1060 990 -6.6% 1.07x (?)

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@stephencelis
Copy link
Contributor Author

Strange that SubstringDropFirst1 doesn't show up. Any idea why?

@natecook1000
Copy link
Member

It's likely because I wrote that benchmark to test dropFirst instead of removeFirst… 🤦

@natecook1000
Copy link
Member

@swift-ci Please benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
Array2D 4352 4768 +9.6% 0.91x (?)
RandomShuffleLCG2 336 368 +9.5% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
SubstringRemoveFirst1 102 0 -100.0% 102001.00x
LessSubstringSubstring 29 21 -27.6% 1.38x
LessSubstringSubstringGenericComparable 29 21 -27.6% 1.38x
EqualSubstringSubstring 28 21 -25.0% 1.33x
EqualSubstringSubstringGenericEquatable 28 21 -25.0% 1.33x
EqualSubstringString 28 21 -25.0% 1.33x
EqualStringSubstring 29 22 -24.1% 1.32x
UTF8Decode_InitDecoding 168 143 -14.9% 1.17x
UTF8Decode_InitFromCustom_contiguous 167 143 -14.4% 1.17x
Set.isSuperset.Seq.Empty.Int 49 42 -14.3% 1.17x (?)
ArrayLiteral2 76 67 -11.8% 1.13x (?)
Set.subtracting.Empty.Box 9 8 -11.1% 1.12x
StringComparison_longSharedPrefix 373 334 -10.5% 1.12x (?)
Set.isSubset.Int.Empty 51 46 -9.8% 1.11x (?)
Set.isDisjoint.Seq.Int.Empty 52 47 -9.6% 1.11x (?)
StringToDataSmall 600 550 -8.3% 1.09x (?)
Set.isDisjoint.Seq.Box.Empty 89 82 -7.9% 1.09x (?)
CStringLongAscii 189 175 -7.4% 1.08x (?)
Set.isDisjoint.Int.Empty 55 51 -7.3% 1.08x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
SubstringComparable 9 10 +11.1% 0.90x (?)
ObjectiveCBridgeStringHash 72 79 +9.7% 0.91x (?)
RemoveWhereFilterString 222 241 +8.6% 0.92x (?)
StringWalk 1520 1640 +7.9% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
SubstringRemoveFirst1 104 0 -100.0% 104001.00x
EqualSubstringSubstring 28 21 -25.0% 1.33x (?)
LessSubstringSubstring 28 21 -25.0% 1.33x
EqualStringSubstring 28 21 -25.0% 1.33x (?)
EqualSubstringSubstringGenericEquatable 28 21 -25.0% 1.33x
EqualSubstringString 28 21 -25.0% 1.33x
LessSubstringSubstringGenericComparable 28 21 -25.0% 1.33x
UTF8Decode_InitFromCustom_contiguous 164 140 -14.6% 1.17x
UTF8Decode_InitDecoding 163 143 -12.3% 1.14x
Set.isSuperset.Seq.Empty.Int 53 47 -11.3% 1.13x (?)
Set.isStrictSubset.Int.Empty 53 47 -11.3% 1.13x (?)
DataCreateEmptyArray 900 800 -11.1% 1.12x (?)
Set.isDisjoint.Seq.Int.Empty 51 46 -9.8% 1.11x (?)
Set.isSubset.Seq.Int.Empty 124 112 -9.7% 1.11x (?)
StringHasPrefixAscii 1080 980 -9.3% 1.10x
Set.isStrictSubset.Seq.Int.Empty 125 115 -8.0% 1.09x

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeStringHash 71 79 +11.3% 0.90x (?)
StringFromLongWholeSubstring 9 10 +11.1% 0.90x (?)
Data.append.Sequence.64kB.Count 2965 3195 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
SubstringRemoveFirst1 100 0 -100.0% 100001.00x
LessSubstringSubstring 32 25 -21.9% 1.28x
EqualSubstringSubstringGenericEquatable 32 25 -21.9% 1.28x
LessSubstringSubstringGenericComparable 32 25 -21.9% 1.28x
EqualSubstringSubstring 33 26 -21.2% 1.27x
EqualStringSubstring 33 26 -21.2% 1.27x
EqualSubstringString 33 26 -21.2% 1.27x
UTF8Decode_InitDecoding 173 147 -15.0% 1.18x
UTF8Decode_InitFromCustom_contiguous 176 151 -14.2% 1.17x
StringWalk 3320 3080 -7.2% 1.08x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@natecook1000
Copy link
Member

@swift-ci Please smoke test and merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants