[do not merge] Evaluate the hot/cold splitting pass #21016

vedantk · 2018-12-04T21:53:52Z

This PR is a sanity check for hot/cold splitting in the Swift compiler. It's not meant to be merged. The goal is to get a rough idea of the effectiveness of the pass by gathering some basic performance numbers.

Caveats:

- Outlined cold code will not be relocated towards the end of the text segment, as we cannot test with a modified linker. Based on prior experiments, we expect this to lower performance.
- No Swift-specific heuristics for marking cold basic blocks are being evaluated (that might be an interesting follow-up).

(cherry picked from commit a5e427732d08c35bc2a67d10f8d5140475a02e01)

vedantk · 2018-12-04T21:55:24Z

apple/swift-llvm#127
@swift-ci Please smoke benchmark

vedantk · 2018-12-04T21:58:01Z

@swift-ci Please clean smoke test OS X platform

vedantk · 2018-12-04T22:46:35Z

apple/swift-llvm#127
@swift-ci Please smoke test OS X platform

vedantk · 2018-12-04T22:54:51Z

There are some decent performance improvements in a few benchmarks (NopDeinit is 1.31x faster at -O) mixed with a few regressions (Walsh is 0.85x as fast). As mentioned in the PR description, using a modified linker which co-locates cold/outlined symbols should give a significant improvement here.

Hot/cold splitting seems to have a negative effect on code size, especially with integer-heavy benchmarks which (presumably) contain many outlinable traps. Tweaking the outlining code size threshold should improve the results. If we ever want this optimization in Swift, we might consider disabling it at -Osize.

swift-ci · 2018-12-04T22:56:39Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
Walsh	357	421	+17.9%	0.85x
IterateData	1398	1566	+12.0%	0.89x
Improvement
NopDeinit	59577	45620	-23.4%	1.31x
StringBuilderSmallReservingCapacity	387	361	-6.7%	1.07x
StringAdder	477	445	-6.7%	1.07x
StringBuilder	375	350	-6.7%	1.07x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
SortIntPyramids.o	12661	17528	+38.4%	0.72x
SortLettersInPlace.o	8879	11818	+33.1%	0.75x
NibbleSort.o	12314	16389	+33.1%	0.75x
StaticArray.o	14045	18304	+30.3%	0.77x
RGBHistogram.o	27556	34120	+23.8%	0.81x
RangeAssignment.o	4940	6005	+21.6%	0.82x
Histogram.o	4187	5040	+20.4%	0.83x
WordCount.o	44244	52095	+17.7%	0.85x
StringEdits.o	12935	14809	+14.5%	0.87x
DriverUtils.o	153141	174024	+13.6%	0.88x
Walsh.o	9164	10373	+13.2%	0.88x
DictionaryLiteral.o	1360	1538	+13.1%	0.88x
RemoveWhere.o	26695	30089	+12.7%	0.89x
ArrayOfGenericRef.o	15036	16927	+12.6%	0.89x
SequenceAlgos.o	20731	23326	+12.5%	0.89x
SortStrings.o	27936	31147	+11.5%	0.90x
PopFrontGeneric.o	4734	5255	+11.0%	0.90x
ReversedCollections.o	11179	12391	+10.8%	0.90x
SortLargeExistentials.o	20694	22909	+10.7%	0.90x
StringRemoveDupes.o	7568	8378	+10.7%	0.90x
ArrayOfRef.o	12338	13653	+10.7%	0.90x
CSVParsing.o	31913	35300	+10.6%	0.90x
ClassArrayGetter.o	5639	6237	+10.6%	0.90x
Substring.o	18215	20077	+10.2%	0.91x
Phonebook.o	11660	12844	+10.2%	0.91x
HashQuadratic.o	5508	6061	+10.0%	0.91x
RandomShuffle.o	3691	4060	+10.0%	0.91x
TwoSum.o	5540	6093	+10.0%	0.91x
PopFront.o	5213	5726	+9.8%	0.91x
COWTree.o	13188	14431	+9.4%	0.91x
DictionaryRemove.o	17158	18728	+9.2%	0.92x
UTF8Decode.o	12378	13463	+8.8%	0.92x
StringInterpolation.o	7355	7991	+8.6%	0.92x
DictionaryKeysContains.o	11815	12824	+8.5%	0.92x
DictTest2.o	15589	16891	+8.4%	0.92x
DictionaryCompactMapValues.o	21038	22769	+8.2%	0.92x
StringMatch.o	4430	4792	+8.2%	0.92x
DataBenchmarks.o	55956	60295	+7.8%	0.93x
NopDeinit.o	5552	5967	+7.5%	0.93x
Suffix.o	26345	28260	+7.3%	0.93x
DictionaryCopy.o	8560	9177	+7.2%	0.93x
DictTest.o	19191	20474	+6.7%	0.94x
DictionarySwap.o	27687	29516	+6.6%	0.94x
DropLast.o	26451	28166	+6.5%	0.94x
StringBuilder.o	7338	7774	+5.9%	0.94x
DictionaryOfAnyHashableStrings.o	11101	11734	+5.7%	0.95x
LuhnAlgoLazy.o	10996	11599	+5.5%	0.95x
LuhnAlgoEager.o	10998	11601	+5.5%	0.95x
DictTest3.o	23877	25179	+5.5%	0.95x
Queue.o	14315	15094	+5.4%	0.95x
DictOfArraysToArrayOfDicts.o	30120	31705	+5.3%	0.95x
DictionaryGroup.o	17124	18019	+5.2%	0.95x
FlattenList.o	6312	6635	+5.1%	0.95x
Hash.o	39090	40972	+4.8%	0.95x
ExistentialPerformance.o	69131	72271	+4.5%	0.96x
ObjectiveCBridging.o	42872	44794	+4.5%	0.96x
DictTest4.o	25037	26121	+4.3%	0.96x
Radix2CooleyTukey.o	5070	5287	+4.3%	0.96x
RC4.o	4715	4908	+4.1%	0.96x
StringComparison.o	44294	46089	+4.1%	0.96x
DictTest4Legacy.o	26479	27547	+4.0%	0.96x
DictionarySubscriptDefault.o	29441	30621	+4.0%	0.96x
Ackermann.o	1852	1925	+3.9%	0.96x
RomanNumbers.o	5311	5497	+3.5%	0.97x
DictionaryBridgeToObjC.o	6165	6366	+3.3%	0.97x
SetTests.o	64400	66467	+3.2%	0.97x
ReduceInto.o	17929	18499	+3.2%	0.97x
ObjectiveCBridgingStubs.o	19315	19924	+3.2%	0.97x
RecursiveOwnedParameter.o	1382	1420	+2.7%	0.97x
Combos.o	7409	7604	+2.6%	0.97x
ArraySubscript.o	4028	4133	+2.6%	0.97x
CountAlgo.o	13373	13704	+2.5%	0.98x
Join.o	2288	2344	+2.4%	0.98x
DictionaryBridge.o	3374	3455	+2.4%	0.98x
ArrayAppend.o	39272	40211	+2.4%	0.98x
ObserverForwarderStruct.o	3594	3673	+2.2%	0.98x
CharacterProperties.o	19061	19457	+2.1%	0.98x
StringWalk.o	40666	41476	+2.0%	0.98x
MonteCarloE.o	3324	3389	+2.0%	0.98x
TestsUtils.o	23723	24150	+1.8%	0.98x
Prefix.o	24345	24780	+1.8%	0.98x
DropFirst.o	25044	25479	+1.7%	0.98x
ObserverClosure.o	3279	3335	+1.7%	0.98x
Array2D.o	4232	4304	+1.7%	0.98x
ObserverUnappliedMethod.o	5266	5351	+1.6%	0.98x
Hanoi.o	3601	3657	+1.6%	0.98x
Prims.o	42945	43600	+1.5%	0.98x
PrimsSplit.o	42997	43652	+1.5%	0.98x
ObserverPartiallyAppliedMethod.o	3567	3620	+1.5%	0.99x
BitCount.o	1876	1901	+1.3%	0.99x
RangeReplaceableCollectionPlusDefault.o	6317	6390	+1.2%	0.99x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
IterateData	1357	1566	+15.4%	0.87x
CaptureProp	4286	4857	+13.3%	0.88x
PrefixAnySeqCntRangeLazy	159	176	+10.7%	0.90x
CharIteration_russian_unicodeScalars_Backwards	5152	5629	+9.3%	0.92x
DataCountSmall	34	37	+8.8%	0.92x
DataCountMedium	37	40	+8.1%	0.93x
Improvement
NopDeinit	57656	44840	-22.2%	1.29x
BitCount	190	171	-10.0%	1.11x
FlattenListLoop	4431	4063	-8.3%	1.09x (?)
Array2D	7505	6909	-7.9%	1.09x
SortAdjacentIntPyramids	1314	1219	-7.2%	1.08x (?)
StringBuilder	370	344	-7.0%	1.08x
MapReduce	433	404	-6.7%	1.07x
MapReduceAnyCollection	436	407	-6.7%	1.07x

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
StaticArray.o	13025	18104	+39.0%	0.72x
NibbleSort.o	14122	18789	+33.0%	0.75x
SortLettersInPlace.o	8862	11634	+31.3%	0.76x
RGBHistogram.o	27391	35059	+28.0%	0.78x
SortIntPyramids.o	12353	15736	+27.4%	0.79x
Histogram.o	4032	5008	+24.2%	0.81x
RangeAssignment.o	5101	6309	+23.7%	0.81x
Walsh.o	6156	7557	+22.8%	0.81x
SortStrings.o	28887	34643	+19.9%	0.83x
WordCount.o	40500	47679	+17.7%	0.85x
Phonebook.o	12132	14268	+17.6%	0.85x
Queue.o	13091	15270	+16.6%	0.86x
RandomShuffle.o	3767	4316	+14.6%	0.87x
ReversedCollections.o	11596	13237	+14.2%	0.88x
RemoveWhere.o	24438	27729	+13.5%	0.88x
HashQuadratic.o	5160	5853	+13.4%	0.88x
TwoSum.o	5373	6061	+12.8%	0.89x
DriverUtils.o	132709	148736	+12.1%	0.89x
DictionaryLiteral.o	1509	1690	+12.0%	0.89x
DictionaryRemove.o	15683	17536	+11.8%	0.89x
DictionaryGroup.o	16336	18235	+11.6%	0.90x
SequenceAlgos.o	21908	24447	+11.6%	0.90x
COWTree.o	13674	15253	+11.5%	0.90x
FlattenList.o	6696	7459	+11.4%	0.90x
ClassArrayGetter.o	5673	6301	+11.1%	0.90x
StringEdits.o	11982	13305	+11.0%	0.90x
PopFrontGeneric.o	4823	5351	+10.9%	0.90x
NopDeinit.o	6244	6927	+10.9%	0.90x
StringRemoveDupes.o	7577	8386	+10.7%	0.90x
ArrayOfRef.o	13162	14565	+10.7%	0.90x
SortLargeExistentials.o	21302	23565	+10.6%	0.90x
PopFront.o	5014	5542	+10.5%	0.90x
DictionaryKeysContains.o	11503	12712	+10.5%	0.90x
DictionaryCompactMapValues.o	19518	21569	+10.5%	0.90x
CaptureProp.o	1093	1200	+9.8%	0.91x
CSVParsing.o	31745	34828	+9.7%	0.91x
DictTest2.o	14466	15835	+9.5%	0.91x
UTF8Decode.o	11873	12959	+9.1%	0.92x
DictionarySwap.o	26871	29324	+9.1%	0.92x
Suffix.o	24577	26820	+9.1%	0.92x
DictionaryCopy.o	7945	8609	+8.4%	0.92x
StringMatch.o	4393	4760	+8.4%	0.92x
Substring.o	15906	17213	+8.2%	0.92x
StringBuilder.o	7206	7782	+8.0%	0.93x
DictTest.o	18034	19466	+7.9%	0.93x
DropLast.o	24331	26166	+7.5%	0.93x
DictionaryOfAnyHashableStrings.o	10757	11542	+7.3%	0.93x
RomanNumbers.o	5630	6009	+6.7%	0.94x
ReduceInto.o	13314	14203	+6.7%	0.94x
ArrayOfGenericRef.o	13636	14527	+6.5%	0.94x
Hash.o	20623	21964	+6.5%	0.94x
ExistentialPerformance.o	62683	66703	+6.4%	0.94x
ArrayOfPOD.o	2735	2910	+6.4%	0.94x
LazyFilter.o	8841	9404	+6.4%	0.94x
DictTest3.o	21794	23163	+6.3%	0.94x
DictionarySubscriptDefault.o	27513	29125	+5.9%	0.94x
RC4.o	3833	4052	+5.7%	0.95x
ChainedFilterMap.o	3204	3385	+5.6%	0.95x
DataBenchmarks.o	51029	53872	+5.6%	0.95x
CountAlgo.o	12944	13624	+5.3%	0.95x
DictTest4.o	20839	21897	+5.1%	0.95x
LuhnAlgoLazy.o	13940	14647	+5.1%	0.95x
LuhnAlgoEager.o	13942	14649	+5.1%	0.95x
SetTests.o	57712	60587	+5.0%	0.95x
DictTest4Legacy.o	23833	25011	+4.9%	0.95x
ObserverForwarderStruct.o	3838	4025	+4.9%	0.95x
DictOfArraysToArrayOfDicts.o	30608	32097	+4.9%	0.95x
MonteCarloE.o	3690	3869	+4.9%	0.95x
ObjectiveCBridging.o	40407	42290	+4.7%	0.96x
StringInterpolation.o	6690	6991	+4.5%	0.96x
Radix2CooleyTukey.o	4756	4951	+4.1%	0.96x
Prefix.o	22289	23188	+4.0%	0.96x
DropFirst.o	22412	23311	+4.0%	0.96x
RecursiveOwnedParameter.o	1313	1364	+3.9%	0.96x
ArrayAppend.o	32412	33607	+3.7%	0.96x
Ackermann.o	1957	2029	+3.7%	0.96x
ObjectiveCBridgingStubs.o	18411	19052	+3.5%	0.97x
ObserverUnappliedMethod.o	5571	5759	+3.4%	0.97x
ObserverClosure.o	3573	3688	+3.2%	0.97x
StringComparison.o	38926	40161	+3.2%	0.97x
ArraySubscript.o	3914	4037	+3.1%	0.97x
DictionaryBridgeToObjC.o	5997	6174	+3.0%	0.97x
Prims.o	39013	40120	+2.8%	0.97x
PrimsSplit.o	39065	40172	+2.8%	0.97x
RangeReplaceableCollectionPlusDefault.o	5757	5918	+2.8%	0.97x
ObserverPartiallyAppliedMethod.o	3927	4036	+2.8%	0.97x
Hanoi.o	3810	3913	+2.7%	0.97x
CharacterProperties.o	19317	19801	+2.5%	0.98x
StringWalk.o	34866	35732	+2.5%	0.98x
TestsUtils.o	18947	19398	+2.4%	0.98x
Combos.o	7809	7972	+2.1%	0.98x
Join.o	2565	2616	+2.0%	0.98x
PrefixWhile.o	21246	21609	+1.7%	0.98x
DictionaryBridge.o	3500	3559	+1.7%	0.98x
DropWhile.o	21932	22223	+1.3%	0.99x
MapReduce.o	26827	27174	+1.3%	0.99x
Array2D.o	4379	4432	+1.2%	0.99x
Fibonacci.o	1642	1661	+1.2%	0.99x
main.o	56785	57400	+1.1%	0.99x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
ArrayOfPOD	774	860	+11.1%	0.90x (?)
Improvement
RemoveWhereMoveStrings	3261	2881	-11.7%	1.13x
RemoveWhereMoveInts	2733	2477	-9.4%	1.10x
Memset	12597	11446	-9.1%	1.10x
XorLoop	8021	7304	-8.9%	1.10x

Code size: -swiftlibs

TEST	OLD	NEW	DELTA	RATIO
Regression
libswiftSIMDOperators.dylib	45056	49152	+9.1%	0.92x
libswiftAppKit.dylib	77824	81920	+5.3%	0.95x
libswiftSwiftOnoneSupport.dylib	163840	172032	+5.0%	0.95x
libswiftFoundation.dylib	1523712	1576960	+3.5%	0.97x
libswiftCore.dylib	3444736	3559424	+3.3%	0.97x
libswiftStdlibUnittest.dylib	380928	393216	+3.2%	0.97x
libswiftNetwork.dylib	159744	163840	+2.6%	0.98x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.
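
As a rough illustration of how the DELTA and RATIO columns relate to OLD and NEW, the sketch below recomputes them for two rows of the "Performance: -O" table. The function name `delta_and_ratio` is made up for this example, and the formulas are inferred from the numbers in the tables rather than taken from the benchmark scripts, so treat them as an assumption.

```python
def delta_and_ratio(old, new):
    """Return (percent change from old to new, old/new ratio).

    Assumption: inferred from the table values, not from the CI scripts.
    """
    delta_pct = (new - old) / old * 100.0
    ratio = old / new
    return delta_pct, ratio


# Rows copied from the "Performance: -O" table.
rows = {"Walsh": (357, 421), "NopDeinit": (59577, 45620)}

for name, (old, new) in rows.items():
    delta_pct, ratio = delta_and_ratio(old, new)
    print(f"{name}\t{old}\t{new}\t{delta_pct:+.1f}%\t{ratio:.2f}x")
```

Under this reading, a ratio below 1.00x means NEW is larger than OLD, which is why every regression row pairs a positive delta with a ratio under one.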

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

vedantk · 2018-12-04T23:12:00Z

apple/swift-llvm#127
@swift-ci Please smoke benchmark

vedantk · 2018-12-04T23:13:16Z

^ I've kicked off another smoke-benchmark run with the outlining threshold bumped up.

swift-ci · 2018-12-05T00:26:35Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
Hanoi	3499	3936	+12.5%	0.89x
IterateData	1397	1552	+11.1%	0.90x
Improvement
NSStringConversion	866	592	-31.6%	1.46x
StringEqualPointerComparison	657	600	-8.7%	1.09x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
StaticArray.o	14045	15156	+7.9%	0.93x
DataBenchmarks.o	55956	56956	+1.8%	0.98x
DictionaryKeysContains.o	11815	11999	+1.6%	0.98x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
IterateData	1397	1668	+19.4%	0.84x
InsertCharacterEndIndex	155	167	+7.7%	0.93x
Improvement
ObjectiveCBridgeStubFromArrayOfNSString2	3815	3346	-12.3%	1.14x (?)
StringEqualPointerComparison	647	588	-9.1%	1.10x

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
StaticArray.o	13025	14860	+14.1%	0.88x
ReversedCollections.o	11596	11820	+1.9%	0.98x
DictionaryKeysContains.o	11503	11687	+1.6%	0.98x
RomanNumbers.o	5630	5695	+1.2%	0.99x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Improvement
RemoveWhereMoveInts	2720	2312	-15.0%	1.18x
RemoveWhereMoveStrings	3257	2872	-11.8%	1.13x

Code size: -swiftlibs

TEST	OLD	NEW	DELTA	RATIO
Regression
libswiftSwiftOnoneSupport.dylib	163840	172032	+5.0%	0.95x
libswiftFoundation.dylib	1523712	1544192	+1.3%	0.99x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------
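
To make the effect of the bumped outlining threshold concrete, the sketch below compares -O code size growth across the two runs for the three object files that still appear in the second run's "Code size: -O" table. The sizes are copied from the tables above; the helper name `growth_pct` is hypothetical and exists only for this example.

```python
def growth_pct(old, new):
    """Percent growth of new relative to old."""
    return (new - old) / old * 100.0


# name: (baseline, first run, second run with the higher threshold)
sizes = {
    "StaticArray.o": (14045, 18304, 15156),
    "DataBenchmarks.o": (55956, 60295, 56956),
    "DictionaryKeysContains.o": (11815, 12824, 11999),
}

for name, (base, run1, run2) in sizes.items():
    first = growth_pct(base, run1)
    second = growth_pct(base, run2)
    print(f"{name}: {first:+.1f}% -> {second:+.1f}%")
```

Every other file listed in the first run's "Code size: -O" table is absent from the second run's table.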

jckarter · 2019-01-15T22:23:12Z

@vedantk This is awesome! The code size hit might be worth it even at -Osize if it gives a good resident set win when paired with a cooperative linker. Part of the point of -Osize is to reduce memory usage by reducing code size, after all, and this more directly addresses that issue. Maybe there's a better way we could emit overflow traps to make them more splitting-friendly too.

vedantk · 2019-01-15T22:30:18Z

@jckarter thanks for taking a look! I haven't taken a close look yet at how Swift emits overflow traps so I'm not sure whether that would need to change.

I should point out that there are two more issues with the experiment done in this PR: 1) the splitting pass is scheduled after inlining, and 2) it doesn't look like SimplifyCFG has a chance to run afterwards and clean up some of the mess CodeExtractor leaves behind. I think it'd be worth repeating the experiment with the pipeline fixed to get more realistic numbers.

vedantk · 2019-01-15T22:30:40Z

Closing, as the sanity check I originally wanted is done.

[do not merge] Evaluate the hot/cold splitting pass

386c6a2

(cherry picked from commit a5e427732d08c35bc2a67d10f8d5140475a02e01)

vedantk requested a review from a team as a code owner December 4, 2018 21:53

vedantk closed this Jan 15, 2019
