This repository was archived by the owner on Jan 23, 2023. It is now read-only.

String.StartsWith performance - OrdinalCompareSubstring #2825

Merged
1 commit merged on Jan 26, 2016


bbowyersmyth

Alternative to #2667

Improves the performance of ordinal StartsWith comparisons. The existing code pays for locating the exact index at which the strings differ, along with redundant argument validation.

OrdinalCompareSubstring is a variation of EqualsHelper that knows when it has reached the end of the string and can control the result of reading the null terminator (or the next character after the provided length). A bitwise OR gave the best performance while avoiding both a branch and a result dependency.
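Editorial sketch of the idea described above, not the code from this PR: a minimal prefix comparison that reads two chars at a time and finishes with a non-short-circuit OR. The class and method names are hypothetical. It assumes the caller has already checked that `prefix.Length <= text.Length`, relies on .NET strings being followed by a zero terminator that is not counted in `Length`, and needs to be compiled with unsafe code enabled (`AllowUnsafeBlocks`).

```csharp
using System;

static class PrefixSketch
{
    // Hypothetical helper. Assumes prefix.Length <= text.Length (caller's job).
    public static unsafe bool StartsWithOrdinalSketch(string text, string prefix)
    {
        int length = prefix.Length;
        fixed (char* ap = text)
        fixed (char* bp = prefix)
        {
            char* a = ap;
            char* b = bp;

            // Compare two chars per iteration with a single 32-bit read.
            while (length >= 2)
            {
                if (*(int*)a != *(int*)b) return false;
                length -= 2; a += 2; b += 2;
            }

            // length is now 0 or 1.
            // length == 1: the last prefix char decides the result.
            // length == 0: *b is the prefix's zero terminator and *a is either
            // the next char of text or its terminator; both reads are in bounds
            // and their comparison is ignored because length == 0 is true.
            // '|' on bools evaluates both sides, so no branch is needed.
            return (length == 0) | (*a == *b);
        }
    }

    static void Main()
    {
        Console.WriteLine(StartsWithOrdinalSketch("hello world", "hello"));
        Console.WriteLine(StartsWithOrdinalSketch("hello", "help"));
    }
}
```

The final line is the part that corresponds to "controlling the result of reading the null terminator": the extra read is always performed, and the length test decides whether its outcome matters.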

The new function does not replace EqualsHelper as it has enough of a different performance profile that I was not comfortable applying it to String.Equals. One difference is the gotos suggested by @jkotas: they perform better than an inline return if the branch is not taken, but worse if it is taken. However, with the StartsWith call all string lengths are faster, so it seems like a good trade-off to leave them in.

Results for when strings match:
(Note: These times include the full StartsWith code path)

| Method | Len | Avg time | Std dev | op/s |
| --- | --- | --- | --- | --- |
| ClrStartsWith | 1 | 11.4500 ns | 0.5563 ns | 87,470,557.18 |
| NewStartsWith | 1 | 6.3501 ns | 0.1414 ns | 157,528,084.28 |
| ClrStartsWith | 2 | 11.4065 ns | 0.0603 ns | 87,670,904.06 |
| NewStartsWith | 2 | 8.5795 ns | 0.0200 ns | 116,557,876.54 |
| ClrStartsWith | 3 | 12.2927 ns | 0.1062 ns | 81,352,873.14 |
| NewStartsWith | 3 | 9.1690 ns | 0.4990 ns | 109,285,431.49 |
| ClrStartsWith | 4 | 13.2371 ns | 0.3956 ns | 75,589,313.28 |
| NewStartsWith | 4 | 9.1914 ns | 0.0640 ns | 108,800,916.92 |
| ClrStartsWith | 5 | 13.9384 ns | 0.0406 ns | 71,744,795.08 |
| NewStartsWith | 5 | 9.3449 ns | 0.0755 ns | 107,014,566.89 |
| ClrStartsWith | 6 | 12.2244 ns | 0.0416 ns | 81,804,531.80 |
| NewStartsWith | 6 | 10.5490 ns | 0.0625 ns | 94,797,830.26 |
| ClrStartsWith | 7 | 13.0502 ns | 0.0408 ns | 76,627,530.52 |
| NewStartsWith | 7 | 11.5211 ns | 0.9869 ns | 87,216,552.58 |
| ClrStartsWith | 8 | 13.8708 ns | 0.0710 ns | 72,095,355.70 |
| NewStartsWith | 8 | 11.4602 ns | 0.1205 ns | 87,264,586.34 |
| ClrStartsWith | 9 | 14.6428 ns | 0.0146 ns | 68,292,867.94 |
| NewStartsWith | 9 | 11.3612 ns | 0.0964 ns | 88,023,104.15 |
| ClrStartsWith | 10 | 13.4256 ns | 0.5080 ns | 74,554,428.16 |
| NewStartsWith | 10 | 12.4931 ns | 0.0115 ns | 80,044,045.86 |
| ClrStartsWith | 15 | 15.3432 ns | 0.0472 ns | 65,175,852.27 |
| NewStartsWith | 15 | 10.0709 ns | 0.1339 ns | 99,307,847.30 |
| ClrStartsWith | 16 | 16.4699 ns | 0.2721 ns | 60,728,076.89 |
| NewStartsWith | 16 | 10.8500 ns | 0.0934 ns | 92,170,333.14 |
| ClrStartsWith | 17 | 17.2434 ns | 0.2048 ns | 57,998,490.10 |
| NewStartsWith | 17 | 10.7882 ns | 0.0498 ns | 92,695,078.11 |
| ClrStartsWith | 23 | 18.5733 ns | 0.6475 ns | 53,884,111.58 |
| NewStartsWith | 23 | 13.5250 ns | 0.0982 ns | 73,939,859.16 |
| ClrStartsWith | 24 | 19.9289 ns | 0.1498 ns | 50,180,204.82 |
| NewStartsWith | 24 | 11.0373 ns | 0.4187 ns | 90,686,846.76 |
| ClrStartsWith | 25 | 20.3670 ns | 0.0665 ns | 49,099,487.17 |
| NewStartsWith | 25 | 17.9949 ns | 0.1059 ns | 55,572,612.28 |
| ClrStartsWith | 31 | 20.5799 ns | 0.4610 ns | 48,607,483.32 |
| NewStartsWith | 31 | 13.2028 ns | 0.3117 ns | 75,769,599.22 |
| ClrStartsWith | 32 | 22.7187 ns | 1.3742 ns | 44,128,031.33 |
| NewStartsWith | 32 | 14.0159 ns | 0.1796 ns | 71,355,220.97 |
| ClrStartsWith | 33 | 22.7469 ns | 0.7618 ns | 43,995,277.95 |
| NewStartsWith | 33 | 14.0615 ns | 0.3022 ns | 71,137,849.71 |
| ClrStartsWith | 39 | 22.9895 ns | 0.5827 ns | 43,516,504.16 |
| NewStartsWith | 39 | 12.4601 ns | 0.0629 ns | 80,257,497.57 |
| ClrStartsWith | 40 | 23.1409 ns | 0.0703 ns | 43,213,820.21 |
| NewStartsWith | 40 | 13.9697 ns | 0.3000 ns | 71,605,570.42 |
| ClrStartsWith | 41 | 23.7431 ns | 0.0572 ns | 42,117,603.72 |
| NewStartsWith | 41 | 13.5549 ns | 0.1095 ns | 73,777,036.09 |
| ClrStartsWith | 47 | 23.7786 ns | 0.0996 ns | 42,055,055.00 |
| NewStartsWith | 47 | 16.9212 ns | 0.1138 ns | 59,099,298.06 |
| ClrStartsWith | 48 | 24.5809 ns | 0.0047 ns | 40,681,920.61 |
| NewStartsWith | 48 | 18.3364 ns | 0.0462 ns | 54,536,444.32 |
| ClrStartsWith | 49 | 25.8340 ns | 0.1865 ns | 38,709,989.78 |
| NewStartsWith | 49 | 14.4705 ns | 0.9587 ns | 69,314,948.96 |
| ClrStartsWith | 55 | 26.7904 ns | 0.1677 ns | 37,327,809.43 |
| NewStartsWith | 55 | 15.8341 ns | 0.1374 ns | 63,157,981.10 |
| ClrStartsWith | 56 | 28.2237 ns | 0.2463 ns | 35,433,003.40 |
| NewStartsWith | 56 | 16.9207 ns | 0.2360 ns | 59,106,749.89 |
| ClrStartsWith | 57 | 28.6957 ns | 0.6784 ns | 34,861,620.31 |
| NewStartsWith | 57 | 17.0855 ns | 0.2301 ns | 58,536,124.19 |
| ClrStartsWith | 63 | 29.6037 ns | 0.3151 ns | 33,782,080.83 |
| NewStartsWith | 63 | 15.4634 ns | 0.4325 ns | 64,702,660.34 |
| ClrStartsWith | 64 | 31.9466 ns | 1.0954 ns | 31,327,290.51 |
| NewStartsWith | 64 | 16.5770 ns | 0.7782 ns | 60,411,142.27 |
| ClrStartsWith | 65 | 32.1430 ns | 0.2698 ns | 31,112,412.96 |
| NewStartsWith | 65 | 16.3406 ns | 0.0405 ns | 61,197,663.86 |
| ClrStartsWith | 95 | 34.3080 ns | 0.9018 ns | 29,160,999.91 |
| NewStartsWith | 95 | 19.5439 ns | 0.5334 ns | 51,191,994.59 |
| ClrStartsWith | 96 | 34.5307 ns | 0.1197 ns | 28,959,961.14 |
| NewStartsWith | 96 | 16.8685 ns | 0.0815 ns | 59,282,994.49 |
| ClrStartsWith | 97 | 35.8156 ns | 0.3923 ns | 27,923,043.31 |
| NewStartsWith | 97 | 16.8585 ns | 0.0402 ns | 59,317,339.50 |
| ClrStartsWith | 100 | 35.7483 ns | 0.9179 ns | 27,985,525.16 |
| NewStartsWith | 100 | 17.9313 ns | 0.2584 ns | 55,776,124.71 |
| ClrStartsWith | 127 | 40.7176 ns | 0.3486 ns | 24,560,578.67 |
| NewStartsWith | 127 | 23.1166 ns | 1.5395 ns | 43,386,935.30 |
| ClrStartsWith | 128 | 41.5086 ns | 0.0515 ns | 24,091,437.72 |
| NewStartsWith | 128 | 21.8633 ns | 0.0536 ns | 45,738,973.34 |
| ClrStartsWith | 129 | 43.0288 ns | 1.4601 ns | 23,257,760.55 |
| NewStartsWith | 129 | 21.8181 ns | 0.0395 ns | 45,833,706.88 |
| ClrStartsWith | 255 | 73.8711 ns | 0.4582 ns | 13,537,443.88 |
| NewStartsWith | 255 | 32.0884 ns | 0.0992 ns | 31,164,128.69 |
| ClrStartsWith | 256 | 74.3100 ns | 0.5996 ns | 13,457,728.36 |
| NewStartsWith | 256 | 32.7835 ns | 0.2055 ns | 30,503,944.85 |
| ClrStartsWith | 257 | 76.5002 ns | 1.6331 ns | 13,075,804.91 |
| NewStartsWith | 257 | 32.7005 ns | 0.1149 ns | 30,580,797.38 |
| ClrStartsWith | 511 | 128.5744 ns | 0.1405 ns | 7,777,602.65 |
| NewStartsWith | 511 | 72.3023 ns | 0.7706 ns | 13,831,869.26 |
| ClrStartsWith | 512 | 129.3073 ns | 0.6877 ns | 7,733,659.29 |
| NewStartsWith | 512 | 73.2512 ns | 0.7754 ns | 13,652,667.92 |
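Editorial note: the harness used for these measurements is not stated in this conversation. As a rough illustration only, a per-call time for the full `String.StartsWith` path could be estimated with a timing loop like the one below. `TimingSketch` and `MeasureNanoseconds` are hypothetical names, and the delegate call adds overhead of its own, so absolute numbers from this sketch are not comparable to the table above.

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // Hypothetical helper: average nanoseconds per call of `action`.
    public static double MeasureNanoseconds(Func<bool> action, int iterations)
    {
        // Warm up so that JIT compilation is not part of the measurement.
        for (int i = 0; i < 1000; i++) action();

        bool sink = false;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink ^= action();
        sw.Stop();

        // Keep the accumulated result observable so the loop is not discarded.
        GC.KeepAlive(sink);
        return sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations;
    }

    static void Main()
    {
        // A 64-char prefix that matches, as in the "strings match" scenario.
        string prefix = new string('a', 64);
        string text = prefix + "tail";

        double ns = MeasureNanoseconds(
            () => text.StartsWith(prefix, StringComparison.Ordinal), 10000000);
        Console.WriteLine("{0:F4} ns per call, {1:N2} op/s", ns, 1e9 / ns);
    }
}
```

A dedicated benchmarking library would give more trustworthy averages and standard deviations than a hand-rolled loop.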

cc @jkotas @benaadams @justinvp

@bbowyersmyth
Author

For the curious here is how OrdinalCompareSubstring performs in place of EqualsHelper within String.Equals(string, string).

Two 15-character strings that differ at index n:

| Method | n | Avg time | Std dev | op/s |
| --- | --- | --- | --- | --- |
| EqualsHelper | 0 | 3.5466 ns | 0.1022 ns | 282,112,499.76 |
| OrdinalCompareSubstring | 0 | 3.9923 ns | 0.0174 ns | 250,487,815.52 |
| EqualsHelper | 1 | 3.6788 ns | 0.0119 ns | 271,827,237.50 |
| OrdinalCompareSubstring | 1 | 4.0442 ns | 0.1041 ns | 247,374,159.09 |
| EqualsHelper | 2 | 3.7353 ns | 0.0652 ns | 267,772,805.88 |
| OrdinalCompareSubstring | 2 | 3.9615 ns | 0.0028 ns | 252,430,975.66 |
| EqualsHelper | 3 | 3.7838 ns | 0.0706 ns | 264,343,651.65 |
| OrdinalCompareSubstring | 3 | 3.9750 ns | 0.0144 ns | 251,576,269.63 |
| EqualsHelper | 4 | 3.9790 ns | 0.0515 ns | 251,345,534.61 |
| OrdinalCompareSubstring | 4 | 4.2554 ns | 0.0297 ns | 235,001,013.93 |
| EqualsHelper | 5 | 3.9760 ns | 0.0161 ns | 251,513,954.20 |
| OrdinalCompareSubstring | 5 | 4.2620 ns | 0.0255 ns | 234,637,924.74 |
| EqualsHelper | 6 | 4.0342 ns | 0.1067 ns | 247,998,018.76 |
| OrdinalCompareSubstring | 6 | 4.2443 ns | 0.0309 ns | 235,618,264.70 |
| EqualsHelper | 7 | 3.9775 ns | 0.0223 ns | 251,419,057.30 |
| OrdinalCompareSubstring | 7 | 4.2958 ns | 0.0363 ns | 232,798,663.03 |
| EqualsHelper | 8 | 4.8275 ns | 0.0132 ns | 207,149,689.60 |
| OrdinalCompareSubstring | 8 | 4.2687 ns | 0.0298 ns | 234,271,059.72 |
| EqualsHelper | 9 | 4.8124 ns | 0.0408 ns | 207,805,627.38 |
| OrdinalCompareSubstring | 9 | 4.6896 ns | 0.0710 ns | 213,269,012.35 |
| EqualsHelper | 10 | 4.9161 ns | 0.0634 ns | 203,436,995.32 |
| OrdinalCompareSubstring | 10 | 4.6906 ns | 0.1279 ns | 213,299,091.99 |
| EqualsHelper | 11 | 4.8353 ns | 0.0242 ns | 206,816,261.57 |
| OrdinalCompareSubstring | 11 | 4.3410 ns | 0.1117 ns | 230,462,888.03 |
| EqualsHelper | 12 | 6.3871 ns | 0.1825 ns | 156,650,082.26 |
| OrdinalCompareSubstring | 12 | 4.5727 ns | 0.0438 ns | 218,700,185.46 |
| EqualsHelper | 13 | 6.2458 ns | 0.0328 ns | 160,110,008.34 |
| OrdinalCompareSubstring | 13 | 4.5889 ns | 0.0646 ns | 217,943,991.00 |
| EqualsHelper | 14 | 7.1077 ns | 0.0470 ns | 140,697,135.33 |
| OrdinalCompareSubstring | 14 | 4.5693 ns | 0.0087 ns | 218,851,136.38 |

Two equal length character strings:

| Method | Len | Avg time | Std dev | op/s |
| --- | --- | --- | --- | --- |
| EqualsHelper | 1 | 4.6369 ns | 0.1794 ns | 215,870,166.73 |
| OrdinalCompareSubstring | 1 | 4.5604 ns | 0.0094 ns | 219,278,035.75 |
| EqualsHelper | 2 | 4.5419 ns | 0.0237 ns | 220,177,697.69 |
| OrdinalCompareSubstring | 2 | 4.3023 ns | 0.0429 ns | 232,451,507.11 |
| EqualsHelper | 3 | 5.4157 ns | 0.0225 ns | 184,652,105.07 |
| OrdinalCompareSubstring | 3 | 4.2586 ns | 0.0295 ns | 234,829,099.21 |
| EqualsHelper | 4 | 5.4452 ns | 0.0262 ns | 183,649,890.96 |
| OrdinalCompareSubstring | 4 | 5.1252 ns | 0.0100 ns | 195,116,510.97 |
| EqualsHelper | 5 | 6.2600 ns | 0.0173 ns | 159,746,206.39 |
| OrdinalCompareSubstring | 5 | 5.1961 ns | 0.1305 ns | 192,530,192.50 |
| EqualsHelper | 6 | 6.2754 ns | 0.0456 ns | 159,359,239.59 |
| OrdinalCompareSubstring | 6 | 5.9853 ns | 0.0589 ns | 167,087,349.82 |
| EqualsHelper | 7 | 7.0796 ns | 0.0758 ns | 141,261,075.88 |
| OrdinalCompareSubstring | 7 | 5.9765 ns | 0.0278 ns | 167,323,456.00 |
| EqualsHelper | 8 | 7.1740 ns | 0.1392 ns | 139,427,727.60 |
| OrdinalCompareSubstring | 8 | 7.0893 ns | 0.1048 ns | 141,078,981.19 |
| EqualsHelper | 9 | 8.0171 ns | 0.0145 ns | 124,734,229.93 |
| OrdinalCompareSubstring | 9 | 6.8691 ns | 0.0456 ns | 145,583,918.13 |
| EqualsHelper | 10 | 8.0684 ns | 0.0516 ns | 123,943,757.93 |
| OrdinalCompareSubstring | 10 | 7.7114 ns | 0.0614 ns | 129,683,834.13 |
| EqualsHelper | 11 | 9.1927 ns | 0.0787 ns | 108,786,813.64 |
| OrdinalCompareSubstring | 11 | 7.8014 ns | 0.2387 ns | 128,261,575.18 |
| EqualsHelper | 12 | 5.9695 ns | 0.0200 ns | 167,518,339.52 |
| OrdinalCompareSubstring | 12 | 5.0780 ns | 0.9026 ns | 200,762,469.14 |
| EqualsHelper | 13 | 5.7687 ns | 0.0493 ns | 173,357,049.75 |
| OrdinalCompareSubstring | 13 | 4.5448 ns | 0.0229 ns | 220,037,374.84 |
| EqualsHelper | 14 | 5.6747 ns | 0.0329 ns | 176,226,182.93 |
| OrdinalCompareSubstring | 14 | 4.6146 ns | 0.0729 ns | 216,739,726.03 |

@benaadams
Member

`length -=` could be moved to just under the while test, rather than being the last statement in the loop, for better pipelining, so that it's ready earlier for the next test?


[System.Security.SecuritySafeCritical] // auto-generated
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private unsafe static bool OrdinalCompareSubstring(String strA, String strB, int length)
Member

I think a better name for this method would be StartsWithOrdinalHelper.

Author

Sure thing. I based the name off the CoreRT version of EqualsHelper, OrdinalCompareEqualLengthStrings.

@jkotas
Member

jkotas commented Jan 23, 2016

> the new function does not replace EqualsHelper as it has enough of a different performance profile

I think it is fine for this to be a different method.

BTW: We will apply profile-guided optimizations (aka IBC) to the shipping mscorlib. The profile-guided optimizations will reorder the method basic blocks based on real-world usage, so it is usually not worth optimizing the code layout for branch taken vs. branch not taken, etc. It is better to optimize for smaller, simpler IL: the JIT is more likely to do a better job on smaller, simpler IL when everything else is equal.

@jkotas
Member

jkotas commented Jan 23, 2016

LGTM otherwise. Thank you for doing this!

@jkotas
Member

jkotas commented Jan 23, 2016

The CI failures are known issues.

if (*(long*)a != *(long*)b) goto ReturnFalse;
if (*(long*)(a + 4) != *(long*)(b + 4)) goto ReturnFalse;
if (*(long*)(a + 8) != *(long*)(b + 8)) goto ReturnFalse;
a += 12; b += 12; length -= 12;
Member

Move `length -= 12;` to be the first statement of the while body so that it is more likely to have completed in the CPU pipeline by the time the next iteration's while test runs.

Author

I'm a bit torn on that one. It is faster for longer equal strings but it is a tax on early exits (differs within the first 12 characters) which should be more common.

Long = 252 char equal strings
Short = 252 char strings that differ at index 1

| Method | Avg time | Std dev | op/s |
| --- | --- | --- | --- |
| LongStartsWithIncAfter | 33.7967 ns | 1.9410 ns | 29,678,807.37 |
| LongStartsWithIncBefore | 33.2239 ns | 1.8214 ns | 30,183,392.10 |
| ShortStartsWithIncAfter | 7.0180 ns | 0.5621 ns | 142,983,128.85 |
| ShortStartsWithIncBefore | 7.2873 ns | 0.5598 ns | 137,668,403.59 |

Member

Probably leave it then; % diff is bigger on early exits than the gain on equals

Author

Got almost the same gain without the penalty just by changing
a += 12; b += 12; length -= 12;
to
length -= 12; a += 12; b += 12;
The differences are pretty small but consistent across multiple runs. Can't hurt.
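Editorial sketch of the loop shape under discussion, not the merged code: the 12-char unrolled loop with the counter update placed ahead of the pointer increments, so that `length` is available earlier for the next `while` test. `UnrolledSketch` and `StartsWithUnrolled` are hypothetical names. The sketch assumes the caller has checked that `prefix.Length <= text.Length`, that unaligned 64-bit reads are acceptable on the target platform, and that unsafe code is enabled (`AllowUnsafeBlocks`). The tail is handled one char at a time for simplicity.

```csharp
using System;

static class UnrolledSketch
{
    // Hypothetical helper. Assumes prefix.Length <= text.Length (caller's job).
    public static unsafe bool StartsWithUnrolled(string text, string prefix)
    {
        int length = prefix.Length;
        fixed (char* ap = text)
        fixed (char* bp = prefix)
        {
            char* a = ap;
            char* b = bp;

            // Three 64-bit reads cover 12 chars per iteration.
            while (length >= 12)
            {
                if (*(long*)a != *(long*)b) goto ReturnFalse;
                if (*(long*)(a + 4) != *(long*)(b + 4)) goto ReturnFalse;
                if (*(long*)(a + 8) != *(long*)(b + 8)) goto ReturnFalse;
                // Counter first, pointers after: the loop condition depends
                // only on length, so it is updated as early as possible
                // without adding work to the early-exit paths above.
                length -= 12; a += 12; b += 12;
            }

            // Simplified tail: compare the remaining 0 to 11 chars one by one.
            while (length > 0)
            {
                if (*a != *b) goto ReturnFalse;
                length -= 1; a += 1; b += 1;
            }
            return true;

        ReturnFalse:
            return false;
        }
    }

    static void Main()
    {
        string text = "abcdefghijklmnopqrstuvwxyz";
        Console.WriteLine(StartsWithUnrolled(text, text.Substring(0, 25)));
        Console.WriteLine(StartsWithUnrolled(text, "abcdefghijklmXop"));
    }
}
```

Whether the ordering makes a measurable difference depends on the JIT and the CPU; the numbers quoted in this thread are the only evidence offered here.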

@bbowyersmyth
Author

Updated with name change. AMD64 => WIN64. length -= moved to the front in EqualsHelper, StartsWithOrdinalHelper, and CompareOrdinalHelper. ALIGN_ACCESS implemented.

@bbowyersmyth
Author

ALIGN_ACCESS conditional removed. Length checks removed in first int alignment read.

@jkotas
Member

jkotas commented Jan 26, 2016

👍

jkotas added a commit that referenced this pull request Jan 26, 2016
String.StartsWith performance - OrdinalCompareSubstring
@jkotas jkotas merged commit 8c142fd into dotnet:master Jan 26, 2016
@bbowyersmyth
Author

Cheers for everyone's input on this. Thanks!
