This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Loop unrolled direct string inject #498

Merged
merged 1 commit into from
Jan 6, 2016

Conversation

benaadams
Contributor

+1% RPS; also less complicated, with a single code path rather than three.

Before

Running 30s test @ http://.../plaintext
32 threads and 1024 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.25ms 19.86ms 374.26ms 95.29%
Req/Sec 67.64k 5.91k 113.64k 83.57%
62655516 requests in 30.04s, 7.70GB read
Socket errors: connect 35, read 0, write 0, timeout 0
Requests/sec: 2085664.91
Transfer/sec: 262.55MB

After

Running 30s test @ http://.../plaintext
32 threads and 1024 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 6.62ms 15.26ms 293.65ms 96.92%
Req/Sec 66.38k 12.62k 128.97k 94.86%
63472957 requests in 30.05s, 7.80GB read
Socket errors: connect 35, read 0, write 0, timeout 0
Requests/sec: 2112232.38
Transfer/sec: 265.90MB

{
return GetAsciiStringStack(start.Block.Array, start.Index, length);
return default(string);
Member

null

{
output[i + outputOffset] = (char)input[i + inputOffset];
Member

I like that this PR removes more code than it adds, but this part now seems slightly more complicated. Are we sure that manual unrolling is faster?

Contributor Author

Yep: #399 (comment)
Non-unrolled (SequentialPointer)
Incrementing value on each unroll step (UnrolledPointer)
Adding to value on each step; and incrementing it at end (UnrolledParallelPointer)

Member

I remember that now 😄 It's interesting how UnrolledParallelPointer is so consistently better from 1 character all the way to 16384 character strings.

It might be interesting to see how something unaligned like a 33 character string performs, since I don't think any of the test cases hit both loops in GetAsciiStringImplementationB (UnrolledParallelPointer).

Contributor Author

Ah, good point; I'll include that in the new tests when gathering new numbers.

The ParallelPointer works because the CPU can pipeline the instructions, as they are independent of each other (i.e. the effects of one line don't affect the results of the next). Incrementing the pointer at each step means the next step can't run until that pointer addition has completed, as it's dependent on the result. This is also one of the problems with i++ in a regular loop: the next i++ can't run until the previous one has completed, as it is dependent on its result.
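A minimal sketch of the "add to the value on each step, increment once at the end" idea described above. It is not the PR's code: it uses safe array indexing instead of pointers, and the class name, method name and unroll width of 4 are assumptions made for illustration. A 33-byte input exercises both the unrolled loop and the tail loop.

```csharp
using System;

public static class AsciiUnrollSketch
{
    // Hypothetical helper (not from the PR): widens ASCII bytes to chars.
    public static string GetAsciiStringUnrolled(byte[] input, int offset, int length)
    {
        var output = new char[length];
        int i = 0;

        // Unrolled loop: four copies per iteration. Every address is derived
        // from the same base index i, so no copy depends on the previous one;
        // i is advanced only once, at the end of the iteration.
        while (i + 4 <= length)
        {
            output[i]     = (char)input[offset + i];
            output[i + 1] = (char)input[offset + i + 1];
            output[i + 2] = (char)input[offset + i + 2];
            output[i + 3] = (char)input[offset + i + 3];
            i += 4;
        }

        // Tail loop: copies the 0 to 3 bytes left over.
        while (i < length)
        {
            output[i] = (char)input[offset + i];
            i++;
        }

        return new string(output);
    }
}
```

Whether this ordering is actually faster depends on the JIT and the CPU, so it would need measuring rather than assuming.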

Contributor Author

Also, if I didn't unroll, I think this would be a perf regression, since the string is initialized when it is created, whereas the previous approach operated on initialized stack memory and then did a memcopy to an uninitialized string. However, I think the code being less weird makes up for that...

Member

I definitely like the code being less weird. Interesting point about the instruction pipelining. That explains the somewhat unintuitive results. It's amazing the stuff you have to consider at this level of optimization. Someday maybe the compiler will be smart enough to do this kind of thing for us.

I'm going to keep this PR open pending the new numbers.

@benaadams
Contributor Author

Updated with unrolling and improved instruction ordering from #519

{
output[i + outputOffset] = (char)input[i + inputOffset];
fixed (byte* blockStart = block.Array)
Contributor Author

There is a further combining change that will remove this fixed block (as the buffers are already fixed), but this stands alone as is.

@benaadams benaadams mentioned this pull request Dec 29, 2015
@halter73
Member

Thanks for all the research/testing you put into this. LGTM. :shipit:

@benaadams
Contributor Author

Rebased

@halter73
Member

Actually, before I merge this, do you have the test script and results that show this is the fastest way to create a string from ASCII bytes?

#399 (comment) doesn't cover the current implementation.

@benaadams
Contributor Author

Sure :D

@benaadams
Contributor Author

I still need to do the benchmarking with various sizes. However, taking a step to be one bracketed chunk in which i, input and output are modified, the growth in step count versus length looks like the following for step sizes 12, 6, 4 and 1 (looping at 12).

[Image: step-growth-vs-length]

A length of 119 takes 12 steps; 120 drops back to 10 steps, as it is a multiple of 12.

count steps
1 1
2 2
3 3
4 1
5 2
6 1
7 2
8 3
9 4
10 2
11 3
12 1
13 2
14 3
15 4
16 2
17 3
18 2
19 2
20 3
21 4
22 3
23 4
24 2
25 3
26 4
27 5
28 3
29 4
30 3
31 3
32 4
33 5
34 4
35 5
36 3
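One way to read the step definition above is as a greedy decomposition: as many 12-wide steps as fit, then at most one 6-wide and one 4-wide step, then 1-wide steps for the remainder. The sketch below models that reading; the class and method names are hypothetical, and it is a model of the description rather than the PR's code, so it may not match every row of the table. It does reproduce the 119 and 120 figures quoted above.

```csharp
using System;

public static class StepCountSketch
{
    // Hypothetical model: a loop of 12-wide steps, then at most one 6-wide
    // and one 4-wide step, then 1-wide steps for whatever is left.
    public static int CountSteps(int length)
    {
        int steps = length / 12;   // 12-wide steps taken in the loop
        int rest = length % 12;    // 0..11 items still to copy

        if (rest >= 6) { steps++; rest -= 6; }   // single 6-wide step
        if (rest >= 4) { steps++; rest -= 4; }   // single 4-wide step

        return steps + rest;       // remaining items take one step each
    }
}
```

For example, `StepCountSketch.CountSteps(119)` gives 12 (nine 12-wide steps, then 6 + 4 + 1) and `StepCountSketch.CountSteps(120)` gives 10.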

This is worked out for a 14-stage pipelined CPU, so that i is available for the next if statement 14 instructions later; though it looks like the most recent chips have moved from 14 stages to 20-24.

It is different for a copy without byte conversion, as ints and longs (and vectors) can be used.

@benaadams
Contributor Author

Running benchmarks now

@benaadams
Contributor Author

Benchmark comparing System.Text.Encoding.ASCII.GetString, the current stack-based version, and the unrolled version (this PR).

Each one actually creates the strings, to test properly.

BenchmarkDotNet-Dev=v0.8.0.0+
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz, ProcessorCount=8
HostCLR=MS.NET 4.0.30319.42000, Arch=64-bit  [RyuJIT]
Type=AsciiCreate  Mode=Throughput  Platform=X64  Jit=RyuJit  .NET=HostFramework  toolchain=Classic  Runtime=Clr  Warmup=5  Target=10
Method Len AvrTime StdDev op/s improve
GetAsciiStringEncoding 0 6.3856 ns 0.1169 ns 156,653,606.68
GetAsciiStringStack 0 5.2684 ns 0.0127 ns 189,810,626.86 21%
GetAsciiStringUnrolled 0 4.4208 ns 0.0140 ns 226,204,241.47 44%
GetAsciiStringEncoding 1 26.5309 ns 0.0356 ns 37,691,946.79
GetAsciiStringStack 1 14.1725 ns 0.2380 ns 70,578,500.95 87%
GetAsciiStringUnrolled 1 7.7349 ns 0.1323 ns 129,320,578.74 243%
GetAsciiStringEncoding 2 27.6725 ns 0.0320 ns 36,137,019.03
GetAsciiStringStack 2 15.0977 ns 0.0635 ns 66,236,222.20 83%
GetAsciiStringUnrolled 2 8.4002 ns 0.1619 ns 119,088,715.40 230%
GetAsciiStringEncoding 3 28.5409 ns 0.4908 ns 35,047,572.85
GetAsciiStringStack 3 15.9660 ns 0.1267 ns 62,636,685.04 79%
GetAsciiStringUnrolled 3 8.8578 ns 0.0329 ns 112,895,812.72 222%
GetAsciiStringEncoding 4 30.9573 ns 0.1652 ns 32,303,440.88
GetAsciiStringStack 4 16.9224 ns 0.0369 ns 59,093,455.47 83%
GetAsciiStringUnrolled 4 8.6447 ns 0.0486 ns 115,681,631.66 258%
GetAsciiStringEncoding 5 32.2680 ns 0.3021 ns 30,993,097.10
GetAsciiStringStack 5 18.0094 ns 0.1083 ns 55,528,407.74 79%
GetAsciiStringUnrolled 5 9.1457 ns 0.0315 ns 109,342,438.67 253%
GetAsciiStringEncoding 6 33.5031 ns 0.7046 ns 29,860,766.68
GetAsciiStringStack 6 18.6508 ns 0.0533 ns 53,617,329.77 80%
GetAsciiStringUnrolled 6 9.1732 ns 0.0298 ns 109,014,335.28 265%
GetAsciiStringEncoding 7 34.7881 ns 0.4244 ns 28,749,689.65
GetAsciiStringStack 7 19.6855 ns 0.0596 ns 50,799,349.04 77%
GetAsciiStringUnrolled 7 9.3058 ns 0.2003 ns 107,507,156.82 274%
GetAsciiStringEncoding 8 36.1978 ns 0.3716 ns 27,628,883.07
GetAsciiStringStack 8 20.8175 ns 0.0728 ns 48,037,145.01 74%
GetAsciiStringUnrolled 8 10.4343 ns 0.0367 ns 95,838,567.13 247%
GetAsciiStringEncoding 9 37.5087 ns 0.4505 ns 26,664,301.05
GetAsciiStringStack 9 23.7708 ns 0.0672 ns 42,068,676.64 58%
GetAsciiStringUnrolled 9 10.9736 ns 0.1783 ns 91,151,113.04 242%
GetAsciiStringEncoding 10 38.3184 ns 0.6713 ns 26,104,951.22
GetAsciiStringStack 10 24.6946 ns 0.1382 ns 40,495,935.51 55%
GetAsciiStringUnrolled 10 10.9550 ns 0.2587 ns 91,330,664.10 250%
GetAsciiStringEncoding 11 39.4201 ns 0.7459 ns 25,376,511.38
GetAsciiStringStack 11 25.8510 ns 0.0512 ns 38,683,433.65 52%
GetAsciiStringUnrolled 11 11.1191 ns 0.0402 ns 89,936,842.58 254%
GetAsciiStringEncoding 12 41.3879 ns 0.4545 ns 24,164,485.71
GetAsciiStringStack 12 26.2828 ns 0.4456 ns 38,058,145.12 57%
GetAsciiStringUnrolled 12 11.9781 ns 0.1244 ns 83,494,318.91 246%
GetAsciiStringEncoding 13 41.4398 ns 0.9442 ns 24,143,681.18
GetAsciiStringStack 13 27.9275 ns 0.0937 ns 35,807,368.88 48%
GetAsciiStringUnrolled 13 12.5717 ns 0.0533 ns 79,545,230.02 229%
GetAsciiStringEncoding 14 42.5840 ns 0.7131 ns 23,489,419.87
GetAsciiStringStack 14 28.7776 ns 0.0347 ns 34,749,284.35 48%
GetAsciiStringUnrolled 14 12.6846 ns 0.0922 ns 78,839,682.77 236%
GetAsciiStringEncoding 15 43.9742 ns 0.8635 ns 22,749,156.48
GetAsciiStringStack 15 29.2612 ns 0.5262 ns 34,185,525.67 50%
GetAsciiStringUnrolled 15 13.6817 ns 0.0275 ns 73,090,731.51 221%
GetAsciiStringEncoding 16 45.9021 ns 0.2249 ns 21,786,010.31
GetAsciiStringStack 16 29.9649 ns 0.1471 ns 33,373,174.51 53%
GetAsciiStringUnrolled 16 13.3515 ns 0.0315 ns 74,898,410.71 244%
GetAsciiStringEncoding 17 46.4066 ns 0.8744 ns 21,556,146.45
GetAsciiStringStack 17 32.5104 ns 0.1292 ns 30,759,880.93 43%
GetAsciiStringUnrolled 17 14.2499 ns 0.0330 ns 70,176,184.27 226%
GetAsciiStringEncoding 18 47.7287 ns 0.9818 ns 20,960,404.20
GetAsciiStringStack 18 33.3996 ns 0.1483 ns 29,941,062.05 43%
GetAsciiStringUnrolled 18 14.1331 ns 0.3225 ns 70,790,910.88 238%
GetAsciiStringEncoding 19 50.1159 ns 0.4713 ns 19,955,429.95
GetAsciiStringStack 19 34.1617 ns 0.6572 ns 29,283,121.90 47%
GetAsciiStringUnrolled 19 15.0138 ns 0.0493 ns 66,606,101.65 234%
GetAsciiStringEncoding 20 51.2267 ns 0.7124 ns 19,524,742.60
GetAsciiStringStack 20 34.1321 ns 0.1323 ns 29,298,333.32 50%
GetAsciiStringUnrolled 20 15.9209 ns 0.0690 ns 62,811,618.89 222%
GetAsciiStringEncoding 21 53.6812 ns 0.3992 ns 18,629,494.19
GetAsciiStringStack 21 36.2021 ns 0.6744 ns 27,632,007.81 48%
GetAsciiStringUnrolled 21 16.1166 ns 0.0845 ns 62,049,295.47 233%
GetAsciiStringEncoding 22 55.7167 ns 0.5608 ns 17,949,692.37
GetAsciiStringStack 22 36.9721 ns 0.7434 ns 27,058,138.63 51%
GetAsciiStringUnrolled 22 16.5684 ns 0.0796 ns 60,357,337.53 236%
GetAsciiStringEncoding 23 57.2454 ns 0.2286 ns 17,468,913.98
GetAsciiStringStack 23 38.5952 ns 0.1944 ns 25,910,598.31 48%
GetAsciiStringUnrolled 23 16.5282 ns 0.2687 ns 60,517,895.58 246%
GetAsciiStringEncoding 24 59.6127 ns 0.4782 ns 16,775,996.18
GetAsciiStringStack 24 39.7004 ns 0.0633 ns 25,188,702.34 50%
GetAsciiStringUnrolled 24 17.2569 ns 0.0566 ns 57,948,493.11 245%
GetAsciiStringEncoding 25 61.6060 ns 0.4221 ns 16,232,917.54
GetAsciiStringStack 25 40.2711 ns 1.0238 ns 24,846,854.39 53%
GetAsciiStringUnrolled 25 17.4347 ns 0.3267 ns 57,376,492.61 253%
GetAsciiStringEncoding 26 62.3887 ns 1.2943 ns 16,035,269.35
GetAsciiStringStack 26 40.7875 ns 0.3357 ns 24,518,893.70 53%
GetAsciiStringUnrolled 26 17.1934 ns 0.0860 ns 58,163,222.65 263%
GetAsciiStringEncoding 27 64.8049 ns 0.2827 ns 15,431,208.76
GetAsciiStringStack 27 42.9668 ns 0.6764 ns 23,279,426.67 51%
GetAsciiStringUnrolled 27 18.0682 ns 0.2526 ns 55,356,551.04 259%
GetAsciiStringEncoding 28 66.1509 ns 1.2633 ns 15,122,360.94
GetAsciiStringStack 28 44.3772 ns 0.1273 ns 22,534,274.55 49%
GetAsciiStringUnrolled 28 18.8945 ns 0.0650 ns 52,926,029.44 250%
GetAsciiStringEncoding 29 67.6734 ns 1.1017 ns 14,780,678.02
GetAsciiStringStack 29 45.6187 ns 0.1284 ns 21,920,981.49 48%
GetAsciiStringUnrolled 29 19.2249 ns 0.2184 ns 52,022,283.77 252%
GetAsciiStringEncoding 30 69.7017 ns 1.4238 ns 14,352,693.17
GetAsciiStringStack 30 46.4138 ns 0.0906 ns 21,545,385.42 50%
GetAsciiStringUnrolled 30 19.4372 ns 0.0333 ns 51,447,912.17 258%
GetAsciiStringEncoding 31 72.1075 ns 0.2739 ns 13,868,378.19
GetAsciiStringStack 31 47.6223 ns 0.1275 ns 20,998,716.21 51%
GetAsciiStringUnrolled 31 19.7378 ns 0.0677 ns 50,664,745.01 265%
GetAsciiStringEncoding 32 73.2524 ns 1.2367 ns 13,655,225.80
GetAsciiStringStack 32 48.6683 ns 0.1808 ns 20,547,546.18 50%
GetAsciiStringUnrolled 32 20.0676 ns 0.3995 ns 49,850,387.65 265%
GetAsciiStringEncoding 33 75.1232 ns 1.5505 ns 13,316,941.26
GetAsciiStringStack 33 48.9495 ns 0.8913 ns 20,435,699.37 53%
GetAsciiStringUnrolled 33 19.9889 ns 0.3932 ns 50,044,933.43 276%
GetAsciiStringEncoding 34 77.7999 ns 0.2349 ns 12,853,606.68
GetAsciiStringStack 34 50.8965 ns 0.1369 ns 19,647,835.70 53%
GetAsciiStringUnrolled 34 20.9920 ns 0.5173 ns 47,665,557.86 271%
GetAsciiStringEncoding 35 78.5528 ns 1.3023 ns 12,733,714.97
GetAsciiStringStack 35 52.0398 ns 0.0678 ns 19,216,097.64 51%
GetAsciiStringUnrolled 35 21.5082 ns 0.0653 ns 46,494,278.30 265%
GetAsciiStringEncoding 36 81.6082 ns 0.3671 ns 12,253,899.17
GetAsciiStringStack 36 53.0631 ns 0.2213 ns 18,845,817.06 54%
GetAsciiStringUnrolled 36 21.8861 ns 0.0575 ns 45,691,340.13 273%
GetAsciiStringEncoding 37 83.3324 ns 0.1186 ns 12,000,157.43
GetAsciiStringStack 37 53.5346 ns 0.8940 ns 18,684,623.54 56%
GetAsciiStringUnrolled 37 22.2454 ns 0.0648 ns 44,953,534.62 275%
GetAsciiStringEncoding 38 84.4947 ns 1.8243 ns 11,840,440.96
GetAsciiStringStack 38 54.9712 ns 0.1666 ns 18,191,500.92 54%
GetAsciiStringUnrolled 38 22.4304 ns 0.0635 ns 44,582,722.44 277%
GetAsciiStringEncoding 39 85.6394 ns 1.4394 ns 11,680,099.44
GetAsciiStringStack 39 56.3114 ns 0.2481 ns 17,758,714.44 52%
GetAsciiStringUnrolled 39 22.4097 ns 0.3473 ns 44,633,799.76 282%
GetAsciiStringEncoding 40 87.3818 ns 2.1064 ns 11,450,364.87
GetAsciiStringStack 40 59.0741 ns 2.1585 ns 16,949,255.23 48%
GetAsciiStringUnrolled 40 22.6736 ns 0.0466 ns 44,104,264.61 285%
GetAsciiStringEncoding 41 90.7927 ns 0.4290 ns 11,014,331.78
GetAsciiStringStack 41 60.4196 ns 2.0145 ns 16,568,306.71 50%
GetAsciiStringUnrolled 41 23.3923 ns 0.3983 ns 42,761,169.33 288%
GetAsciiStringEncoding 42 90.2018 ns 1.6380 ns 11,089,743.59
GetAsciiStringStack 42 59.0223 ns 1.1791 ns 16,949,383.85 53%
GetAsciiStringUnrolled 42 24.0117 ns 0.0715 ns 41,646,755.61 276%
GetAsciiStringEncoding 43 94.2435 ns 0.0716 ns 10,610,815.97
GetAsciiStringStack 43 64.5399 ns 1.0718 ns 15,498,455.18 46%
GetAsciiStringUnrolled 43 24.0080 ns 0.4103 ns 41,664,737.61 293%
GetAsciiStringEncoding 44 94.1801 ns 1.5597 ns 10,620,745.12
GetAsciiStringStack 44 64.1763 ns 1.1878 ns 15,587,172.37 47%
GetAsciiStringUnrolled 44 24.1789 ns 0.0571 ns 41,358,593.30 289%
GetAsciiStringEncoding 45 98.0178 ns 0.0738 ns 10,202,236.96
GetAsciiStringStack 45 66.2021 ns 0.2113 ns 15,105,405.65 48%
GetAsciiStringUnrolled 45 25.1663 ns 0.1113 ns 39,736,450.43 289%
GetAsciiStringEncoding 46 99.0558 ns 1.0954 ns 10,096,513.56
GetAsciiStringStack 46 65.7820 ns 1.1988 ns 15,206,552.63 51%
GetAsciiStringUnrolled 46 24.8659 ns 0.1466 ns 40,216,996.50 298%
GetAsciiStringEncoding 47 100.5068 ns 1.7285 ns 9,952,447.51
GetAsciiStringStack 47 67.0547 ns 1.0724 ns 14,916,921.07 50%
GetAsciiStringUnrolled 47 26.0434 ns 0.0496 ns 38,397,582.45 286%
GetAsciiStringEncoding 48 103.1534 ns 2.1085 ns 9,698,258.21
GetAsciiStringStack 48 68.5015 ns 1.2879 ns 14,603,261.11 51%
GetAsciiStringUnrolled 48 25.6282 ns 0.0812 ns 39,019,914.17 302%
GetAsciiStringEncoding 49 105.4773 ns 0.2176 ns 9,480,753.86
GetAsciiStringStack 49 69.8814 ns 0.1102 ns 14,309,994.81 51%
GetAsciiStringUnrolled 49 26.2259 ns 0.4283 ns 38,139,942.99 302%
GetAsciiStringEncoding 50 105.8742 ns 2.3254 ns 9,449,573.99
GetAsciiStringStack 50 72.1130 ns 1.4607 ns 13,872,718.84 47%
GetAsciiStringUnrolled 50 27.0894 ns 0.0830 ns 36,915,135.99 291%
GetAsciiStringEncoding 51 108.0370 ns 1.8221 ns 9,258,663.41
GetAsciiStringStack 51 71.8510 ns 1.2263 ns 13,921,578.35 50%
GetAsciiStringUnrolled 51 27.5972 ns 0.0934 ns 36,236,016.91 291%
GetAsciiStringEncoding 52 111.2204 ns 0.5066 ns 8,991,338.42
GetAsciiStringStack 52 74.0626 ns 0.2539 ns 13,502,248.36 50%
GetAsciiStringUnrolled 52 28.0177 ns 0.0908 ns 35,692,135.88 297%
GetAsciiStringEncoding 53 112.8264 ns 0.4104 ns 8,863,283.66
GetAsciiStringStack 53 73.6823 ns 1.2595 ns 13,575,649.67 53%
GetAsciiStringUnrolled 53 27.3605 ns 0.1276 ns 36,549,857.05 312%
GetAsciiStringEncoding 54 113.3157 ns 2.1419 ns 8,827,987.21
GetAsciiStringStack 54 75.0048 ns 0.1700 ns 13,332,547.88 51%
GetAsciiStringUnrolled 54 27.7896 ns 0.1065 ns 35,985,175.13 308%
GetAsciiStringEncoding 55 116.2862 ns 0.4476 ns 8,599,596.39
GetAsciiStringStack 55 73.8734 ns 1.4245 ns 13,541,596.32 57%
GetAsciiStringUnrolled 55 28.6603 ns 0.5177 ns 34,902,674.33 306%
GetAsciiStringEncoding 56 119.3017 ns 0.6499 ns 8,382,354.51
GetAsciiStringStack 56 76.9783 ns 0.1384 ns 12,990,710.99 55%
GetAsciiStringUnrolled 56 29.6072 ns 0.0882 ns 33,775,831.87 303%
GetAsciiStringEncoding 57 120.4755 ns 0.4232 ns 8,300,538.22
GetAsciiStringStack 57 76.0326 ns 1.1544 ns 13,155,230.15 58%
GetAsciiStringUnrolled 57 29.7165 ns 0.1258 ns 33,651,973.98 305%
GetAsciiStringEncoding 58 120.5535 ns 2.0244 ns 8,297,355.82
GetAsciiStringStack 58 80.0423 ns 0.4938 ns 12,493,844.30 51%
GetAsciiStringUnrolled 58 29.6413 ns 0.5289 ns 33,746,998.70 307%
GetAsciiStringEncoding 59 122.3798 ns 2.1938 ns 8,173,850.58
GetAsciiStringStack 59 78.7113 ns 1.2848 ns 12,707,904.94 55%
GetAsciiStringUnrolled 59 29.8788 ns 0.3652 ns 33,473,395.01 310%
GetAsciiStringEncoding 60 124.1719 ns 2.2164 ns 8,055,862.63
GetAsciiStringStack 60 81.2583 ns 0.1796 ns 12,306,498.43 53%
GetAsciiStringUnrolled 60 30.9875 ns 0.0536 ns 32,271,166.04 301%
GetAsciiStringEncoding 61 127.6635 ns 0.6244 ns 7,833,272.08
GetAsciiStringStack 61 83.5114 ns 0.2375 ns 11,974,507.46 53%
GetAsciiStringUnrolled 61 31.3584 ns 0.0419 ns 31,889,447.49 307%
GetAsciiStringEncoding 62 127.9033 ns 2.2352 ns 7,820,741.67
GetAsciiStringStack 62 80.2288 ns 1.4863 ns 12,468,431.42 59%
GetAsciiStringUnrolled 62 31.5566 ns 0.0629 ns 31,689,252.11 305%
GetAsciiStringEncoding 63 131.2502 ns 0.7832 ns 7,619,293.34
GetAsciiStringStack 63 82.7944 ns 0.1248 ns 12,078,135.20 59%
GetAsciiStringUnrolled 63 31.7394 ns 0.4595 ns 31,513,044.38 314%
GetAsciiStringEncoding 64 131.6511 ns 2.2129 ns 7,597,935.94
GetAsciiStringStack 64 83.1658 ns 1.4310 ns 12,027,648.41 58%
GetAsciiStringUnrolled 64 32.6071 ns 0.1295 ns 30,668,607.09 304%
GetAsciiStringEncoding 65 135.3443 ns 0.6890 ns 7,388,747.27
GetAsciiStringStack 65 84.9220 ns 0.1228 ns 11,775,541.60 59%
GetAsciiStringUnrolled 65 31.7158 ns 0.1263 ns 31,530,531.78 327%
GetAsciiStringEncoding 66 135.0913 ns 2.3630 ns 7,404,615.66
GetAsciiStringStack 66 86.0645 ns 1.5442 ns 11,622,842.55 57%
GetAsciiStringUnrolled 66 33.2187 ns 0.0711 ns 30,103,693.98 307%
GetAsciiStringEncoding 67 138.7461 ns 0.7294 ns 7,207,595.76
GetAsciiStringStack 67 87.7398 ns 0.1579 ns 11,397,373.75 58%
GetAsciiStringUnrolled 67 33.4898 ns 0.1348 ns 29,860,305.66 314%
GetAsciiStringEncoding 68 140.8501 ns 0.4850 ns 7,099,828.93
GetAsciiStringStack 68 87.3443 ns 1.6001 ns 11,452,703.01 61%
GetAsciiStringUnrolled 68 33.0939 ns 0.1394 ns 30,217,519.35 326%
GetAsciiStringEncoding 69 142.5339 ns 0.4401 ns 7,015,938.07
GetAsciiStringStack 69 88.8778 ns 0.2925 ns 11,251,514.35 60%
GetAsciiStringUnrolled 69 34.4021 ns 0.0976 ns 29,068,243.59 314%
GetAsciiStringEncoding 70 140.9871 ns 2.4454 ns 7,094,888.18
GetAsciiStringStack 70 89.2703 ns 0.1105 ns 11,201,951.18 58%
GetAsciiStringUnrolled 70 34.9844 ns 0.1052 ns 28,584,449.34 303%
GetAsciiStringEncoding 71 146.0357 ns 0.2601 ns 6,847,659.91
GetAsciiStringStack 71 89.8671 ns 0.2524 ns 11,127,632.92 63%
GetAsciiStringUnrolled 71 35.3492 ns 0.1708 ns 28,289,851.51 313%
GetAsciiStringEncoding 72 148.1829 ns 0.5846 ns 6,748,517.06
GetAsciiStringStack 72 88.3982 ns 1.9460 ns 11,317,275.51 68%
GetAsciiStringUnrolled 72 35.7736 ns 0.3331 ns 27,955,864.23 314%
GetAsciiStringEncoding 73 148.0275 ns 2.7551 ns 6,757,795.67
GetAsciiStringStack 73 92.4239 ns 0.3745 ns 10,819,884.44 60%
GetAsciiStringUnrolled 73 36.0294 ns 0.1245 ns 27,755,415.65 311%
GetAsciiStringEncoding 74 151.8114 ns 0.6224 ns 6,587,226.78
GetAsciiStringStack 74 94.2859 ns 0.2559 ns 10,606,116.19 61%
GetAsciiStringUnrolled 74 36.1004 ns 0.0660 ns 27,700,617.45 321%
GetAsciiStringEncoding 75 154.2497 ns 1.5087 ns 6,483,590.82
GetAsciiStringStack 75 95.4720 ns 0.7033 ns 10,474,807.55 62%
GetAsciiStringUnrolled 75 36.2059 ns 0.6294 ns 27,628,000.50 326%
GetAsciiStringEncoding 76 156.0395 ns 0.6231 ns 6,408,733.44
GetAsciiStringStack 76 95.5809 ns 0.1158 ns 10,462,351.87 63%
GetAsciiStringUnrolled 76 37.2298 ns 0.1399 ns 26,860,595.07 319%
GetAsciiStringEncoding 77 157.7147 ns 1.1237 ns 6,340,870.47
GetAsciiStringStack 77 94.9998 ns 1.6717 ns 10,529,523.89 66%
GetAsciiStringUnrolled 77 37.4612 ns 0.1892 ns 26,694,958.74 321%
GetAsciiStringEncoding 78 159.1194 ns 0.5110 ns 6,284,649.51
GetAsciiStringStack 78 96.7728 ns 1.0274 ns 10,334,566.30 64%
GetAsciiStringUnrolled 78 37.6939 ns 0.1153 ns 26,529,707.40 322%
GetAsciiStringEncoding 79 160.8513 ns 0.2415 ns 6,216,934.22
GetAsciiStringStack 79 97.1092 ns 0.6307 ns 10,298,105.84 66%
GetAsciiStringUnrolled 79 38.0406 ns 0.0952 ns 26,287,872.84 323%
GetAsciiStringEncoding 80 162.7922 ns 0.1236 ns 6,142,804.59
GetAsciiStringStack 80 97.2582 ns 1.6836 ns 10,284,926.70 67%
GetAsciiStringUnrolled 80 38.7057 ns 0.1788 ns 25,836,519.59 321%
GetAsciiStringEncoding 81 164.8054 ns 0.6023 ns 6,067,839.94
GetAsciiStringStack 81 98.9893 ns 0.3707 ns 10,102,234.68 66%
GetAsciiStringUnrolled 81 38.5821 ns 0.4893 ns 25,922,840.20 327%
GetAsciiStringEncoding 82 164.4606 ns 3.0080 ns 6,082,474.45
GetAsciiStringStack 82 101.8905 ns 0.3400 ns 9,814,560.89 61%
GetAsciiStringUnrolled 82 39.5835 ns 0.1506 ns 25,263,381.37 315%
GetAsciiStringEncoding 83 169.1799 ns 1.9086 ns 5,911,560.52
GetAsciiStringStack 83 102.3499 ns 0.1015 ns 9,770,413.77 65%
GetAsciiStringUnrolled 83 39.7400 ns 0.1270 ns 25,163,801.81 326%
GetAsciiStringEncoding 84 171.2053 ns 1.0942 ns 5,841,170.60
GetAsciiStringStack 84 102.7868 ns 0.0867 ns 9,728,881.62 67%
GetAsciiStringUnrolled 84 38.7474 ns 0.1879 ns 25,808,736.27 342%
GetAsciiStringEncoding 85 171.8738 ns 0.1169 ns 5,818,223.42
GetAsciiStringStack 85 103.3070 ns 0.3461 ns 9,679,989.19 66%
GetAsciiStringUnrolled 85 40.1327 ns 0.7376 ns 24,925,562.60 328%
GetAsciiStringEncoding 86 173.7038 ns 0.2520 ns 5,756,938.21
GetAsciiStringStack 86 103.4217 ns 1.7671 ns 9,671,915.62 68%
GetAsciiStringUnrolled 86 39.3751 ns 0.1796 ns 25,397,296.51 341%
GetAsciiStringEncoding 87 176.3119 ns 1.2414 ns 5,672,038.63
GetAsciiStringStack 87 104.5131 ns 1.0896 ns 9,569,184.02 69%
GetAsciiStringUnrolled 87 41.4257 ns 0.4225 ns 24,141,951.05 326%
GetAsciiStringEncoding 88 175.5169 ns 3.0111 ns 5,699,097.18
GetAsciiStringStack 88 105.3018 ns 1.9435 ns 9,499,680.74 67%
GetAsciiStringUnrolled 88 41.2813 ns 1.1543 ns 24,241,871.01 325%
GetAsciiStringEncoding 89 179.6256 ns 0.8114 ns 5,567,242.43
GetAsciiStringStack 89 107.8481 ns 0.4060 ns 9,272,427.18 67%
GetAsciiStringUnrolled 89 41.6976 ns 0.6848 ns 23,988,523.91 331%
GetAsciiStringEncoding 90 181.7802 ns 0.7951 ns 5,501,249.25
GetAsciiStringStack 90 107.6579 ns 2.1216 ns 9,292,196.34 69%
GetAsciiStringUnrolled 90 42.3043 ns 0.1081 ns 23,638,425.66 330%
GetAsciiStringEncoding 91 183.0337 ns 0.7597 ns 5,463,563.38
GetAsciiStringStack 91 108.3365 ns 1.9540 ns 9,233,437.60 69%
GetAsciiStringUnrolled 91 42.6017 ns 0.1799 ns 23,473,649.09 330%
GetAsciiStringEncoding 92 185.0401 ns 0.9934 ns 5,404,381.54
GetAsciiStringStack 92 108.8356 ns 1.8898 ns 9,190,881.37 70%
GetAsciiStringUnrolled 92 43.3148 ns 0.1385 ns 23,087,012.82 327%
GetAsciiStringEncoding 93 186.6905 ns 0.1691 ns 5,356,463.16
GetAsciiStringStack 93 110.5899 ns 0.2924 ns 9,042,481.81 69%
GetAsciiStringUnrolled 93 42.5325 ns 0.6945 ns 23,517,405.59 339%
GetAsciiStringEncoding 94 189.1519 ns 1.5878 ns 5,287,106.88
GetAsciiStringStack 94 112.0398 ns 0.4531 ns 8,925,538.96 69%
GetAsciiStringUnrolled 94 43.5655 ns 0.6660 ns 22,959,189.35 334%
GetAsciiStringEncoding 95 191.0141 ns 1.0566 ns 5,235,370.32
GetAsciiStringStack 95 110.8591 ns 1.6453 ns 9,022,393.76 72%
GetAsciiStringUnrolled 95 44.4152 ns 0.1813 ns 22,515,199.50 330%
GetAsciiStringEncoding 96 192.4730 ns 2.6686 ns 5,196,515.14
GetAsciiStringStack 96 111.7660 ns 1.9298 ns 8,949,874.51 72%
GetAsciiStringUnrolled 96 44.8737 ns 0.2062 ns 22,285,221.73 329%
GetAsciiStringEncoding 508 944.8759 ns 17.0763 ns 1,058,678.46
GetAsciiStringStack 508 579.5744 ns 1.3352 ns 1,725,412.84 63%
GetAsciiStringUnrolled 508 206.7729 ns 0.5415 ns 4,836,256.08 357%
GetAsciiStringEncoding 509 946.5941 ns 17.1607 ns 1,056,758.62
GetAsciiStringStack 509 579.5389 ns 2.1671 ns 1,725,532.93 63%
GetAsciiStringUnrolled 509 204.2443 ns 4.5613 ns 4,898,449.60 364%
GetAsciiStringEncoding 510 938.5520 ns 19.4110 ns 1,065,905.56
GetAsciiStringStack 510 580.9886 ns 1.4832 ns 1,721,214.91 61%
GetAsciiStringUnrolled 510 203.4664 ns 3.5925 ns 4,916,315.08 361%
GetAsciiStringEncoding 511 949.2423 ns 16.6339 ns 1,053,788.50
GetAsciiStringStack 511 575.3920 ns 8.9746 ns 1,738,358.77 65%
GetAsciiStringUnrolled 511 207.0813 ns 0.7946 ns 4,829,088.72 358%
GetAsciiStringEncoding 512 950.7381 ns 17.7106 ns 1,052,171.32
GetAsciiStringStack 512 574.6530 ns 9.7788 ns 1,740,673.21 65%
GetAsciiStringUnrolled 512 208.0087 ns 0.4574 ns 4,807,514.19 357%
GetAsciiStringEncoding 513 953.8168 ns 17.6302 ns 1,048,769.47
GetAsciiStringStack 513 586.0188 ns 2.2532 ns 1,706,454.22 63%
GetAsciiStringUnrolled 513 207.4819 ns 0.8733 ns 4,819,779.47 360%
GetAsciiStringEncoding 543 1,020.3621 ns 1.2628 ns 980,045.67
GetAsciiStringStack 543 610.0105 ns 9.1914 ns 1,639,679.98 67%
GetAsciiStringUnrolled 543 221.1857 ns 0.8742 ns 4,521,155.30 361%
GetAsciiStringEncoding 544 1,024.6794 ns 4.7021 ns 975,934.69
GetAsciiStringStack 544 614.9500 ns 2.1888 ns 1,626,168.30 67%
GetAsciiStringUnrolled 544 222.1349 ns 0.5794 ns 4,501,798.23 361%
GetAsciiStringEncoding 547 1,018.0020 ns 17.9729 ns 982,616.33
GetAsciiStringStack 547 615.6357 ns 9.3914 ns 1,624,707.14 65%
GetAsciiStringUnrolled 547 221.0544 ns 1.1208 ns 4,523,885.37 360%
GetAsciiStringEncoding 576 1,079.6416 ns 0.5652 ns 926,233.57
GetAsciiStringStack 576 650.5971 ns 3.2925 ns 1,537,087.13 66%
GetAsciiStringUnrolled 576 235.2686 ns 1.7363 ns 4,250,684.86 359%
GetAsciiStringEncoding 1024 1,915.6749 ns 6.6849 ns 522,015.39
GetAsciiStringStack 1024 940.9080 ns 18.4671 ns 1,063,194.87 104%
GetAsciiStringUnrolled 1024 408.9411 ns 1.9981 ns 2,445,395.80 368%
GetAsciiStringEncoding 4096 7,642.0782 ns 9.6052 ns 130,854.66
GetAsciiStringStack 4096 3,754.2921 ns 13.0359 ns 266,364.90 104%
GetAsciiStringUnrolled 4096 1,635.3924 ns 11.3656 ns 611,502.51 367%
GetAsciiStringEncoding 8512 15,603.8434 ns 275.6553 ns 64,106.35
GetAsciiStringStack 8512 7,793.4370 ns 12.7716 ns 128,313.42 100%
GetAsciiStringUnrolled 8512 3,274.6417 ns 7.8795 ns 305,378.63 376%
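The "improve" column appears to be the gain in op/s over the GetAsciiStringEncoding row of the same length, rounded to a whole percent. The sketch below shows that arithmetic; the class and method names are hypothetical, and the rounding rule is inferred from the table rather than stated in the thread.

```csharp
using System;

public static class ImproveSketch
{
    // Hypothetical helper: percentage gain in op/s of a candidate over a baseline.
    public static int ImprovePercent(double baselineOpsPerSec, double candidateOpsPerSec)
    {
        return (int)Math.Round((candidateOpsPerSec / baselineOpsPerSec - 1.0) * 100.0);
    }
}
```

For the Len 0 rows, `ImproveSketch.ImprovePercent(156653606.68, 189810626.86)` gives 21 and `ImproveSketch.ImprovePercent(156653606.68, 226204241.47)` gives 44, matching the table.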

Though Encoding.ASCII.GetString can't be used anyway: a pre-completed array is needed because strings are immutable (ignoring what this PR is doing), so that situation is much worse.

@halter73 halter73 merged commit ea3e64a into aspnet:dev Jan 6, 2016
@halter73
Member

halter73 commented Jan 6, 2016

Thanks for redoing the tests.

5 participants