Skip to content

Perf: String.StartsWith Linux regression from 1x speed to 100x speed #15372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ianhays opened this issue Oct 6, 2015 · 2 comments · Fixed by dotnet/coreclr#2630
Closed

Perf: String.StartsWith Linux regression from 1x speed to 100x speed #15372

ianhays opened this issue Oct 6, 2015 · 2 comments · Fixed by dotnet/coreclr#2630
Assignees
Labels
area-System.Runtime os-linux Linux OS (any supported distro) tenet-performance Performance related issue
Milestone

Comments

@ianhays
Copy link
Contributor

ianhays commented Oct 6, 2015

The perf test and all applicable helper methods:

[Benchmark]
[MemberData("TestStringSizes")]
public void StartsWith(int size)
{
    PerfUtils utils = new PerfUtils();
    string testString = utils.CreateString(size);
    string subString = testString.Substring(0, testString.Length / 4);
    foreach (var iteration in Benchmark.Iterations)
        using (iteration.StartMeasurement())
            for (int i = 0; i < 10000; i++)
                testString.StartsWith(subString);
}

public static IEnumerable<object[]> TestStringSizes()
{
    yield return new object[] { 10 };
    yield return new object[] { 100 };
    yield return new object[] { 1000 };
}

/// <summary>
/// Helper method to create a string containing a number of random
/// characters equal to the specified length
/// </summary>
public string CreateString(int length)
{
    byte[] bytes = new byte[length];
    rand.NextBytes(bytes);
    return System.Convert.ToBase64String(bytes);
}

Linux results after regression (today, 10/6/2015)

  <test name="System.Runtime.Tests.Perf_String.StartsWith(size: 10)" type="System.Runtime.Tests.Perf_String" method="StartsWith" time="1.1608339" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Runtime.Tests.dll-Linux" etl="/home/ianha/alliswell/results/System.Runtime.Tests.dll-Linux.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="95.756800055503845" />
        <iteration index="1" Duration="94.822700023651123" />
        <iteration index="2" Duration="95.039200067520142" />
        <iteration index="3" Duration="132.58509993553162" />
        <iteration index="4" Duration="94.424399971961975" />
        <iteration index="5" Duration="98.67739999294281" />
        <iteration index="6" Duration="95.342700004577637" />
        <iteration index="7" Duration="130.88179993629456" />
        <iteration index="8" Duration="93.880900025367737" />
        <iteration index="9" Duration="97.234200000762939" />
        <iteration index="10" Duration="94.284600019454956" />
      </iterations>
    </performance>
  </test>
  <test name="System.Runtime.Tests.Perf_String.StartsWith(size: 100)" type="System.Runtime.Tests.Perf_String" method="StartsWith" time="1.1627864" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Runtime.Tests.dll-Linux" etl="/home/ianha/alliswell/results/System.Runtime.Tests.dll-Linux.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="130.75" />
        <iteration index="1" Duration="118.53219997882843" />
        <iteration index="2" Duration="120.61070001125336" />
        <iteration index="3" Duration="158.38410007953644" />
        <iteration index="4" Duration="119.36450004577637" />
        <iteration index="5" Duration="119.48799991607666" />
        <iteration index="6" Duration="156.4025000333786" />
        <iteration index="7" Duration="119.11340010166168" />
        <iteration index="8" Duration="119.42309999465942" />
      </iterations>
    </performance>
  </test>
  <test name="System.Runtime.Tests.Perf_String.StartsWith(size: 1000)" type="System.Runtime.Tests.Perf_String" method="StartsWith" time="1.6711299" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Runtime.Tests.dll-Linux" etl="/home/ianha/alliswell/results/System.Runtime.Tests.dll-Linux.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="390.06299996376038" />
        <iteration index="1" Duration="430.92409992218018" />
        <iteration index="2" Duration="446.965499997139" />
        <iteration index="3" Duration="402.86049997806549" />
      </iterations>
    </performance>
  </test>

Windows results:

  <test name="System.Runtime.Tests.Perf_String.StartsWith(size: 10)" type="System.Runtime.Tests.Perf_String" method="StartsWith" time="0.9908253" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Runtime.Tests.dll-WindowsCore" etl="D:\git\corefx\System.Runtime.Tests.dll-WindowsCore.etl">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="1.6199558489330457" />
        <iteration index="1" Duration="0.90406194948366192" />
        <iteration index="2" Duration="0.88182390720692183" />
        <iteration index="3" Duration="0.88381962894970911" />
        <iteration index="4" Duration="0.88410473205590279" />
        <iteration index="5" Duration="0.87982818546402086" />
        <iteration index="6" Duration="0.87982818546413455" />
        <iteration index="7" Duration="0.8821090103131155" />
        <iteration index="8" Duration="0.82195225492330337" />
        <iteration index="9" Duration="0.87526653576628632" />
        <iteration index="10" Duration="0.88353452584362913" />
        <iteration index="11" Duration="0.8789728761457809" />
        <iteration index="12" Duration="0.98018447881577231" />
        <iteration index="13" Duration="0.88467493826806276" />
        <iteration index="14" Duration="0.88324942273743545" />
        <iteration index="15" Duration="0.91860220789544655" />
        <iteration index="16" Duration="0.88496004137414275" />
        <iteration index="17" Duration="0.88239411341919549" />
        <iteration index="18" Duration="0.8838196289498228" />
        <iteration index="19" Duration="0.88467493826794907" />
        <iteration index="20" Duration="0.82622880151484424" />
        <iteration index="21" Duration="0.88182390720692183" />
        <iteration index="22" Duration="0.88296431963146915" />
        <iteration index="23" Duration="0.88182390720692183" />
        <iteration index="24" Duration="0.88210901031300182" />
        <iteration index="25" Duration="0.8821090103131155" />
        <iteration index="26" Duration="0.9077682898631565" />
        <iteration index="27" Duration="0.97647813843639142" />
        <iteration index="28" Duration="0.88809617554147735" />
        <iteration index="29" Duration="0.90605767122656289" />
        <iteration index="30" Duration="0.9513890650985104" />
      </iterations>
    </performance>
  </test>
  <test name="System.Runtime.Tests.Perf_String.StartsWith(size: 100)" type="System.Runtime.Tests.Perf_String" method="StartsWith" time="1.0032305" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Runtime.Tests.dll-WindowsCore" etl="D:\git\corefx\System.Runtime.Tests.dll-WindowsCore.etl">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="1.5221654835363552" />
        <iteration index="1" Duration="1.7348524006963544" />
        <iteration index="2" Duration="1.4999274412593877" />
        <iteration index="3" Duration="1.5070550189122969" />
        <iteration index="4" Duration="1.5888796103665754" />
        <iteration index="5" Duration="1.4748383679216204" />
        <iteration index="6" Duration="1.5310036798257443" />
        <iteration index="7" Duration="1.6396279632547248" />
        <iteration index="8" Duration="1.4831063579988495" />
        <iteration index="9" Duration="1.5318589891439842" />
        <iteration index="10" Duration="1.5144676996710587" />
        <iteration index="11" Duration="1.4622938312527367" />
        <iteration index="12" Duration="1.6533129123481558" />
        <iteration index="13" Duration="1.6139686837045701" />
        <iteration index="14" Duration="1.5763350736976918" />
        <iteration index="15" Duration="1.3781884149493635" />
        <iteration index="16" Duration="1.4058434162423055" />
        <iteration index="17" Duration="1.3750522807820289" />
        <iteration index="18" Duration="1.5201697617933405" />
        <iteration index="19" Duration="1.4041327976055982" />
        <iteration index="20" Duration="1.4397706858696893" />
        <iteration index="21" Duration="1.3770480025248162" />
        <iteration index="22" Duration="1.3904478485121672" />
        <iteration index="23" Duration="1.5241612052789151" />
        <iteration index="24" Duration="1.4688512026932585" />
        <iteration index="25" Duration="1.373911868357709" />
        <iteration index="26" Duration="1.3656438782804798" />
        <iteration index="27" Duration="1.464574656101604" />
        <iteration index="28" Duration="1.4055583131362255" />
        <iteration index="29" Duration="1.4617236250403494" />
        <iteration index="30" Duration="1.3650736720680925" />
      </iterations>
    </performance>
  </test>
  <test name="System.Runtime.Tests.Perf_String.StartsWith(size: 1000)" type="System.Runtime.Tests.Perf_String" method="StartsWith" time="1.0069319" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Runtime.Tests.dll-WindowsCore" etl="D:\git\corefx\System.Runtime.Tests.dll-WindowsCore.etl">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="4.9605089432566274" />
        <iteration index="1" Duration="4.54853495492398" />
        <iteration index="2" Duration="4.6292191339534838" />
        <iteration index="3" Duration="4.8795396611208162" />
        <iteration index="4" Duration="4.7609367689778992" />
        <iteration index="5" Duration="4.5550923263645018" />
        <iteration index="6" Duration="5.1312857038183211" />
        <iteration index="7" Duration="4.7287201179869953" />
        <iteration index="8" Duration="4.77319620254093" />
        <iteration index="9" Duration="4.6009939264486093" />
        <iteration index="10" Duration="5.2370589561865017" />
        <iteration index="11" Duration="4.7204521279099936" />
        <iteration index="12" Duration="4.5254416033285452" />
        <iteration index="13" Duration="5.1697746231434394" />
        <iteration index="14" Duration="4.5932961425837675" />
        <iteration index="15" Duration="4.93599007613102" />
        <iteration index="16" Duration="4.963930180530042" />
        <iteration index="17" Duration="4.7047714570735479" />
        <iteration index="18" Duration="4.9804661606845" />
        <iteration index="19" Duration="4.5223054691614379" />
        <iteration index="20" Duration="5.0158189458425113" />
        <iteration index="21" Duration="4.8698461555127324" />
        <iteration index="22" Duration="4.6468955265327168" />
        <iteration index="23" Duration="4.859297340586636" />
        <iteration index="24" Duration="4.6260829997863766" />
        <iteration index="25" Duration="4.8307870299754541" />
        <iteration index="26" Duration="4.9442580662080218" />
        <iteration index="27" Duration="4.8105447094412739" />
        <iteration index="28" Duration="4.9861682228070094" />
        <iteration index="29" Duration="4.9057691468829034" />
        <iteration index="30" Duration="4.9964319346272532" />
      </iterations>
    </performance>
  </test>

Linux results before Regression: (Unfortunately these are from before I was recording dates with my results, so I'm not sure when they're from or even how valid they are - the important thing is moreso the current Linux results vs Windows)

  <test name="System.Runtime.Tests.Perf_String.StartsWith(size: 10)" type="System.Runtime.Tests.Perf_String" method="StartsWith" time="0.9040166" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Runtime.Tests.dll-Linux" etl="/home/ianha/freshstart/results/System.Runtime.Tests.dll-Linux.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="0.721000075340271" />
        <iteration index="1" Duration="1.0049999952316284" />
        <iteration index="2" Duration="0.72970008850097656" />
        <iteration index="3" Duration="0.992900013923645" />
        <iteration index="4" Duration="0.70289993286132813" />
        <iteration index="5" Duration="0.69099998474121094" />
        <iteration index="6" Duration="0.68729996681213379" />
        <iteration index="7" Duration="0.687999963760376" />
        <iteration index="8" Duration="0.70929992198944092" />
        <iteration index="9" Duration="0.69709992408752441" />
        <iteration index="10" Duration="0.68490004539489746" />
        <iteration index="11" Duration="0.920799970626831" />
        <iteration index="12" Duration="0.68540000915527344" />
        <iteration index="13" Duration="0.84780001640319824" />
        <iteration index="14" Duration="0.68470001220703125" />
        <iteration index="15" Duration="0.680400013923645" />
        <iteration index="16" Duration="0.89419996738433838" />
        <iteration index="17" Duration="1.1850999593734741" />
        <iteration index="18" Duration="0.72119998931884766" />
        <iteration index="19" Duration="0.69780004024505615" />
        <iteration index="20" Duration="0.88760006427764893" />
        <iteration index="21" Duration="0.69430005550384521" />
        <iteration index="22" Duration="0.695099949836731" />
        <iteration index="23" Duration="0.88989996910095215" />
        <iteration index="24" Duration="0.69870007038116455" />
        <iteration index="25" Duration="0.73190009593963623" />
        <iteration index="26" Duration="0.71410000324249268" />
        <iteration index="27" Duration="0.71560001373291016" />
        <iteration index="28" Duration="0.71969997882843018" />
        <iteration index="29" Duration="0.69029998779296875" />
        <iteration index="30" Duration="0.685499906539917" />
      </iterations>
    </performance>
  </test>
  <test name="System.Runtime.Tests.Perf_String.StartsWith(size: 100)" type="System.Runtime.Tests.Perf_String" method="StartsWith" time="1.0027038" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Runtime.Tests.dll-Linux" etl="/home/ianha/freshstart/results/System.Runtime.Tests.dll-Linux.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="1.2748000621795654" />
        <iteration index="1" Duration="1.1951999664306641" />
        <iteration index="2" Duration="1.9184000492095947" />
        <iteration index="3" Duration="1.1891000270843506" />
        <iteration index="4" Duration="1.2057000398635864" />
        <iteration index="5" Duration="1.2546999454498291" />
        <iteration index="6" Duration="1.3502999544143677" />
        <iteration index="7" Duration="3.1394000053405762" />
        <iteration index="8" Duration="1.1751000881195068" />
        <iteration index="9" Duration="1.1885000467300415" />
        <iteration index="10" Duration="2.1635000705718994" />
        <iteration index="11" Duration="1.1863000392913818" />
        <iteration index="12" Duration="1.1722999811172485" />
        <iteration index="13" Duration="1.2384999990463257" />
        <iteration index="14" Duration="1.3604999780654907" />
        <iteration index="15" Duration="1.3181999921798706" />
        <iteration index="16" Duration="1.2493000030517578" />
        <iteration index="17" Duration="1.1564000844955444" />
        <iteration index="18" Duration="3.4636999368667603" />
        <iteration index="19" Duration="1.3828999996185303" />
        <iteration index="20" Duration="1.4925000667572021" />
        <iteration index="21" Duration="1.1969000101089478" />
        <iteration index="22" Duration="1.2124999761581421" />
        <iteration index="23" Duration="1.22760009765625" />
        <iteration index="24" Duration="1.3089998960494995" />
        <iteration index="25" Duration="1.2039999961853027" />
        <iteration index="26" Duration="1.236299991607666" />
        <iteration index="27" Duration="1.1713000535964966" />
        <iteration index="28" Duration="1.4200999736785889" />
        <iteration index="29" Duration="3.6653000116348267" />
        <iteration index="30" Duration="1.274899959564209" />
      </iterations>
    </performance>
  </test>
  <test name="System.Runtime.Tests.Perf_String.StartsWith(size: 1000)" type="System.Runtime.Tests.Perf_String" method="StartsWith" time="1.0115642" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Runtime.Tests.dll-Linux" etl="/home/ianha/freshstart/results/System.Runtime.Tests.dll-Linux.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="5.7690999507904053" />
        <iteration index="1" Duration="5.8193000555038452" />
        <iteration index="2" Duration="6.0430999994277954" />
        <iteration index="3" Duration="7.13699996471405" />
        <iteration index="4" Duration="10.848199963569641" />
        <iteration index="5" Duration="14.575500011444092" />
        <iteration index="6" Duration="13.552000045776367" />
        <iteration index="7" Duration="9.6857999563217163" />
        <iteration index="8" Duration="14.067000031471252" />
        <iteration index="9" Duration="9.4052000045776367" />
        <iteration index="10" Duration="10.529799938201904" />
        <iteration index="11" Duration="5.1091001033782959" />
        <iteration index="12" Duration="10.229700088500977" />
        <iteration index="13" Duration="13.708599925041199" />
        <iteration index="14" Duration="10.900200009346008" />
        <iteration index="15" Duration="9.4916000366210938" />
        <iteration index="16" Duration="15.008299946784973" />
        <iteration index="17" Duration="13.553900003433228" />
        <iteration index="18" Duration="5.5837999582290649" />
        <iteration index="19" Duration="6.8181999921798706" />
        <iteration index="20" Duration="5.291700005531311" />
        <iteration index="21" Duration="6.8190000057220459" />
        <iteration index="22" Duration="5.4120000600814819" />
        <iteration index="23" Duration="5.0850000381469727" />
        <iteration index="24" Duration="7.6000000238418579" />
        <iteration index="25" Duration="5.7700999975204468" />
        <iteration index="26" Duration="5.4170999526977539" />
        <iteration index="27" Duration="7.3667000532150269" />
        <iteration index="28" Duration="6.001599907875061" />
        <iteration index="29" Duration="7.553399920463562" />
        <iteration index="30" Duration="5.1969000101089478" />
      </iterations>
    </performance>
  </test>

I verified the results from today with a Console App and received the same results as with the xunit perf runner.

My first guess is that this has something to do with dotnet/coreclr#1632.

@ellismg
Copy link
Contributor

ellismg commented Oct 6, 2015

I assume this is actually due to the ICU work, not the change you pointed out. Previously we would have just been doing a simple ASCII search. Now we will be loading collators and doing a bunch more work per iteration.

@ellismg ellismg self-assigned this Oct 6, 2015
@stephentoub
Copy link
Member

(For reference, dotnet/corefx#1632 only touched the Ordinal case, whereas this test is using CurrentCulture.)

ellismg referenced this issue in ellismg/corefx Dec 16, 2015
In general lingustic comparsion is slower than ordinal comparision.
When using ICU for globalization, the lingustic StartsWith code path
actually ends up doing a lot of work inside ICU itself. While we'll try
to make things better (see dotnet/corefx#3672) for common cases like
ASCII only, if code doesn't need ligustic comparision the best practice
is to not request it.

BaseNumberConverter was using a lingustic StartsWith to detect a hex
prefix when trying to parse a number but it is better served by just
always using ordinal comparisons.

This results in a ~35% QPS improvement in some very barebones ASP.NET
scenarios (like the one outlined in richardkiene/CoreCLRWebAPISample) on
my local machine.
ellismg referenced this issue in ellismg/coreclr Jan 13, 2016
In ICU doing any sort of index of operation (which includes Prefix and
Suffix checking) is relatively expensive. ICU ends up doing a fair
amount of work and allocations in order to construct a searcher object
which could be reused, but our APIs are not amenable towards working in
this manner.

However, for some cultures we can often fast path ASCII searches when we
know that ASCII and Ordinal comparisions are the same, as is the case
for both Invariant and en-US.

This change has CompareInfo hold some additional state about a locale to
decide if we can do this optimiztion and then wires it up to IndexOf,
LastIndexOf, IsPrefix and IsSuffix.

In the future, we can try to extend the set of allowable cultures that
we preform this optimization on by coming up with better checks on when
it is safe to preform this transformation.

Today, this optimization does not apply when IgnoreSymbols is set,
because we would need to blank some ASCII symbol characters. If this
ends up being a common operation, we could consider having ordinal
implementations that also ignore symbols.

This represents the best that we can do for dotnet/corefx#3672. It gets
us back to where we were before for many common real world cases.

Fixes dotnet/corefx#3672.
ellismg referenced this issue in ellismg/coreclr Jan 13, 2016
In ICU doing any sort of index of operation (which includes Prefix and
Suffix checking) is relatively expensive. ICU ends up doing a fair
amount of work and allocations in order to construct a searcher object
which could be reused, but our APIs are not amenable towards working in
this manner.

However, for some cultures we can often fast path ASCII searches when we
know that ASCII and Ordinal comparisions are the same, as is the case
for both Invariant and en-US.

This change has CompareInfo hold some additional state about a locale to
decide if we can do this optimiztion and then wires it up to IndexOf,
LastIndexOf, IsPrefix and IsSuffix.

In the future, we can try to extend the set of allowable cultures that
we preform this optimization on by coming up with better checks on when
it is safe to preform this transformation.

Today, this optimization does not apply when IgnoreSymbols is set,
because we would need to blank some ASCII symbol characters. If this
ends up being a common operation, we could consider having ordinal
implementations that also ignore symbols.

This represents the best that we can do for dotnet/corefx#3672. It gets
us back to where we were before for many common real world cases.

Fixes dotnet/corefx#3672.
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 1.0.0-rc2 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime os-linux Linux OS (any supported distro) tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants