Skip to content

Optimize string.EndsWith(char) for const values #69038

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 20, 2022

Conversation

GrabYourPitchforks
Copy link
Member

From looking at source.dot.net, all usages of string.EndsWith(char) throughout our libraries pass const values for the char parameter.

Just like the string.StartsWith(char) optimization that was done in #63734, we can take advantage of the layout of string and reduce the number of branches and total codegen size for string.EndsWith(char).

I've also put some comments in both StartsWith and EndsWith explaining how the optimizations work and what assumptions they make, including that the code shouldn't be copied & pasted to other data types like char[] or ROS<char>.

static void DoIt4(string s)
{
    if (s.EndsWith('x')) { Console.WriteLine(s); }
}
;
; BEFORE
;

; Method ConsoleApp7.Program:DoIt4(System.String)
G_M63046_IG01:
						;; size=0 bbWeight=1    PerfScore 0.00

G_M63046_IG02:
       mov      eax, dword ptr [rcx+8]
       lea      edx, [rax-1]
       cmp      eax, edx
       jbe      SHORT G_M63046_IG05
						;; size=10 bbWeight=1    PerfScore 3.75

G_M63046_IG03:
       mov      eax, edx
       cmp      word  ptr [rcx+2*rax+12], 120
       jne      SHORT G_M63046_IG05
						;; size=10 bbWeight=0.50 PerfScore 2.12

G_M63046_IG04:
       tail.jmp [System.Console:WriteLine(System.String)]
						;; size=6 bbWeight=0.50 PerfScore 1.00

G_M63046_IG05:
       ret      
						;; size=1 bbWeight=0.50 PerfScore 0.50
; Total bytes of code: 27

;
; AFTER
;

; Method ConsoleApp7.Program:DoIt4(System.String)
G_M63046_IG01:
						;; size=0 bbWeight=1    PerfScore 0.00

G_M63046_IG02:
       mov      eax, dword ptr [rcx+8]
       cmp      word  ptr [rcx+2*rax+10], 120
       jne      SHORT G_M63046_IG04
						;; size=11 bbWeight=1    PerfScore 6.00

G_M63046_IG03:
       tail.jmp [System.Console:WriteLine(System.String)]
						;; size=6 bbWeight=0.50 PerfScore 1.00

G_M63046_IG04:
       ret      
						;; size=1 bbWeight=0.50 PerfScore 0.50
; Total bytes of code: 18

@ghost
Copy link

ghost commented May 8, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

From looking at source.dot.net, all usages of string.EndsWith(char) throughout our libraries pass const values for the char parameter.

Just like the string.StartsWith(char) optimization that was done in #63734, we can take advantage of the layout of string and reduce the number of branches and total codegen size for string.EndsWith(char).

I've also put some comments in both StartsWith and EndsWith explaining how the optimizations work and what assumptions they make, including that the code shouldn't be copied & pasted to other data types like char[] or ROS<char>.

static void DoIt4(string s)
{
    if (s.EndsWith('x')) { Console.WriteLine(s); }
}
;
; BEFORE
;

; Method ConsoleApp7.Program:DoIt4(System.String)
G_M63046_IG01:
						;; size=0 bbWeight=1    PerfScore 0.00

G_M63046_IG02:
       mov      eax, dword ptr [rcx+8]
       lea      edx, [rax-1]
       cmp      eax, edx
       jbe      SHORT G_M63046_IG05
						;; size=10 bbWeight=1    PerfScore 3.75

G_M63046_IG03:
       mov      eax, edx
       cmp      word  ptr [rcx+2*rax+12], 120
       jne      SHORT G_M63046_IG05
						;; size=10 bbWeight=0.50 PerfScore 2.12

G_M63046_IG04:
       tail.jmp [System.Console:WriteLine(System.String)]
						;; size=6 bbWeight=0.50 PerfScore 1.00

G_M63046_IG05:
       ret      
						;; size=1 bbWeight=0.50 PerfScore 0.50
; Total bytes of code: 27

;
; AFTER
;

; Method ConsoleApp7.Program:DoIt4(System.String)
G_M63046_IG01:
						;; size=0 bbWeight=1    PerfScore 0.00

G_M63046_IG02:
       mov      eax, dword ptr [rcx+8]
       cmp      word  ptr [rcx+2*rax+10], 120
       jne      SHORT G_M63046_IG04
						;; size=11 bbWeight=1    PerfScore 6.00

G_M63046_IG03:
       tail.jmp [System.Console:WriteLine(System.String)]
						;; size=6 bbWeight=0.50 PerfScore 1.00

G_M63046_IG04:
       ret      
						;; size=1 bbWeight=0.50 PerfScore 0.50
; Total bytes of code: 18
Author: GrabYourPitchforks
Assignees: -
Labels:

area-System.Runtime, tenet-performance

Milestone: -

@stephentoub
Copy link
Member

If this makes it more efficient than the equivalent open-coded checks, are there places not using EndsWith that should be changed to use it?

@GrabYourPitchforks
Copy link
Member Author

If this makes it more efficient than the equivalent open-coded checks, are there places not using EndsWith that should be changed to use it?

We did a pass a while back to identify & rewrite calls from EndsWith("single-char", Ordinal) to EndsWith(char) throughout the libs. I think we even have an analyzer for it? But you're right - I should do another pass as part of this PR to see if we missed any call sites. One difficulty is that we can't do the rewrite for libs which target netstandard, since EndsWith(char) doesn't exist in netfx.

@stephentoub
Copy link
Member

stephentoub commented May 9, 2022

rewrite calls from EndsWith("single-char", Ordinal) to EndsWith(char)

I actually meant code manually doing e.g.

if (path.Length == 0 || path[^1] != '/')... 

rather than already using EndsWith.

@stephentoub
Copy link
Member

e.g.

bool fileNameIsQuoted = fileName.Length > 0 && fileName[0] == '\"' && fileName[fileName.Length - 1] == '\"';

if (pipeName.IndexOfAny(s_invalidPathNameChars) >= 0 || pipeName[pipeName.Length - 1] == Path.DirectorySeparatorChar)

string pathSlash = path[path.Length - 1] == '/' ? path : path + "/";

return
value.Length > 1 &&
value[0] == '"' &&
value[value.Length - 1] == '"';

if (charset.Length > 2 &&
charset[0] == '\"' &&
charset[charset.Length - 1] == '\"')

if (result.Length > 0 && result[result.Length - 1] == '.')


etc.

@danmoseley
Copy link
Member

danmoseley commented May 9, 2022

Something like

if (charset.Length > 2 &&
charset[0] == '\"' &&
charset[charset.Length - 1] == '\"')
would be replaced with

 if (charset.Length > 2 && 
     charset.StartsWith('\"') && 
     charset.EndsWith('\"')) 

which would effectively become

charset.Length > 2 && charset._firstChar == '\"' && Unsafe.Add(ref charset._firstChar, (nint)(uint)charset.Length - 1) == '\"';

and we'd expect that to be faster? (Because the JIT can't infer that the length check protects the subsequent accesses?)

@GrabYourPitchforks
Copy link
Member Author

Oh, I understand now. Thanks both for the clarification! :)

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented May 9, 2022

and we'd expect that to be faster? (Because the JIT can't infer that the length check protects the subsequent accesses?)

You're correct that at the moment the length check doesn't protect "complex" accesses like s[s.Length - 1]. But even so, I wouldn't posit this change as a true perf (throughput) win. Bounds check failures are already in predicted-not-taken branches. I'd instead posit this as codegen improvement, which might have second-order effects like causing more methods to be candidates for inlining or making it slightly more likely for a method's codegen to fit into fewer cache lines. And to me, EndsWith('\\') reads a bit better than s[s.Length - 1] == '\\', but that's really a personal preference.

@danmoseley
Copy link
Member

Yup, I can see how EndsWith('\"') might be easier for JIT to recognize than str.Length > 1 && str[str.Length - 1] == '\"'.

Also it does clean up other cases more, eg

fileName.Length > 0 && fileName[0] == '\"' && fileName[fileName.Length - 1] == '\"'; 

becomes

fileName.StartsWith('\"') && fileName.EndsWith('\"');

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented May 9, 2022

Some of those call sites target ns2.0 so can't be updated.

Additionally, some of the call sites target ROS<char>, and we don't have an EndsWith(T) extension method on that type at the moment.

I could add a somewhat optimized EndsWith(T) on MemoryExtensions, but to get full optimization would require some JIT work (see #69080). We could also perhaps get the JIT to recognize the span[span.Length - 1] pattern, but it would require (presumably larger) JIT work. This isn't really my area so I'm going by my gut on what the relative costs of these two options are.

It's probably worth doing the StartsWith(T) and EndsWith(T) extensions anyway for parity with other types. I'll file an issue.

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented May 10, 2022

Latest iteration has a merge from main + scours the runtime project for locations where we can update call sites to take advantage of these StartsWith and EndsWith optimizations.

A few notes for reviewers:

  • These changes should not change the method logic. Some of the methods contained bugs; e.g., Process trying to detect leading and trailing quote characters but not handling the case where the input string is exactly "\"". I did not try to address any such bugs in the PR.

  • There are a small handful of places where I moved some conditional checks around to make them slightly more efficient. But again, no changes to the app logic.

  • I didn't do a repo-wide substitution for str.StartsWith(x) in place of str[0] == x. I only made these changes where there was an opportunity to roll the length check and the char value check together into a single operation, or for consistency where the sibling method EndsWith was being called, or if the call site really needed the improved readability.

  • Call sites which involve ROS<char> or which target ns2.0 are left unchanged.

And if in doubt, remember the logic table!

 string.StartsWith(char) => string.Length > 0 && string[0] == char
!string.StartsWith(char) => string.Length == 0 || string[0] != char
 string.EndsWith(char)   => string.Length > 0 && string[string.Length - 1] == char
!string.EndsWith(char)   => string.Length == 0 || string[string.Length - 1] != char

@bartonjs
Copy link
Member

bartonjs commented May 18, 2022

Looks like most all of my feedback is about using the logical or operator with newlines.

There's also the consistency (particularly a local consistency) point about removing (or not) Length checks when there's also (&&) a StartsWith check.

@GrabYourPitchforks
Copy link
Member Author

Looks like most of my feedback is about using the logical or operator with newlines.

Drat, my desire to force my preferred coding conventions on the world is foiled again. :)

@GrabYourPitchforks GrabYourPitchforks merged commit eeeeb7a into dotnet:main May 20, 2022
@GrabYourPitchforks GrabYourPitchforks deleted the endswith branch May 20, 2022 05:21
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants