Skip to content

Potential optimizations for System.IO.Path #18223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jamesqo opened this issue Aug 18, 2016 · 7 comments
Closed

Potential optimizations for System.IO.Path #18223

jamesqo opened this issue Aug 18, 2016 · 7 comments
Assignees
Labels
area-System.IO enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@jamesqo
Copy link
Contributor

jamesqo commented Aug 18, 2016

internal static void CheckInvalidPathChars(string path)
{
    if (path == null)
        ThrowArgumentNullPath();

    if (PathInternal.HasIllegalCharacters(path))
        ThrowArgumentInvalidPathChars();
}
  • StringBuilder's indexer is not inlined, so this is very slow, although it does avoid allocations. It may be worth seeing how all of the method calls stack up against using ToString and indexing into that instead.
  • This is definitely suboptimal; I don't think StringBuilder.Append(char) is inlined, and we could use a char[] instead since we know the max length of the buffer in advance.
  • Same here
  • Same here
    • Maybe this could make use of ArrayPool<char>.Shared.Rent, instead of using StringBuilder?
  • Same here

cc @JeremyKuhne

@JeremyKuhne
Copy link
Member

CheckInvalidPathChars is called as much as 1-3 times in most Path chars,

Personally I'd like to remove the path char check- perhaps under an AppContext switch. It isn't that useful- on Unix it's just "no nulls". Additionally it isn't technically correct for all Windows paths (notably pipes)- I skip it entirely with device paths because of this.

You may want to consider looking at how StringBuffer performs compared to StringBuilder since were already using it and it's cached.

@jamesqo
Copy link
Contributor Author

jamesqo commented Aug 19, 2016

@JeremyKuhne

Personally I'd like to remove the path char check- perhaps under an AppContext switch.

Regarding AppContext itself, what would be the overhead of using that? AFAICS it does locking which could be relatively expensive.

You may want to consider looking at how StringBuffer performs compared to StringBuilder since were already using it and it's cached.

Seems like an interesting option. I'm going to hazard a guess that simply having a new char[] (even uncached) may be faster, since acquiring unmanaged memory (which appears to be what StringBuffer uses) is slow, and the cache uses Interlocked.Exchange which is also slow. I'll have to benchmark before I can say anything for sure though...

I think the best strategy would be to do something like stackalloc char[...] for small buffers, and then ArrayPool<char>.Shared.Rent for larger ones where the overhead of processing the array will be much larger than that of renting it.

@jamesqo
Copy link
Contributor Author

jamesqo commented Aug 31, 2016

@JeremyKuhne I found something unexpected after running 1m iterations of Path.Combine through PerfView... turns out the allocations/most of the things discussed in this thread were nowhere even close to being the bottleneck of the program. Instead, this IndexOfAny call, which is called by CheckInvalidPathChars was taking up almost 60% of CPU time.

Proof:

unexpected_results

The main bottleneck in there appears to be this method, which initializes a bitmap of some sort to indicate whether a character is in the array. It grows in complexity for each char in the array, so due to the huge amount of invalid chars in Windows it's taking up a lot of time.

@JeremyKuhne
Copy link
Member

Regarding AppContext itself, what would be the overhead of using that? AFAICS it does locking which could be relatively expensive.

@jamesqo The desktop implementation uses Interlocked.Exchange. @AlexGhiondea, any plans to optimize AppContext?

Most of the characters are < (char)32. All of which are pretty unlikely. Outside of that you only have 4 chars. Walking through the string manually would likely give better results.

@AlexGhiondea
Copy link
Contributor

AppContext does lock, but it should only take the lock once per app domain per switch. Once the value of the switch is retrieved, it is cached and no further locks are taken for retrieving the value.

@JeremyKuhne did you have an optimization in mind?

@danmoseley
Copy link
Member

@JeremyKuhne I moved this to Future as it does not seem necessary for 2.0

@stephentoub
Copy link
Member

Much of this implementation has changed, e.g. IndexOfAny has changed its implementation completely, Path has been optimized in a variety of ways, etc. To do anything here would require a complete re-investigation to see what the bottlenecks are and whether they actually matter. As such, I'm closing this as inactionable. Please feel free to open a new issue if other real problems are found. Thanks.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 29, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests

6 participants