-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Fast path IndexOf and variants for ASCII #2630
Conversation
This is not a complete solution to dotnet/corefx#3672 but I believe it's the best we can do given how ICU works. |
From the benchmarks: 10 CharactersOld:
New:
100 CharactersOld:
New:
1000 CharactersOld:
New:
|
@dotnet-bot test CentOS7.1 x64 Release Build please (timeout cloning github) |
[SecuritySafeCritical] | ||
internal CompareInfo(CultureInfo culture) | ||
{ | ||
m_name = culture.m_name; | ||
m_sortName = culture.SortName; | ||
m_sortHandle = Interop.GlobalizationInterop.GetSortHandle(System.Text.Encoding.UTF8.GetBytes(m_sortName)); | ||
m_isAsciiEqualityOrdinal = (m_sortName == "en-US" || m_sortName == ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it just en-US that can do this or all "en" locales?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably do this for all en locales (there are others as well). I will land this and think about ways to extend it (I suspect we could look at the rules for a collator and understand if it's safe to preform this optimization)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
LGTM |
In ICU doing any sort of index of operation (which includes Prefix and Suffix checking) is relatively expensive. ICU ends up doing a fair amount of work and allocations in order to construct a searcher object which could be reused, but our APIs are not amenable towards working in this manner. However, for some cultures we can often fast path ASCII searches when we know that ASCII and Ordinal comparisions are the same, as is the case for both Invariant and en-US. This change has CompareInfo hold some additional state about a locale to decide if we can do this optimiztion and then wires it up to IndexOf, LastIndexOf, IsPrefix and IsSuffix. In the future, we can try to extend the set of allowable cultures that we preform this optimization on by coming up with better checks on when it is safe to preform this transformation. Today, this optimization does not apply when IgnoreSymbols is set, because we would need to blank some ASCII symbol characters. If this ends up being a common operation, we could consider having ordinal implementations that also ignore symbols. This represents the best that we can do for dotnet/corefx#3672. It gets us back to where we were before for many common real world cases. Fixes dotnet/corefx#3672.
9a46441
to
8c82dd5
Compare
LGTM 2 |
Test Ubuntu x64 Release Build and Test please (GitHub timeout) |
Test Ubuntu x64 Release Build and Test please |
Fast path IndexOf and variants for ASCII
In ICU doing any sort of index of operation (which includes Prefix and
Suffix checking) is relatively expensive. ICU ends up doing a fair
amount of work and allocations in order to construct a searcher object
which could be reused, but our APIs are not amenable towards working in
this manner.
However, for some cultures we can often fast path ASCII searches when we
know that ASCII and Ordinal comparisions are the same, as is the case
for both Invariant and en-US.
This change has CompareInfo hold some additional state about a locale to
decide if we can do this optimiztion and then wires it up to IndexOf,
LastIndexOf, IsPrefix and IsSuffix.
In the future, we can try to extend the set of allowable cultures that
we preform this optimization on by coming up with better checks on when
it is safe to preform this transformation.
Today, this optimization does not apply when IgnoreSymbols is set,
because we would need to blank some ASCII symbol characters. If this
ends up being a common operation, we could consider having ordinal
implementations that also ignore symbols.
This represents the best that we can do for dotnet/corefx#3672. It gets
us back to where we were before for many common real world cases.
Fixes dotnet/corefx#3672.