Decrease writes to local variables in Buffer.MemoryCopy (dotnet/coreclr#6627)
`Buffer.MemoryCopy` currently makes 4 writes every time it copies some data: 1 to `*dest`, 1 to update `dest`, 1 to update `src`, and 1 to update `len`. I've decreased this to 2: the store to the destination itself, and one to update a new local variable `i`, which tracks how many bytes into the buffer we are. All writes are now made using
```cs
*(dest + i + x) = *(src + i + x)
```
which has no additional overhead, since the JIT converts these expressions into memory addressing operands.
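As a rough illustration of the single-index pattern, here is a minimal sketch. It is not the code from this PR: the class name `IndexedCopySketch`, the 4-byte unroll factor, and the byte-wise tail loop are assumptions chosen for brevity. It needs C# 9 or later (for `nuint`) and `AllowUnsafeBlocks`.

```cs
// Hypothetical sketch, not the actual Buffer.MemoryCopy implementation.
static unsafe class IndexedCopySketch
{
    // Copies len bytes from src to dest. Buffers are assumed not to overlap.
    public static void Copy(byte* dest, byte* src, nuint len)
    {
        nuint i = 0; // the only local that is written inside the loops

        // Unrolled by 4 (an arbitrary choice for this sketch). dest, src and
        // len are never modified; every access is expressed relative to i.
        while (len - i >= 4)
        {
            *(dest + i) = *(src + i);
            *(dest + i + 1) = *(src + i + 1);
            *(dest + i + 2) = *(src + i + 2);
            *(dest + i + 3) = *(src + i + 3);
            i += 4;
        }

        // Copy the remaining 0-3 bytes one at a time.
        while (i < len)
        {
            *(dest + i) = *(src + i);
            i++;
        }
    }
}
```

Per copied element, the only stores are the one to destination memory and the update of `i`, which is the "2 writes" count described above.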
Another change I made was to add a few extra cases to the `switch` statement at the beginning, which copies small sizes without any branches. It now covers sizes 0-22. This benefits the main codepath, since we can convert the unrolled loop to a `do..while` loop and save an extra branch at the beginning. (At most 7 bytes are copied for alignment and 16 per iteration of the loop, so 23 is the minimum length at which the first iteration can run without first checking whether we should stop.) This adds
This PR increases the performance of `MemoryCopy` by 10-20% for most buffer sizes on x86; you can see the performance test/results (and the generated assembly for each version) [here](https://gist.github.com/jamesqo/337852c8ce09205a8289ce1f1b9b5382). (Note that this codepath is also used by `wstrcpy` at the moment, so this directly affects many common String operations.)
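To make the 23-byte reasoning concrete, here is a simplified sketch of the control flow. It is not the PR's code: the class name `DoWhileSketch` and its constants are hypothetical, a plain loop stands in for the branch-free `switch` over sizes 0-22, and aligning `dest` to 8 bytes with single-byte copies is an assumption about how the "max 7 bytes" arises. It needs C# 9 or later and `AllowUnsafeBlocks`.

```cs
// Hypothetical sketch of the small-size path plus do..while main loop.
static unsafe class DoWhileSketch
{
    const int MaxAlignBytes = 7;   // worst case to reach 8-byte alignment
    const int LoopBytes = 16;      // bytes copied per loop iteration
    const int MinForLoop = MaxAlignBytes + LoopBytes; // 23

    // Copies len bytes from src to dest. Buffers are assumed not to overlap.
    public static void Copy(byte* dest, byte* src, nuint len)
    {
        nuint i = 0;

        if (len < MinForLoop)
        {
            // Stand-in for the switch that handles lengths 0-22.
            for (; i < len; i++) *(dest + i) = *(src + i);
            return;
        }

        // Align dest to 8 bytes (assumed strategy): at most 7 iterations.
        while (((nuint)(dest + i) & 7) != 0)
        {
            *(dest + i) = *(src + i);
            i++;
        }

        // Here len >= 23 and i <= 7, so len - i >= 16: the first iteration
        // is always safe, and the loop condition can be tested at the end.
        do
        {
            *(long*)(dest + i) = *(long*)(src + i);
            *(long*)(dest + i + 8) = *(long*)(src + i + 8);
            i += LoopBytes;
        } while (len - i >= LoopBytes);

        // Copy the remaining 0-15 bytes.
        for (; i < len; i++) *(dest + i) = *(src + i);
    }
}
```

If the small-size path stopped short of 22, a length such as 22 could reach the loop with only 15 bytes left after alignment, and the unconditional first iteration would overrun the buffer.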
Commit migrated from dotnet/coreclr@32fe063
             // P/Invoke into the native version when the buffers are overlapping and the copy needs to be performed backwards
             // This check can produce false positives for lengths greater than Int32.MaxInt. It is fine because we want to use PInvoke path for the large lengths anyway.
-#if BIT64
-            if ((ulong)dest - (ulong)src < len) goto PInvoke;
-#else
-            if (((uint)dest - (uint)src) < len) goto PInvoke;
-#endif
-            //
+
+            if ((nuint)dest - (nuint)src < len) goto PInvoke;
+
             // This is portable version of memcpy. It mirrors what the hand optimized assembly versions of memcpy typically do.
             //
             // Ideally, we would just use the cpblk IL instruction here. Unfortunately, cpblk IL instruction is not as efficient as
             // possible yet and so we have this implementation here for now.
-            //
+
+            // Note: It's important that this switch handles lengths at least up to 22.
+            // See notes below near the main loop for why.
+
+            // The switch will be very fast since it can be implemented using a jump
+            // table in assembly. See http://stackoverflow.com/a/449297/4077294 for more info.
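
The unsigned comparison `(nuint)dest - (nuint)src < len` in the diff above detects the case where `dest` lies inside the source range, using wraparound instead of two separate comparisons. A minimal sketch of the arithmetic, using plain `nuint` values in place of real pointers (the class and method names are hypothetical; requires C# 9 or later):

```cs
// Hypothetical helper illustrating the unsigned-subtraction overlap test.
static class OverlapCheckSketch
{
    // True when dest falls in [src, src + len), i.e. a forward copy would
    // overwrite source bytes before they are read. If dest is below src,
    // the subtraction wraps to a very large value and the test is false.
    public static bool DestInsideSource(nuint dest, nuint src, nuint len)
    {
        return unchecked(dest - src) < len;
    }
}
```

For example, with `src = 100` and `len = 16`, a `dest` of 108 gives a difference of 8 and the test is true, while a `dest` of 92 wraps around to a value near the top of the address range and the test is false. Note that `dest == src` also yields true for any nonzero `len`.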