Initial commit benchmark2 (.not core) #2

Anderman · 2016-08-22T23:38:37Z

Simple benchmark. Allows faster testing and makes it easy to generate an Excel sheet.
Randomized/cached testing is still manual.

nietras · 2016-08-28T13:16:10Z

@Anderman wow, there are some cool tricks in here. I will probably hold off on merging until I have some time to refactor and understand your changes. Could you post some new results for the latest changes you have made? Perhaps in the original CLR thread?

Will you be doing more work on this?

Anderman · 2016-08-28T20:58:51Z

@nietras.

Will you be doing more work on this?
Not this week. I am thinking of a test with two threads on different cores, one writing and one reading.
I think there may be some problems in Kestrel if the wrong cache strategy is chosen.

nietras · 2016-08-29T07:33:07Z

@Anderman I gave this a quick look. The tricks for calling assembly code inline from C# are very impressive. I do have one concern, and that is the use of RDTSC. The following is from "Acquiring high-resolution time stamps", which I highly recommend reading:

We strongly discourage using the RDTSC or RDTSCP processor instruction to directly query the TSC because you won't get reliable results on some versions of Windows, across live migrations of virtual machines, and on hardware systems without invariant or tightly synchronized TSCs.

One way to avoid some issues with RDTSC is to set processor affinity, for example. But it still has issues.
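
As a rough illustration of the affinity suggestion, the sketch below restricts the current process to one logical CPU using `Process.ProcessorAffinity`. The names `AffinityExample` and `PinToFirstAllowedCpu` are made up for this example and are not part of the PR, and the property is only supported on Windows and Linux. Pinning keeps successive counter reads on the same core, but it does not address the other RDTSC caveats quoted above.

```csharp
using System;
using System.Diagnostics;

public static class AffinityExample
{
    // Hypothetical helper: restricts the current process to the lowest-numbered
    // CPU it is currently allowed to run on and returns the previous mask so the
    // caller can restore it afterwards.
    public static IntPtr PinToFirstAllowedCpu()
    {
        Process process = Process.GetCurrentProcess();
        IntPtr previous = process.ProcessorAffinity;

        long mask = previous.ToInt64();
        long lowestBit = mask & -mask;          // isolate the lowest set bit

        process.ProcessorAffinity = (IntPtr)lowestBit;
        return previous;
    }

    // Restores a mask previously returned by PinToFirstAllowedCpu.
    public static void Restore(IntPtr previous)
    {
        Process.GetCurrentProcess().ProcessorAffinity = previous;
    }
}
```

A benchmark would call `PinToFirstAllowedCpu()` before measuring and `Restore(...)` when done.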

Anderman · 2016-08-29T19:50:37Z

@nietras Could be a problem. But I don't see why this depends on the version of Windows. There is another way to get the same counters: http://stackoverflow.com/questions/26618991/measure-cpu-cycles-of-a-function-call/26619409#26619409

nietras · 2016-08-29T20:14:21Z

This is the same as RDTSC, with the same issues; see https://blogs.msdn.microsoft.com/oldnewthing/20160429-00/?p=93385


Anderman · 2016-08-29T21:04:23Z

OK, I know that. That's why some people say to use the clock-cycle counter instead of nanoseconds. I found a document from a professor who did a lot of CPU performance testing for many years. That's also why warm-up tests are needed.
This professor also says:
Find the minimum duration, because you want to test your code and not the influence of the system.
That's also why the test must be relatively short.

I do 50 iterations to minimize the overhead, and each test is 100 ms. So there will be a lot of tests, and some will not be influenced by the system.
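
The "keep the minimum of many short runs" idea described here can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: it reads `Stopwatch.GetTimestamp()` instead of RDTSC, and `MinTiming`, `MinNanosecondsPerCall` and its parameters are hypothetical names. Because Stopwatch ticks are coarse, the batch size has to be chosen large enough for one batch to span many ticks.

```csharp
using System;
using System.Diagnostics;

public static class MinTiming
{
    // Runs `action` in batches of `iterationsPerBatch` until `budget` has elapsed
    // and returns the time per call of the fastest batch, in nanoseconds.
    public static double MinNanosecondsPerCall(Action action, int iterationsPerBatch, TimeSpan budget)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterationsPerBatch <= 0) throw new ArgumentOutOfRangeException(nameof(iterationsPerBatch));

        // Warm-up: make sure `action` has been JIT-compiled before timing starts.
        for (int i = 0; i < iterationsPerBatch; i++) action();

        long bestTicks = long.MaxValue;
        Stopwatch total = Stopwatch.StartNew();
        do
        {
            long start = Stopwatch.GetTimestamp();
            for (int i = 0; i < iterationsPerBatch; i++) action();
            long elapsed = Stopwatch.GetTimestamp() - start;

            // Keep the fastest batch: it is the one least disturbed by the system.
            if (elapsed < bestTicks) bestTicks = elapsed;
        } while (total.Elapsed < budget);

        double nanosecondsPerTick = 1e9 / Stopwatch.Frequency;
        return bestTicks * nanosecondsPerTick / iterationsPerBatch;
    }
}
```

For example, `MinTiming.MinNanosecondsPerCall(() => Buffer.BlockCopy(src, 0, dst, 0, 64), 10000, TimeSpan.FromMilliseconds(100))` would report the fastest observed batch for a 64-byte copy, where `src` and `dst` are byte arrays supplied by the caller.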

nietras · 2016-08-30T18:03:13Z

I found a document from a professor who did a lot of CPU performance testing for many years.

Are you referring to Agner Fog (http://www.agner.org/optimize/)?

Find the minimum duration, because you want to test your code and not the influence of the system.
That's also why the test must be relatively short.

Yes, but his use cases are different. As far as I can tell, he is profiling short native code, not a managed runtime. Additionally, he can call RDTSC with very little overhead; this is not possible in .NET, where the overhead or latency is "high". This is also the reason why HPET isn't the default in Windows: its latency is pretty high compared to the "clock" frequency of the timer. Latencies are covered in the link I sent.

I do 50 iterations to minimize the overhead, and each test is 100 ms.

Does that mean that each measurement spans ~100 ms? Perhaps I should ask differently: why did you want to change from Stopwatch to RDTSC? If you are measuring over 100 ms there is no reason to, so I am guessing that is not the case.

I am not saying that RDTSC can't be used for perf measuring; it just has some drawbacks, and measuring perf in .NET is different from measuring native code. .NET has a JIT, a GC, etc., which all influence how code runs.

You have probably already seen what BenchmarkDotNet does when measuring, but you can see it, or at least a high-level overview, here: https://perfdotnet.github.io/BenchmarkDotNet/HowItWorks.htm I do agree, though, that they have taken it to extremes for simple code measurements, given that the default period is 200 ms per measurement. This is why I am working on something a little quicker... too little time, though.

In any case, I think your approach with specific benchmarks for memory copying makes a lot of sense.

Anderman · 2016-08-30T20:25:22Z

Are you referring to Agner Fog (http://www.agner.org/optimize/)?
Yes. I tried to find the document that says something about testing, but I could not find it anymore. :(

Yes, but his use cases are different. As far as I can tell, he is profiling short native code, not a managed runtime. Additionally, he can call RDTSC with very little overhead; this is not possible in .NET, where the overhead or latency is "high". This is also the reason why HPET isn't the default in Windows: its latency is pretty high compared to the "clock" frequency of the timer. Latencies are covered in the link I sent.

I don't see what the managed runtime has to do with our memory tests. QCalls and P/Invoke have some overhead, but it is only a number of extra instructions.
I use the following trick to see what assembly code is called: I add a Debugger.Launch() statement between two calls.

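result = Foo();      // first call, so Foo is already JIT-compiled
Debugger.Launch();   // prompts to attach a debugger at this point
result = Foo();      // step into this call to see the generated code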

Run the release build with F5. When the exception is shown, open it with VS and step out four times (Shift+F11).
Now you are in the assembly code that you want to debug. With this you can also step into native calls.

HPET. Hmm, interesting. It looks like there are better ways to measure.

Does that mean that each measurement spans ~100 ms? Perhaps I should ask differently: why did you want to change from Stopwatch to RDTSC? If you are measuring over 100 ms there is no reason to, so I am guessing that is not the case.

The Stopwatch tick counter is 100-200 times slower than the RDTSC counter. I didn't get repeatable results when I used the Stopwatch. The test must run longer because the counter runs at a slower speed, but then other code that runs on my computer has the chance to influence the results.
Example:
anderman.memove(count=0) takes 8 clock cycles
Iterations: 50

Total clock cycles of code = 8 * 50 = 400
Total overhead (loop + 2 calls to RDTSC) = 200 clock cycles

So the clock-cycle timer will measure 600 clock cycles. With the Stopwatch I get only 2 or 3 ticks.

So in 100 ms the test will run about 100 ms / 600 cycles, which is about 100 ms / 300 ns = 333,333 times.
The test will show the fastest result of all 333,333 runs, because this was the run that wasn't disturbed by other code running on my system.

I hope this makes it clearer why I chose this solution.
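
The arithmetic in this comment can be reproduced with a small sketch. The 2 GHz core clock is an assumption implied by "600 cycles is about 300 ns", and the 10 MHz timer frequency is only a hypothetical stand-in for `Stopwatch.Frequency`, which varies by machine. `CycleBudget` and its method names are invented for illustration.

```csharp
using System;

public static class CycleBudget
{
    // Cycles measured for one batch: the code under test plus the fixed overhead
    // (loop and the two counter reads).
    public static long CyclesPerBatch(long cyclesPerCall, int iterations, long overheadCycles)
        => cyclesPerCall * iterations + overheadCycles;

    // Duration of one batch in nanoseconds at the given core clock frequency.
    public static double BatchNanoseconds(long cycles, double cpuHz)
        => cycles * 1e9 / cpuHz;

    // Number of whole batches that fit in the time budget.
    public static long BatchesInBudget(double budgetNanoseconds, double batchNanoseconds)
        => (long)(budgetNanoseconds / batchNanoseconds);

    // How many ticks of a timer running at `timerHz` one batch spans.
    public static double TimerTicks(double batchNanoseconds, double timerHz)
        => batchNanoseconds * timerHz / 1e9;
}
```

With the numbers from the comment (8 cycles per call, 50 iterations, 200 cycles of overhead, a 2 GHz clock), this gives 600 cycles, 300 ns per batch, and 333,333 batches in 100 ms. At an assumed 10 MHz timer, 300 ns is 3 ticks, which is in line with the "2 or 3 ticks" seen with the Stopwatch.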

Anderman · 2016-08-30T20:46:26Z

Maybe Interesting

nietras · 2016-08-31T07:09:27Z

Yes, it has an optimized copy function mentioned in one of the reports. It could be used for inspiration, although it uses aligned instructions etc. that are not available in .NET.

nietras · 2016-09-10T17:53:42Z

@Anderman there is a great (and long) blog post about timers and e.g. all the issues with TSC here: http://aakinshin.net/en/blog/dotnet/stopwatch/

Including latency and resolution issues.

Anderman added 6 commits August 23, 2016 01:23

Initial Commit Benchmark2 (acfdada)
typos (f2ebca8)
Add test constant and calculations (0113786)
Fix execption with fast test<1ms (eda77fd)
Add cpu count and Google graph and faster test results (a8e4bdf)
Remove old tests (ca2329c)
Cleanup (3c0bf81)
