Significant performance increase for gemm, but not uniform (v0.30.0 develop)

Hello,

We see that for single-threaded sgemm, the code from the develop branch (from early November 2017) shows a significant performance increase for most values of M, N and K.

Our development environment is Windows with clang on a Haswell laptop, but we see the same thing on Linux.

See the attached graphs.
The results were obtained by timing a single call to OpenBLAS with randomized data. This explains why the measured throughput is low for small matrices: most of the time is spent waiting for the data to be loaded into the cache.
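To make the measurement method concrete, here is a minimal sketch of how a single cold call differs from a warmed-up best-of-N timing. It is not our actual benchmark code: `naive_sgemm` is a hypothetical pure-Python stand-in for the real BLAS call, and the names `gemm_flops` and `time_gemm` are invented for this illustration. Throughput is computed with the usual 2·M·N·K operation count for a matrix product.

```python
import random
import time


def gemm_flops(m, n, k):
    """Operation count for C = A * B: one multiply and one add per (i, j, p)."""
    return 2 * m * n * k


def naive_sgemm(a, b, m, n, k):
    """Hypothetical stand-in for the BLAS call (NOT OpenBLAS).

    Row-major flat lists: A is m x k, B is k x n, result C is m x n.
    """
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


def time_gemm(gemm, m, n, k, repeats=5, seed=0):
    """Time `gemm` on random data; report first-call and best-call GFLOPS."""
    rng = random.Random(seed)
    a = [rng.random() for _ in range(m * k)]
    b = [rng.random() for _ in range(k * n)]

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        gemm(a, b, m, n, k)
        # Guard against a zero reading on very coarse timers.
        timings.append(max(time.perf_counter() - start, 1e-12))

    flops = gemm_flops(m, n, k)
    return {
        "first_gflops": flops / timings[0] / 1e9,    # cold: what a single call sees
        "best_gflops": flops / min(timings) / 1e9,   # warm: data already in cache
    }


if __name__ == "__main__":
    for size in (8, 32, 64):
        print(size, time_gemm(naive_sgemm, size, size, size))
```

Passing the routine in as a callable means the same harness could wrap a real sgemm binding. The gap between the two reported numbers gives a rough idea of how much of a single-call timing is cache warm-up rather than kernel performance.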

We also see that for some sizes of N, M and K, there is a performance decrease compared to v0.20.0.
We are therefore reluctant to move to this version.

The spikes in the graphs for sgemv seem to indicate that the performance of gemv could be further improved.

I've read the Goto paper and I've tried to tune this by forcing a number of parameters in config.h via cpuid_x86.c, but to no avail. 

Is there something I can try?
Do you expect a more uniform performance increase before this code is released in v0.30.0?

Thank you,

![performancev30](https://user-images.githubusercontent.com/2297967/32895934-ebccb808-cae1-11e7-9567-e1141556db67.png)


Significant performance increase for gemm, but not uniform (v0.30.0 develop) #1360
