Description
Hi!
I understand that OpenBLAS tries to automatically detect the number of CPU cores on a machine at runtime, to determine the number of threads to start when none of {OPENBLAS,GOTO,OMP}_NUM_THREADS is set.
It works fine in most cases, but when the process runs in a cgroup context, for instance one where the cpuset subsystem is in use, it may result in less-than-optimal behavior.
For instance, on a 16-core machine, if a process runs inside a cgroup where 4 CPUs have been allocated via cpuset, OpenBLAS will start 16 threads, which will be confined to just 4 CPU cores and will compete with each other. In the end, the performance will be about a quarter of what it would have been by just starting 4 threads.
So I'm wondering if any thought has been given to this already, and how OpenBLAS could detect that it's running in a constrained context, so that it only starts as many threads as it can actually use.
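To illustrate the kind of check I have in mind, here is a rough Python sketch, not a proposal for the actual OpenBLAS code. It assumes Linux, where the scheduler affinity mask of a process reflects the cpuset it runs in. The helper names `allowed_cpu_count` and `cap_openblas_threads` are made up for this example. Note that this only covers cpuset-style restrictions, it would not account for CPU-time quotas.

```python
import os


def allowed_cpu_count() -> int:
    """Return the number of CPUs the current process may be scheduled on."""
    try:
        # Linux-only call: the affinity mask is limited by the cpuset, so on
        # the 16-core machine with a 4-CPU cpuset this would return 4.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the machine-wide
        # count, which is what causes the oversubscription described above.
        return os.cpu_count() or 1


def cap_openblas_threads(env=os.environ) -> str:
    """Set OPENBLAS_NUM_THREADS to the allowed CPU count unless already set.

    Hypothetical workaround helper: it has to run before the OpenBLAS
    library is loaded, and it never overrides a value the user chose.
    """
    return env.setdefault("OPENBLAS_NUM_THREADS", str(allowed_cpu_count()))
```

Calling `cap_openblas_threads()` at the very start of a program, before anything that loads OpenBLAS, works as a stopgap on the user's side. Comparing `os.cpu_count()` with `allowed_cpu_count()` inside the cgroup also makes the mismatch easy to see.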
Thanks!