Description
I compiled openblas for Android for use with caffe. I call openblas_set_num_threads before loading a caffe model and have experimented with various values of num_threads, but I don't see any difference in run time. I then printed the values returned by openblas_get_num_threads() and openblas_get_num_procs(), and both always return 8 (the maximum number of cores available on my Android device). I confirmed that I have the fix for #762 by using the develop branch instead of deep_learning.
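To make the check above concrete, here is a minimal sketch of the set-then-read-back test. The helper `thread_setting_sticks` and the two stand-in backends are hypothetical names introduced only for illustration, not part of openblas. The stand-ins let the check run without the library: one mimics the symptom I am seeing (setter ignored, getter always reports 8), and the other honors the request for comparison.

```c
#include <assert.h>
#include <stdio.h>

/* Function-pointer types with the same shape as
   openblas_set_num_threads(int) and openblas_get_num_threads(void). */
typedef void (*set_threads_fn)(int);
typedef int (*get_threads_fn)(void);

/* Hypothetical helper: request `requested` threads, read the value back,
   and return 1 if the reported count matches the request, 0 otherwise. */
int thread_setting_sticks(set_threads_fn set, get_threads_fn get, int requested)
{
    assert(set != NULL && get != NULL);
    set(requested);
    int reported = get();
    printf("requested=%d reported=%d\n", requested, reported);
    return reported == requested;
}

/* Stand-in backend (assumption, for illustration): mimics the reported
   symptom, where the setter has no effect and the getter always says 8. */
void stuck_set(int n) { (void)n; }
int stuck_get(void) { return 8; }

/* Stand-in backend (assumption, for illustration): honors the request. */
int honored_value = 8;
void honored_set(int n) { honored_value = n; }
int honored_get(void) { return honored_value; }
```

On the device build, the same helper would presumably be called with the real functions, for example `thread_setting_sticks(openblas_set_num_threads, openblas_get_num_threads, 4)` after including `cblas.h` from the openblas install. In my case that call would report 8 regardless of the requested value.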
I compiled openblas with the following configuration:
NO_LAPACK=1 TARGET=ARMV7 USE_THREAD=1 NUM_THREADS=16 USE_OPENMP=1
Another question: how many threads does openblas use by default if openblas_set_num_threads is not called and the environment variable OMP_NUM_THREADS is not set?
How can I experiment with the number of threads openblas uses, given that the method above isn't working for me? Am I using the wrong APIs?