Skip to content

Commit a399d004 breaks proper functioning #2155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
davidavdav opened this issue Jun 6, 2019 · 14 comments
Closed

Commit a399d004 breaks proper functioning #2155

davidavdav opened this issue Jun 6, 2019 · 14 comments

Comments

@davidavdav
Copy link

In a setup with Kaldi (automatic speech recognition) with a pykaldi wrapper I've run into random errors related to OpenBLAS. By bisection of the commit history I found that this regression was introduced in a399d00.

On the Kaldi mailing list there are some more details. Because of the complex setup, it is hard to create an MWE, also because of the random nature of the occurrence of the errors.

The errors and the circumstances under which the errors occur appear to have these ingredients/characterics:

  • during multithreaded calculations (for us: initialization of i-vector extraction constants)
  • under heavy machine load, i.e., multiple processes running OpenBLAS-linked tasks
  • the errors are probably of the type "computations give the wrong answer", but I've not experienced NaNs in results
  • The errors occur randomly (perhaps because of array-task processing for which the errors occur), with a reasonable probability (~ 1% of the cases in my situation)

I will try to make these circumstances more explicit with some more experiments, e.g., trying non-concurrent tasks or single thread jobs.

@martin-frbg
Copy link
Collaborator

martin-frbg commented Jun 6, 2019

Which version of OpenBLAS are you using ? The commit you flagged was during what we now know to be a phase where the thread memory management code was rather fragile, and that version of the code should not even be used by default now (you need to compile with USE_TLS=1 if you want it). (#1742 essentially reverted the default memory.c to its 0.3.0 form (a8002e2) and moved its TLS-based rewrite to the "else" branch of an ifdef)

@davidavdav
Copy link
Author

The errors appeared when using "latest" last week, but then I checked out tagged releases: v0.3.0 was correct, but all releases v0.3.1 through v0.3.6 show the same error behaviour.

After this I simply bisected to find the final commit that introduced the erroneous behavior. I've tried several times, but the commit before a399d00 gives no problems, and this one does, and from what I've seen all commits after keep the errors.

In the Makefile.rule USE_TLS=1 is commented out in later versions, so does this imply that commit that the code introduced in a399d00 isn't even used? I just checked again for v0.3.6, with commented out # USE_TLS=1, and I run into the same problems, and I even had the even rarer error this run:

BLAS : Bad memory unallocation! :   50  0x7fcb0847c000

I've seen this before, it happens in about 10% of the test runs I do.

@martin-frbg
Copy link
Collaborator

Yes, the code from a399d00 should not even get compiled. Most likely the problem comes from one of the few other changes to the "traditional" version of memory.c - probably some locking code removal or similar (over)optimization that (re)introduced races. Are you running it with OpenMP or without ?

@brada4
Copy link
Contributor

brada4 commented Jun 6, 2019

There is a magic number 50 of memory areas to be allowed.

@martin-frbg
Copy link
Collaborator

There is a magic number 50 of memory areas to be allowed.

... tied to the maximum number of threads that OpenBLAS was built for, or (in common.h)
#define NUM_BUFFERS MAX(50,(MAX_CPU_NUMBER * 2 * MAX_PARALLEL_NUMBER))
(absent a build-time setting of NUM_THREADS, the default limit is based on the number of CPUs in the build host)
This is probably not the only source of problems though, as 0.3.0 or earlier would simply have hit the
limit sooner - brada4' PR to always provide space for at least 50 (pointers to) memory areas was merged in 0.3.4

@brada4
Copy link
Contributor

brada4 commented Jun 6, 2019

I am thinking of one-off that in supposed now default case the next cell is reached...

@davidavdav
Copy link
Author

@martin-frbg I am running without OpenMP, as far as I can tell.

@brada4 Dan Povey @danpovey found a this max number of threads line, too, on the Kaldi mailing list.

In my test I run on a 6-core machine (each core having 2 threads), 6 concurrent jobs. Dan says that a job uses "multiple threads on startup", I don't know if these are threads within kaldi or blasthreads, I suspect the former. I am compiling OpenBLAS with USE_THREAD=0 because in deployment I need to run many instances of the application on a single machine.

@danpovey
Copy link

danpovey commented Jun 7, 2019

@davidavdav, by default those programs will use 8 threads for a start-up job. (IIRC).
This may break some kind of assumption that OpenBLAS has, or require different configuration. I'm not sure exactly which configuration value is relevant though.

@martin-frbg
Copy link
Collaborator

So you are calling a single-threaded OpenBLAS from several threads in parallel ? This scenario looks a lot like #2126 - if you are on (almost) current develop already you could try setting USE_LOCKING=1.

@davidavdav
Copy link
Author

Kaldi's tools building scrips allow a choice in building OpenBLAS with USE_THREAD=0 or USE_THREAD=1 NUM_THREADS=64 (and af course anyone is free to use different settings).

I always opt for the USE_THREAD=0 because I have many kaldi-processes in deployment, and if OpenBLAS runs multithreaded it is countereffective in efficiency.

Now I understand that during start-up of a Kaldi process, some constants are computed from stored models, and this happens multi-threaded.

I've recompiled OpenBLAS v0.3.6 with USE_LOCKING=1, but this still gives the errors for my test.

@brada4
Copy link
Contributor

brada4 commented Jun 8, 2019

Another take at locking:
Build OpenBLAS with threads but run the tests with env OPENBLAS_NUM_THREADS=1.

If you do not set NUM_THREADS during build it takes number of cores in build system for threads and MAX(ncores*2,50) fro memory areas.

@martin-frbg
Copy link
Collaborator

v0.3.6 is too old to understand the new USE_LOCKING option, you would need to build a snapshot of the current develop branch - which I (mis)understood you were already using based on your comment above about using "latest" last week.

@davidavdav
Copy link
Author

Sorry, the problems occurred checking out latest, but inspecting the relog this was probably 3f427c0, 5 weeks ago (which is probably when I started using pykaldi with its own kaldi with its own OpenBLAS---this was probably never renewed because the Kaldi Makefile changes OpenBLAS Makefile.rule which then prevents subsequent git pulls to take effect). Then I checked out a commit 1.5 years ago (which I new didn't have the problem), and then I checked releases, and then did bisection.

So now with really latest 26411ac, and USE_LOCKING=1, the problems disappear. (Probably since 86dda5c, judging from the git log).

Thanks!

I suppose I can close this issue, and submit a PR for Kaldi's Makefile.

@TiborGY
Copy link
Contributor

TiborGY commented Jun 8, 2019

@davidavdav
Starting from 26411ac, you can use the thread safety tester included to test that your build is thread safe enough, given your particular build config. There are quite a few threading related build options, so if you are not sure it might be a good idea to test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants