Openib dynamic add proc race conditions #1248
The problem was in `mca_btl_openib_proc_create`. This function may be called from several places simultaneously:
* from the main thread when somebody wants to do `MPI_Send()` (for example) for the first time;
* from udcm if the counterpart peer is trying to connect and `mca_btl_openib_get_ep()` is called.

In this case one of the threads may add an uninitialized proc structure to `mca_btl_openib_component.ib_procs`, and the other will read it and treat it as initialized. This commit turns ib_proc initialization into a single atomic operation.
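A minimal sketch of the "single atomic operation" idea described above, written with plain pthreads rather than the Open MPI internals. All names here (`demo_proc_t`, `demo_procs`, `demo_proc_get_or_create`) are hypothetical and are not taken from the actual patch. The point it illustrates is that the lookup, the allocation, the initialization and the insertion into the shared list all happen under one lock, so no other thread can observe a half-initialized entry.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-peer proc entry; NOT the real Open MPI struct. */
typedef struct demo_proc {
    int peer_id;             /* identifies the remote peer */
    int endpoint_count;      /* example of a field that must be initialized */
    struct demo_proc *next;  /* singly linked list of known procs */
} demo_proc_t;

/* Shared list and the single lock that protects it (both hypothetical). */
static demo_proc_t *demo_procs = NULL;
static pthread_mutex_t demo_procs_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Look up the entry for peer_id, creating it if it does not exist yet.
 * The whole find-or-create sequence runs under demo_procs_lock, so an
 * entry is only ever published to the list after it is fully initialized.
 * *created is set to 1 if a new entry was made, 0 otherwise.
 * Returns NULL only if allocation fails.
 */
demo_proc_t *demo_proc_get_or_create(int peer_id, int *created)
{
    demo_proc_t *proc;

    *created = 0;
    pthread_mutex_lock(&demo_procs_lock);

    /* First, check whether another thread already created the entry. */
    for (proc = demo_procs; proc != NULL; proc = proc->next) {
        if (proc->peer_id == peer_id) {
            pthread_mutex_unlock(&demo_procs_lock);
            return proc;
        }
    }

    /* Not found: allocate and initialize before linking it into the list. */
    proc = calloc(1, sizeof(*proc));
    if (proc != NULL) {
        proc->peer_id = peer_id;
        proc->endpoint_count = 0;
        proc->next = demo_procs;
        demo_procs = proc;
        *created = 1;
    }

    pthread_mutex_unlock(&demo_procs_lock);
    return proc;
}
```

With this shape, whichever of the two callers (main thread or connection thread) arrives first creates the entry, and the second one simply finds it; the `created` flag lets the caller tell the two cases apart.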
it was already existing.
:bot:retest
Aside from the problems that are fixed with this changeset, I'm seeing race conditions here: https://github.com/open-mpi/ompi/blob/master/opal/mca/btl/openib/btl_openib.c#L1021. Although
Update: MTT shows noticeable improvement: 2 errors after fix versus 27 before.
Why not also make the locks in add_procs not dependent on opal_using_threads()? The function is not in the critical path.
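To illustrate the distinction raised in this comment, here is a small pthread sketch contrasting a lock that is only taken when a "using threads" flag is set with a lock that is always taken. The flag `demo_using_threads` is a hypothetical stand-in for the result of `opal_using_threads()`, and the functions are illustrative only; none of this is the actual `add_procs` code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the value returned by opal_using_threads(). */
static bool demo_using_threads = false;

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_counter = 0;

/*
 * Conditional locking: the mutex is skipped when the flag says the
 * application is single-threaded. This is cheap, but it offers no
 * protection if an internal helper thread calls in anyway.
 */
int demo_add_conditional(void)
{
    bool locked = demo_using_threads;  /* read the flag once */
    int value;

    if (locked) {
        pthread_mutex_lock(&demo_lock);
    }
    value = ++demo_counter;
    if (locked) {
        pthread_mutex_unlock(&demo_lock);
    }
    return value;
}

/*
 * Unconditional locking: the mutex is always taken. The extra cost is
 * acceptable for code that is not on the critical path.
 */
int demo_add_always_locked(void)
{
    int value;

    pthread_mutex_lock(&demo_lock);
    value = ++demo_counter;
    pthread_mutex_unlock(&demo_lock);
    return value;
}
```

The trade-off is the one the comment points at: an uncontended mutex costs little, so for a function outside the critical path the always-locked variant gives up almost nothing in exchange for being safe regardless of the flag.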
@hppritcha Will probably want this in 2.0.0 final. Should get this merged into master today or tomorrow.
@hjelmn Not sure I understand what you're saying. Can you explain? 2015-12-21 22:25 GMT+06:00 Nathan Hjelm:
Best regards, Artem Yuryevich Polyakov
b39b115 to 3c2f6d5
Looks good to me.
@hjelmn ok, here is one more fix that I'd like to submit in this PR.
e9f8a1f to aee2c31
aee2c31 to 08ad835
@hppritcha when is 2.0.0 scheduled?
@hjelmn All my checks have passed. Please feel free to merge it.
Well done, Artem! On Tue, Dec 22, 2015 at 12:55 PM, Artem Polyakov
@artpol84 please open a PR on v2.x for this.
Are we ok to merge this? On Wednesday, December 23, 2015, Howard Pritchard wrote:
Best regards, Artem Polyakov
Yup. Ok to merge.
Openib dynamic add proc race conditions
Cool, I'll PR to 2.x.
btl/openib: fix segmentation fault
@hjelmn This is a relatively big PR, so I'm posting it now to make you aware of it and to let you start the review during your daytime.
For now I can say that this fixes problems I was observing with some of the OSU collectives tests. I've launched MTT to do a more general evaluation. I will post the results when I get them.
Also, this PR is not a final solution. If you're OK with it, I will provide more fixes on top of it.