Skip to content

Yet one more fix to intercommunicator splitting logic. #1188

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 8, 2015

Conversation

artpol84
Copy link
Contributor

@artpol84 artpol84 commented Dec 8, 2015

@hjelmn :))) I finally got your original idea, Nathan. I've checked this against my icsend and icsplit tests manually and will fire up MTT. Will post the results later, PR'ing now in case you'll have the time to check this today.

Previous commit f279474 reverts Nathans changes. However it turns out
that I was unable to trace his logic until I started investigation of
icsplit hang. Bug was triggered when splitting Intercom was giving a group
where on side of the communicator was empty (icsplit, intercom create #2).
in this case remote_size == 0 and there is no way to distinguish between
inter- and intra-communicator.
Conclusion: We do need to distinguish between intra- and inter-communicators.
So we should use ompi_mpi_group_null.group.

Previous commit f279474 reverts Nathans changes. However it turns out
that I was unable to trace his logic until I started investigation of
icsplit hang. Bug was triggered when splitting Intercom was giving a group
where on side of the communicator was empty (icsplit, intercom create #2).
in this case remote_size == 0 and there is no way to distinguish between
inter- and intra-communicator.
Conclusion: We do need to distinguish between intra- and inter-communicators.
So we should use ompi_mpi_group_null.group.
@artpol84
Copy link
Contributor Author

artpol84 commented Dec 8, 2015

P.S. This fix also fixes icm, probe-intercomm, comm_group_rand tests.
And for iccreate test I see 32 errors instead of 222.
Update: correction - after MTT runs I can say that icm and probe-intercomm are still failing. I'll check that later.

@artpol84
Copy link
Contributor Author

artpol84 commented Dec 8, 2015

P.P.S and I was also observing lots of errors in MPICH/coll tests related to intercomm. They seems to be fixed too. At least icreduce works now. I will check further and post here.

@artpol84
Copy link
Contributor Author

artpol84 commented Dec 8, 2015

According to MTT icsplit and icsend are working fine now. All mpich/pt2pt tests are passed. I see 2 MPICH/comm tests failing: icm and probe-intercomm and I see 13 of MPICH/coll tests that failing (previously 25 of them were failing).
So this fix in combination with f68c315 are moving master forward to stable state :).

@hjelmn
Copy link
Member

hjelmn commented Dec 8, 2015

Ah, yes. I see what was happening. Thanks for tracking this down. Don't forget to PR the patches to 2.x.

hjelmn added a commit that referenced this pull request Dec 8, 2015
Yet one more fix to intercommunicator splitting logic.
@hjelmn hjelmn merged commit b47a64f into open-mpi:master Dec 8, 2015
@artpol84
Copy link
Contributor Author

artpol84 commented Dec 8, 2015

Sure!
By the way do we want to retain patch sequence or I can merge them in one
patch?

вторник, 8 декабря 2015 г. пользователь Nathan Hjelm написал:

Ah, yes. I see what was happening. Thanks for tracking this down. Don't
forget to PR the patches to 2.x.


Reply to this email directly or view it on GitHub
#1188 (comment).


Best regards, Artem Polyakov
(Mobile mail)

@hjelmn
Copy link
Member

hjelmn commented Dec 8, 2015

Best to retain the patch sequence so we keep a 1-1 mapping of master to release commits. Update the commit messages to include the master commit hash. Take a look at at open PR on ompi-release to see how we usually handle that.

@artpol84 artpol84 deleted the intercomm_split_fix branch December 10, 2015 06:17
jsquyres pushed a commit to jsquyres/ompi that referenced this pull request Sep 19, 2016
…ziness

configury: fix opal_config_pthreads when -lrt is set
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants