osc/rdma, btl/tcp: fix various issues with osc/rdma #9400

Closed
18 changes: 14 additions & 4 deletions ompi/mca/osc/rdma/osc_rdma_peer.c
@@ -73,7 +73,19 @@ static int ompi_osc_rdma_peer_btl_endpoint (struct ompi_osc_rdma_module_t *modul
}
}

+ /* unlikely but can happen when creating a peer for self */
+ if (peer_id == ompi_comm_rank (module->comm)) {
Member

I'm really concerned about this approach. I suppose it works, but it feels like the right place for this decision is during the initial btl selection logic.

+ for (int btl_index = 0 ; btl_index < num_btls ; ++btl_index) {
+ struct mca_btl_base_module_t *btl;

+ btl = bml_endpoint->btl_rdma.bml_btls[btl_index].btl;
+ if (strcmp(btl->btl_component->btl_version.mca_component_name, "self")==0) {
+ *btl_out = btl;
+ *endpoint = bml_endpoint->btl_eager.bml_btls[btl_index].btl_endpoint;
+ return OMPI_SUCCESS;
+ }
+ }
+ }

return OMPI_ERR_UNREACH;
}
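
The lookup added in this hunk can be illustrated in isolation. The following is a minimal sketch, not the Open MPI implementation: `mock_btl`, `mock_bml_btl`, and `find_self_btl` are hypothetical stand-ins for the real BTL/BML structures, reduced to the one field the loop inspects. It shows the name-based scan for the "self" component that the reviewer is questioning.

```c
#include <string.h>

/* Hypothetical, simplified stand-ins for the Open MPI BTL/BML types.
 * Only the component name and an opaque endpoint pointer are modeled. */
struct mock_btl {
    const char *component_name;
};

struct mock_bml_btl {
    struct mock_btl *btl;
    void *btl_endpoint;
};

/* Scan the BTL array and return the index of the first entry whose
 * component is named "self", or -1 if none is present. The -1 case
 * corresponds to the OMPI_ERR_UNREACH fall-through in the diff. */
static int find_self_btl(const struct mock_bml_btl *btls, int num_btls)
{
    for (int i = 0; i < num_btls; ++i) {
        if (0 == strcmp(btls[i].btl->component_name, "self")) {
            return i;
        }
    }
    return -1;
}
```

Matching on a component name string is what makes this a late, per-peer decision; the review comment above suggests making the equivalent choice once, during BTL selection.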

@@ -86,9 +98,7 @@ int ompi_osc_rdma_new_peer (struct ompi_osc_rdma_module_t *module, int peer_id,

/* find a btl/endpoint to use for this peer */
int ret = ompi_osc_rdma_peer_btl_endpoint (module, peer_id, &btl, &endpoint);
- if (OPAL_UNLIKELY(OMPI_SUCCESS != ret &&
- !(module->selected_btls[0]->btl_atomic_flags & MCA_BTL_ATOMIC_SUPPORTS_GLOB) &&
- (peer_id != ompi_comm_rank (module->comm)))) {
+ if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
return ret;
}
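
The behavioral difference between the removed and added conditions can be spelled out with a small sketch. The constants and function names below are hypothetical placeholders (the real `OMPI_SUCCESS` and `MCA_BTL_ATOMIC_SUPPORTS_GLOB` values come from Open MPI's headers and are not reproduced here); only the boolean structure is taken from the diff.

```c
#include <stdbool.h>

/* Hypothetical placeholder values, chosen only for this sketch. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERR_UNREACH (-1)
#define SKETCH_ATOMIC_SUPPORTS_GLOB 0x1u

/* Removed condition: a failed lookup was treated as an error only if the
 * first selected BTL lacked global atomics AND the peer was not the
 * calling rank. */
static bool old_check_fails(int ret, unsigned atomic_flags,
                            int peer_id, int my_rank)
{
    return SKETCH_SUCCESS != ret &&
           !(atomic_flags & SKETCH_ATOMIC_SUPPORTS_GLOB) &&
           peer_id != my_rank;
}

/* Added condition: any failed lookup is an error. */
static bool new_check_fails(int ret)
{
    return SKETCH_SUCCESS != ret;
}
```

Under the old condition a failed lookup for the calling rank itself was silently tolerated; with the first hunk presumably making that lookup succeed through the "self" BTL, the special case is dropped. Note that the exemption for BTLs advertising global atomics is dropped along with it.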
