Description
The cxx dynamics test ''sometimes'' fails with the following message (seen on the v1.4 branch; I did not test the trunk extensively enough to see whether it happens there too):
[[56157,8],1][btl_openib_component.c:2951:handle_wc] from svbu-mpi042 to: svbu-mpi042 error polling HP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 12750656 opcode 0 vendor error 249 qp_idx 0
There are no error messages before this. It always fails in the MPI::ARGV_NULL test, but I don't know if that means anything. The test fails this way in about 1 out of every 10 or 20 runs.
It's an odd error, because a FLUSHED completion (status 5, IBV_WC_WR_FLUSH_ERR in libibverbs) should only occur after some earlier error has put the QP into the error state and flushed its outstanding work requests.
FWIW, in my testing, I ''once'' got a segv in the "connect" spawned child process in rdmacm_component_finalize:
    for (item = opal_list_remove_first(&server_listener_list);
         NULL != item;
         item = opal_list_remove_first(&server_listener_list)) {
        rdmacm_contents_t *contents = (rdmacm_contents_t*) item;
        item2 = opal_list_remove_first(&(contents->ids));
        OBJ_RELEASE(item2);
gdb on the core dump showed that item2 was NULL. I'm not quite sure how that could happen! This only happened once in dozens of runs that I tried... but it ''did'' happen.
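One way item2 could be NULL: opal_list_remove_first() returns NULL when the list is empty, so if contents->ids happened to be empty at finalize time, OBJ_RELEASE(item2) would dereference a NULL pointer. That is only a guess at the cause, not a confirmed diagnosis. The following is a minimal, self-contained sketch of a drain loop that tolerates an empty list. The types and helpers (item_t, list_t, list_remove_first, list_drain) are hypothetical stand-ins, not the real OPAL API, and free() stands in for OBJ_RELEASE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for opal_list_item_t and opal_list_t.
 * These are NOT the real OPAL types; they only mimic the relevant behavior. */
typedef struct item {
    struct item *next;
} item_t;

typedef struct {
    item_t *head;
    size_t length;
} list_t;

static void list_init(list_t *l)
{
    l->head = NULL;
    l->length = 0;
}

static void list_prepend(list_t *l, item_t *it)
{
    it->next = l->head;
    l->head = it;
    l->length++;
}

/* Mimics opal_list_remove_first(): returns NULL when the list is empty. */
static item_t *list_remove_first(list_t *l)
{
    item_t *it = l->head;
    if (NULL == it) {
        return NULL;
    }
    l->head = it->next;
    it->next = NULL;
    l->length--;
    return it;
}

/* Drain the list, releasing every item. The NULL check in the loop
 * condition means an empty list is a no-op rather than a crash.
 * free() is an assumption standing in for OBJ_RELEASE().
 * Returns the number of items released. */
static size_t list_drain(list_t *l)
{
    size_t released = 0;
    item_t *it;

    while (NULL != (it = list_remove_first(l))) {
        free(it);
        released++;
    }
    return released;
}
```

Applied to the excerpt above, the same idea would mean checking item2 against NULL before releasing it (or draining contents->ids in a loop like this one). It would avoid the segv but would not explain why the list was empty in the first place.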
It's quite possible that the rdmacm CPC is required to make this error occur.