Commit 804df7a

venkatxvenkatsubra authored and LinuxMinion committed
RDS/IB: VRPC DELAY / OSS RECONNECT CAUSES 5 MINUTE STALL ON PORT FAILURE
This problem occurs when the user is notified of a successful rdma write + bcopy message completion but the peer application never receives the bcopy message. It happens during a port down/up test. What seems to happen is that the rdma write succeeds but the bcopy message fails. RDS should not return a successful completion status to the user in this case.

When RDS does an rdma followed by a bcopy message, the user notification is supposed to be implemented by method #3 below.

	/* If the user asked for a completion notification on this
	 * message, we can implement three different semantics:
	 *  1. Notify when we received the ACK on the RDS message
	 *     that was queued with the RDMA. This provides reliable
	 *     notification of RDMA status at the expense of a one-way
	 *     packet delay.
	 *  2. Notify when the IB stack gives us the completion event for
	 *     the RDMA operation.
	 *  3. Notify when the IB stack gives us the completion event for
	 *     the accompanying RDS messages.
	 * Here, we implement approach #3. To implement approach #2,
	 * we would need to take an event for the rdma WR. To implement #1,
	 * don't call rds_rdma_send_complete at all, and fall back to the notify
	 * handling in the ACK processing code.

Unfortunately, the user is notified before the bcopy send status is known: right after the rdma write completes, the user gets notified even though the subsequent bcopy eventually fails.

The fix is to delay signaling completions of the rdma op until the bcopy send completes.

Orabug: 22847528

Acked-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>
1 parent a08ba95 commit 804df7a

1 file changed, +1 -1 lines changed


net/rds/ib_send.c

Lines changed: 1 addition & 1 deletion
@@ -970,7 +970,7 @@ int rds_ib_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
 		send->s_queued = jiffies;
 		send->s_op = NULL;
 
-		if (!op->op_remote_complete)
+		if (!op->op_remote_complete && !op->op_notify)
 			nr_sig += rds_ib_set_wr_signal_state(ic, send, op->op_notify);
 
 		send->s_wr.opcode = op->op_write ? IB_WR_RDMA_WRITE : IB_WR_RDMA_READ;
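
To make the effect of the one-line change easier to see, here is a minimal user-space sketch of the signaling decision before and after the commit. It is not the kernel code: every `model_*` name, the `MODEL_*` constants, and the batching threshold are hypothetical, and the body of the signal-state helper is an assumption about how `rds_ib_set_wr_signal_state` behaves (signal when the unsignaled budget runs out or when notification is requested). Under that assumption, the old condition always signals the RDMA work request when `op_notify` is set, while the new condition leaves it unsignaled so that, as the commit message describes, notification waits for the accompanying send to complete.

```c
/* Simplified user-space model of the changed condition; NOT the kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SEND_SIGNALED 0x1u  /* hypothetical stand-in for IB_SEND_SIGNALED */
#define MODEL_MAX_UNSIG_WRS 16    /* assumed batching threshold, for illustration */

struct model_conn    { int unsignaled_wrs; };        /* stand-in for the IB connection */
struct model_wr      { unsigned int send_flags; };   /* stand-in for a send work request */
struct model_rdma_op { bool remote_complete; bool notify; };

/*
 * Assumed helper behaviour: request a signaled completion when the
 * unsignaled budget is exhausted or when the caller asks for notification.
 */
static int model_set_wr_signal_state(struct model_conn *ic,
				     struct model_wr *wr, bool notify)
{
	assert(ic != NULL && wr != NULL);
	if (ic->unsignaled_wrs-- == 0 || notify) {
		ic->unsignaled_wrs = MODEL_MAX_UNSIG_WRS;
		wr->send_flags |= MODEL_SEND_SIGNALED;
		return 1;
	}
	return 0;
}

/* Condition before the commit: only op_remote_complete suppresses signaling. */
static int model_signal_rdma_wr_old(struct model_conn *ic, struct model_wr *wr,
				    const struct model_rdma_op *op)
{
	wr->send_flags = 0;
	if (!op->remote_complete)
		return model_set_wr_signal_state(ic, wr, op->notify);
	return 0;
}

/* Condition after the commit: a pending user notification also suppresses it. */
static int model_signal_rdma_wr_new(struct model_conn *ic, struct model_wr *wr,
				    const struct model_rdma_op *op)
{
	wr->send_flags = 0;
	if (!op->remote_complete && !op->notify)
		return model_set_wr_signal_state(ic, wr, op->notify);
	return 0;
}
```

In this model, an op with `notify` set yields a signaled RDMA work request under the old condition and an unsignaled one under the new condition; ops without `notify` still follow the batching rule in both versions.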
