Commit 343d7de

Daniel Hayon authored and ShacharKagan committed
tests: Add timeout to DC stream QP recovery test
Replace retry count with time-based polling when waiting for QP to transition to ERR state. The previous approach with only 3 retries was insufficient on some hardware configurations where the state transition takes longer.

Signed-off-by: Daniel Hayon <dhayon@nvidia.com>
Signed-off-by: Shachar Kagan <skagan@nvidia.com>
1 parent 6bf745e commit 343d7de

1 file changed

Lines changed: 4 additions & 2 deletions

tests/test_mlx5_dc.py

@@ -3,6 +3,7 @@
 
 import unittest
 import errno
+import time
 
 from tests.mlx5_base import Mlx5DcResources, Mlx5RDMATestCase, Mlx5DcStreamsRes
 from pyverbs.pyverbs_error import PyverbsRDMAError
@@ -121,9 +122,10 @@ def test_dc_stream_qp_recovery(self):
         with self.assertRaisesRegex(PyverbsRDMAError, r'Remote access error'):
             u.rdma_traffic(**self.traffic_args, new_send=True,
                            send_op=ibv_wr_opcode.IBV_WR_RDMA_WRITE)
-        # Retry mechanism: QP state update to ERR takes time after errors occur
+        # Poll QP state with timeout: it takes time to transition QP to ERR after errors
         qp_in_err_state = False
-        for _ in range(3):
+        start = time.perf_counter()
+        while time.perf_counter() - start < 1.0:
             qp_attr, _ = self.client.qps[qp_idx].query(ibv_qp_attr_mask.IBV_QP_STATE)
             if qp_attr.cur_qp_state == ibv_qp_state.IBV_QPS_ERR:
                 qp_in_err_state = True
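
The time-based polling pattern in this change can be illustrated in isolation. The following is a minimal sketch, not code from this commit: `wait_until`, its `interval` parameter, and the use of `time.monotonic()` with a short sleep between polls are all assumptions made for illustration. The commit itself uses `time.perf_counter()` inline with a 1.0 second limit.

```python
import time


def wait_until(predicate, timeout=1.0, interval=0.01):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration; it is not part of pyverbs or the
    rdma-core test suite.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first so the predicate is evaluated at least once,
        # even with a zero timeout.
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep briefly to avoid busy-waiting between polls.
        time.sleep(interval)


# Possible use in a test like the one above, reusing the query call shown
# in the diff:
#
#   qp_in_err_state = wait_until(
#       lambda: self.client.qps[qp_idx].query(
#           ibv_qp_attr_mask.IBV_QP_STATE)[0].cur_qp_state
#       == ibv_qp_state.IBV_QPS_ERR)
```

Unlike a fixed retry count, a deadline bounds the total wait in wall-clock terms, so the wait does not shrink on machines where each query returns quickly.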
