osc/rdma: do not use local leader optimization for active message RDMA
The local leader optimization works as follows:
on each node, one process is designated as the local leader,
which sets up shared memory, and the other processes on the same
node map their states into the local leader's shared memory.
When a process tries to update a peer process's state, it
does so through atomic operations on the local leader's
memory. The peer's state is then updated through shared memory.
Essentially, using the local leader optimization means that two different
channels are used: one to transfer data and another to update the peer's state.
This optimization is incorrect for BTLs that use active message RDMA.
Active message RDMA uses send/receive to emulate put and atomics,
and its put implementation is not delivery complete, i.e. when the
initiator gets a completion for a put action, it only means the data has
been sent; it does not mean the data has been delivered to the target buffer.
Therefore, if the peer's state is updated through a different communication
channel, the peer's state can be updated before the put action has
completed on the peer, which causes data corruption.
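The ordering problem described above can be illustrated with a small
single-process model. This is only a sketch, not code from the patch: the
types target_t and channel_t and every function name below are hypothetical,
and the model assumes that messages sent on one channel are delivered in the
order they were sent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a target process: window data plus a state flag. */
typedef struct { int data; int state; } target_t;

/* Hypothetical active message: an emulated put or a state update. */
typedef enum { MSG_PUT, MSG_STATE } msg_kind_t;
typedef struct { msg_kind_t kind; int value; } msg_t;

#define QUEUE_CAP 8
/* Messages that were sent but not yet delivered to the target. */
typedef struct { msg_t msgs[QUEUE_CAP]; size_t count; } channel_t;

/* The initiator sees "completion" as soon as the message is queued. */
static void channel_send(channel_t *ch, msg_kind_t kind, int value)
{
    assert(ch->count < QUEUE_CAP);
    ch->msgs[ch->count].kind = kind;
    ch->msgs[ch->count].value = value;
    ch->count++;
}

/* The target delivers queued messages in the order they were sent. */
static void channel_progress(channel_t *ch, target_t *t)
{
    for (size_t i = 0; i < ch->count; i++) {
        if (ch->msgs[i].kind == MSG_PUT) {
            t->data = ch->msgs[i].value;
        } else {
            t->state = ch->msgs[i].value;
        }
    }
    ch->count = 0;
}

/* State goes through shared memory while the put is still in flight. */
int read_with_separate_channels(void)
{
    target_t t = {0, 0};
    channel_t ch = {0};
    channel_send(&ch, MSG_PUT, 42);          /* sent, not yet delivered */
    t.state = 1;                             /* shared-memory update, immediate */
    int seen = (t.state == 1) ? t.data : -1; /* target trusts the state flag */
    channel_progress(&ch, &t);               /* the put lands too late */
    return seen;
}

/* State update travels on the same channel, queued behind the put. */
int read_with_same_channel(void)
{
    target_t t = {0, 0};
    channel_t ch = {0};
    channel_send(&ch, MSG_PUT, 42);
    channel_send(&ch, MSG_STATE, 1);
    assert(t.state == 0);                    /* nothing is visible before delivery */
    channel_progress(&ch, &t);               /* data first, then state */
    return (t.state == 1) ? t.data : -1;
}
```

In this model read_with_separate_channels() returns the stale value 0, while
read_with_same_channel() returns 42, because the state update cannot overtake
the data it describes.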
This patch changes that: for active message RDMA, the peer's state
is updated using the same channel that transfers the data (i.e. the
local leader optimization is disabled).
To achieve that, each process needs to have a pointer to each peer's
state, for which this patch introduces the function gather_peer_state().
Note that because active message RDMA does not use memory registration,
the state_handle is not gathered.
This patch then sets each peer's state pointer using the gathered information,
and uses the same endpoint to update the state and to transfer data.
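The idea of gathering state pointers can be sketched as follows. This is an
illustration under stated assumptions, not the patch's gather_peer_state():
proc_state_t, proc_module_t and the sketch_* functions are hypothetical, a
plain loop stands in for the allgather-style exchange, and all "processes"
live in one address space so that the gathered addresses can be dereferenced
directly. In a real job the gathered value would be a remote address used as
the target of an emulated atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPROCS 4

/* Hypothetical per-process synchronization state. */
typedef struct { int64_t post_count; } proc_state_t;

/* Hypothetical per-process module: own state plus one address per peer. */
typedef struct {
    proc_state_t state;
    uint64_t peer_state_addr[NPROCS];
} proc_module_t;

/* Stand-in for an allgather: every process contributes the address of its
 * own state and receives the addresses of all processes. Only addresses are
 * exchanged, no registration handles. */
void sketch_gather_peer_state(proc_module_t procs[], size_t nprocs)
{
    uint64_t contrib[NPROCS];
    assert(nprocs <= NPROCS);
    for (size_t i = 0; i < nprocs; i++) {
        contrib[i] = (uint64_t)(uintptr_t)&procs[i].state;
    }
    for (size_t i = 0; i < nprocs; i++) {
        for (size_t j = 0; j < nprocs; j++) {
            procs[i].peer_state_addr[j] = contrib[j];
        }
    }
}

/* Look up a peer's state pointer in the gathered table. */
proc_state_t *sketch_peer_state(const proc_module_t *self, size_t peer)
{
    assert(peer < NPROCS);
    return (proc_state_t *)(uintptr_t)self->peer_state_addr[peer];
}
```

With the table filled in, a process can address a peer's state directly
instead of going through the local leader's shared memory.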
Signed-off-by: Wei Zhang <[email protected]>