prov/efa: Changes for efa + shm peer provider integration #1

shijin-aws · 2023-03-16T00:37:28Z

No description provided.

Signed-off-by: aws-ceenugal <[email protected]>

It is unnecessary to call the progress engine immediately after triggering a handshake. rxr_msg_generic_send does this when an error is returned. Signed-off-by: Wenduo Wang <[email protected]>

Use the newly introduced FI_OPT_CUDA_API_PERMITTED option to replace the environment variable FI_HMEM_CUDA_ENABLE_XFER. Signed-off-by: Wei Zhang <[email protected]>

Signed-off-by: Wei Zhang <[email protected]>

This patch makes two changes to the option FI_OPT_CUDA_API_PERMITTED. First, it clarifies that setting this option may return -FI_EINVAL if either the CUDA library or a CUDA device is not available. Second, it promises that all providers that support the FI_HMEM capability implement this option. Signed-off-by: Wei Zhang <[email protected]>

Signed-off-by: OFIWG Bot <[email protected]>

…n-pages-main Update nroff-generated man pages

shm should call the start_msg function for the peer that was matched to and queued the unexpected message in the first place (saved in the rx_entry), not the owner srx. Signed-off-by: Alexia Ingerson <[email protected]>

Separate the msg and tag generic receive paths. This removes redundant checks in both paths and also fixes the peer srx start call which was incorrectly always calling the start_msg function. In the tagged case, we should be calling the start_tag function instead. Signed-off-by: Alexia Ingerson <[email protected]>

The previous code was incorrectly calling discard_msg for both queues. The queued tagged messages should be calling the discard_tag call instead of the discard_msg call. Signed-off-by: Alexia Ingerson <[email protected]>
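The three shm fixes above share one idea: an entry must be started or discarded through the callbacks of the peer that queued it, and a tagged entry must use the tag variants. A minimal sketch of that dispatch follows, assuming a hypothetical `peer_ops` table and `rx_entry` struct; these are illustrative stand-ins, not the libfabric peer srx definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a peer's callback table. */
struct peer_ops {
    int (*start_msg)(void *entry);
    int (*start_tag)(void *entry);
    int (*discard_msg)(void *entry);
    int (*discard_tag)(void *entry);
};

/* Hypothetical unexpected-message entry; it remembers who queued it. */
struct rx_entry {
    bool tagged;
    const struct peer_ops *peer; /* peer that queued the entry, not the owner */
};

/* Start a matched entry through the callback that fits its kind. */
static int start_entry(struct rx_entry *e)
{
    return e->tagged ? e->peer->start_tag(e) : e->peer->start_msg(e);
}

/* Discard a queued entry through the callback that fits its kind. */
static int discard_entry(struct rx_entry *e)
{
    return e->tagged ? e->peer->discard_tag(e) : e->peer->discard_msg(e);
}

/* Demo callbacks return distinct codes so the dispatch is observable. */
static int demo_start_msg(void *e)   { (void)e; return 1; }
static int demo_start_tag(void *e)   { (void)e; return 2; }
static int demo_discard_msg(void *e) { (void)e; return 3; }
static int demo_discard_tag(void *e) { (void)e; return 4; }

static const struct peer_ops demo_ops = {
    .start_msg = demo_start_msg,
    .start_tag = demo_start_tag,
    .discard_msg = demo_discard_msg,
    .discard_tag = demo_discard_tag,
};
```

Storing the peer's ops pointer in the entry itself is what lets the start and discard paths avoid consulting the owner srx.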

shijin-aws · 2023-03-16T00:38:33Z

@sunkuamzn

When xnet_op_read_rsp() is executed, ep->cur_rx.entry is set to the head element of the list 'rma_read_queue'. The RX entry is effectively removed from the list only after the operation completes. This pattern becomes problematic if the EP is disabled before the RX completes (i.e., EP disabled because of a failing TX entry or an explicit shutdown). In this case, xnet_ep_flush_all_queues would complete the same RX entry twice: once when rma_read_queue is flushed (the RX entry is still listed here) and another time in the block for the condition "if (ep->cur_rx.entry)". The patch removes the RX entry from rma_read_queue before it gets assigned to ep->cur_rx.entry. It guarantees that the RX entry will be completed only once if the EP is disabled. Signed-off-by: Sylvain Didelot <[email protected]>
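The ordering described in this commit message can be sketched with a toy queue. The `rx_entry` and `ep` structs and the function names below are hypothetical simplifications, not the xnet data structures; the point is only that unlinking the entry before it becomes the current RX means a flush completes it once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RX entry; `completions` counts how often it was completed. */
struct rx_entry {
    struct rx_entry *next;
    int completions;
};

/* Hypothetical endpoint with a read queue and an in-progress entry. */
struct ep {
    struct rx_entry *rma_read_queue; /* singly linked, head first */
    struct rx_entry *cur_rx;
};

/* Fixed pattern: unlink the head before making it the current entry. */
static void start_read_rsp(struct ep *ep)
{
    struct rx_entry *e = ep->rma_read_queue;

    if (!e)
        return;
    ep->rma_read_queue = e->next;
    e->next = NULL;
    ep->cur_rx = e;
}

/* Flush on disable: complete queued entries, then the current one. */
static void flush_all(struct ep *ep)
{
    while (ep->rma_read_queue) {
        struct rx_entry *e = ep->rma_read_queue;

        ep->rma_read_queue = e->next;
        e->completions++;
    }
    if (ep->cur_rx) {
        ep->cur_rx->completions++;
        ep->cur_rx = NULL;
    }
}
```

If `start_read_rsp` left the head on the queue, `flush_all` would increment its `completions` twice, which is the double completion the patch removes.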

Signed-off-by: Kyle Gerheiser <[email protected]>

Include ofi_hmem.h to fix compilation issues on ROCR enabled systems. Particularly: implicit declaration of function ofi_hsa_amd_dereg_... Signed-off-by: Amir Shehata <[email protected]>

Update the SHM provider to use the ROCR HMEM asynchronous memory operations.
- Unify the ipc and sar freestacks, since they use the same structure.
- When progressing an IPC operation, check if the device is ROCR and trigger an asynchronous operation.
- When an asynchronous operation is queued, create an IPC entry and add it to the queue of pending operations.
- During the top-level progress loop, check the queue of pending asynchronous operations and query the state of each one. Generate a completion event for finished operations. Since completions happen outside the context of libfabric, we can't rely on the ep->region->signal flag being set, so always check the pending queue. This shouldn't introduce much overhead if the queue is empty.
- Use the ep->util_ep lock to protect the free stack and the IPC list of pending operations.
Signed-off-by: Amir Shehata <[email protected]>
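The pending-queue polling described in this commit can be sketched as follows. Everything here is a hypothetical stand-in: `pending_op`, `op_done` (which fakes the device query with a countdown) and `progress_pending` are not shm provider names, and the locking the commit mentions is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending async copy; `polls_left` fakes device latency. */
struct pending_op {
    struct pending_op *next;
    int polls_left;
    bool completed;
};

/* Hypothetical endpoint state: list of in-flight asynchronous operations. */
struct ep {
    struct pending_op *pending;
};

/* Stand-in for querying the device: true once the copy has finished. */
static bool op_done(struct pending_op *op)
{
    if (op->polls_left > 0)
        op->polls_left--;
    return op->polls_left == 0;
}

/* Queue a newly started asynchronous operation. */
static void queue_op(struct ep *ep, struct pending_op *op)
{
    op->next = ep->pending;
    ep->pending = op;
}

/* Called on every progress pass; returns the number of completions. */
static int progress_pending(struct ep *ep)
{
    int done = 0;
    struct pending_op **link = &ep->pending;

    while (*link) {
        struct pending_op *op = *link;

        if (op_done(op)) {
            *link = op->next;     /* unlink the finished operation */
            op->next = NULL;
            op->completed = true; /* a real provider would write a CQ entry */
            done++;
        } else {
            link = &op->next;
        }
    }
    return done;
}
```

With an empty list the loop body never runs, which matches the commit's expectation that the unconditional check costs little when nothing is pending.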

Prior to this patch, smr_generic_rma wrote an error completion for any error returned by smr_fast_rma. This patch makes smr_generic_rma return -FI_EAGAIN if smr_fast_rma() returns -FI_EAGAIN, because -FI_EAGAIN means the operation has not been completed yet. Signed-off-by: Wei Zhang <[email protected]>

Prior to this patch, smr_generic_rma() called smr_write_error_comp() with a negative errno, causing the output cq entry to have a negative errno. This patch addresses the issue by using a positive errno. Signed-off-by: Wei Zhang <[email protected]>
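The two smr_generic_rma fixes above can be combined in one small sketch. The `cq` struct and helper names are hypothetical, `-EAGAIN` from `<errno.h>` stands in for `-FI_EAGAIN`, and returning 0 after writing an error completion is an assumption made for illustration rather than the provider's documented behavior.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical CQ that records the last error completion written. */
struct cq {
    int err_count;
    int last_err; /* positive errno carried by the error entry */
};

/* Stand-in for writing an error completion to the CQ. */
static void write_error_comp(struct cq *cq, int err)
{
    cq->err_count++;
    cq->last_err = err;
}

/*
 * Handle the return value of a fast-path RMA attempt.
 * -EAGAIN stands in for -FI_EAGAIN.
 */
static int handle_fast_rma_result(struct cq *cq, int ret)
{
    if (ret == -EAGAIN)
        return ret;                 /* not completed: caller retries, no CQ entry */
    if (ret < 0)
        write_error_comp(cq, -ret); /* report the positive errno */
    return 0;                       /* assumption: the failure is reported via the CQ */
}
```

Returning the retry code without touching the CQ keeps a transient condition from surfacing as a failed operation.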

For a peer cq, fi_cq_read is not expected to access the cirq and read cqes. It should only progress the cq. Signed-off-by: Shi Jin <[email protected]>

Implement all the functions in efa_rdm_srx_owner_ops. Also refactor the code in rxr_pkt_proc_msgrtm() and rxr_pkt_proc_tagrtm() so they can be reused by efa_rdm_srx_owner_ops. Signed-off-by: Shi Jin <[email protected]>

Signed-off-by: Shi Jin <[email protected]>

After using shm provider as a peer, efa provider can post fi_sendmsg/fi_tsendmsg directly to shm ep, without involving RDM protocols. Also there is no need to pre-post fi_recv for shm provider as it will share the rx posted to efa ep. Signed-off-by: Shi Jin <[email protected]>

To share the efa cq with the shm provider, we need to move shm's fi_cq_open call into efa_rdm_cq_open, because an application could create multiple efa cqs, and each efa cq must be shared with its corresponding shm cq. Meanwhile, there is no need to poll the shm cq explicitly in rxr_ep_progress; instead, we just need to call fi_cq_read(shm_cq, NULL, 0) inside efa's fi_cq_read() to progress the shm cq manually. Signed-off-by: Shi Jin <[email protected]>
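The read path described above, together with the peer cq behavior from the core/util_cq commit, can be sketched with a toy model. The `cq` struct, `cq_progress` and `cq_read` are hypothetical, and the -1 "nothing to read" return value is an assumption of this sketch, not the libfabric return code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CQ: `ready` counts entries waiting to be read. */
struct cq {
    bool is_peer;       /* true for a CQ that reports into an owner CQ */
    struct cq *peer;    /* owner side: the peer CQ to progress, or NULL */
    int ready;
    int progress_calls;
};

/* Stand-in for the provider's progress function. */
static void cq_progress(struct cq *cq)
{
    cq->progress_calls++;
}

/* Returns the number of entries consumed; -1 means nothing was read. */
static int cq_read(struct cq *cq, int count)
{
    int n;

    if (cq->is_peer) {
        cq_progress(cq);      /* a peer CQ only makes progress */
        return -1;
    }
    if (cq->peer)
        cq_read(cq->peer, 0); /* mirrors fi_cq_read(shm_cq, NULL, 0) */
    cq_progress(cq);
    n = cq->ready < count ? cq->ready : count;
    cq->ready -= n;
    return n > 0 ? n : -1;
}
```

Because the owner drives the peer from inside its own read, the application only ever polls the owner cq.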

In smr_srx_context, shm should not modify the peer_ops of the imported srx. It should allocate a new srx and set it there. Signed-off-by: Shi Jin <[email protected]>

If a posted receive matches with a saved receive, we may need to increment the rx counter. Set the rx counter increment callback to match that of the posted receive. This fixes an assert in xnet_cntr_inc() accessing a NULL cntr_inc function pointer.

Program received signal SIGABRT, Aborted.
0x0000155552d4d37f in raise () from /lib64/libc.so.6
#0 0x0000155552d4d37f in raise () from /lib64/libc.so.6
#1 0x0000155552d37db5 in abort () from /lib64/libc.so.6
#2 0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3 0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6
#4 0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347
#5 0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354
#6 0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153
#7 0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188
#8 0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445
#9 0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558
#10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91
#11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212

Signed-off-by: Sean Hefty <[email protected]>

aws-ceenugal and others added 13 commits March 14, 2023 14:43

hmem_neuron: Add support for neuron dma-buf

d7eb564

Signed-off-by: aws-ceenugal <[email protected]>

prov/efa: Add support for neuron dma-buf

9292861

Signed-off-by: aws-ceenugal <[email protected]>

prov/efa: remove redundant progress after triggering handshake

7d3f4d7

It is unnecessary to call the progress engine immediately after triggering a handshake. rxr_msg_generic_send does this when an error is returned. Signed-off-by: Wenduo Wang <[email protected]>

prov/efa: use FI_OPT_CUDA_API_PERMITTED to replace cuda_xfer_setting()

b02ad48

Use the newly introduced FI_OPT_CUDA_API_PERMITTED option to replace the environment variable FI_HMEM_CUDA_ENABLE_XFER. Signed-off-by: Wei Zhang <[email protected]>

prov/shm: implement the FI_OPT_CUDA_API_PERMITTED option

140c410

Signed-off-by: Wei Zhang <[email protected]>

prov/verbs: implement the FI_OPT_CUDA_API_PERMITTED option

e4e801e

Signed-off-by: Wei Zhang <[email protected]>

prov/rxm: implement the FI_OPT_CUDA_API_PERMITTED option

a71384b

Signed-off-by: Wei Zhang <[email protected]>

Updated nroff-generated man pages

5edb30d

Signed-off-by: OFIWG Bot <[email protected]>

Merge pull request ofiwg#8659 from ofiwg/pr/update-nroff-generated-ma…

155f8fc

…n-pages-main Update nroff-generated man pages

prov/shm: fix start_msg call

b49fb7f

shm should call the start_msg function for the peer that was matched to and queued the unexpected message in the first place (saved in the rx_entry), not the owner srx. Signed-off-by: Alexia Ingerson <[email protected]>

shijin-aws force-pushed the peer_devel_sjina branch 2 times, most recently from 172c488 to 585ee4a Compare March 16, 2023 15:37

sydidelot and others added 7 commits March 16, 2023 10:50

man/verbs: Fix link in verbs doc

4cd6309

Signed-off-by: Kyle Gerheiser <[email protected]>

hmem/rocr: Fix compilation issue

4634ab6

Include ofi_hmem.h to fix compilation issues on ROCR enabled systems. Particularly: implicit declaration of function ofi_hsa_amd_dereg_... Signed-off-by: Amir Shehata <[email protected]>

core/util_cq: fix the behavior of cq_read for FI_PEER

9add342

For a peer cq, fi_cq_read is not expected to access the cirq and read cqes. It should only progress the cq. Signed-off-by: Shi Jin <[email protected]>

shijin-aws force-pushed the peer_devel_sjina branch from 2d20ad1 to eb31b62 Compare March 17, 2023 19:19

shijin-aws added 6 commits March 17, 2023 19:53

prov/efa: Implement owner_ops in peer API.

580c674

Implement all the functions in efa_rdm_srx_owner_ops. Also refactor the code in rxr_pkt_proc_msgrtm() and rxr_pkt_proc_tagrtm() so they can be reused by efa_rdm_srx_owner_ops. Signed-off-by: Shi Jin <[email protected]>

prov/efa: Update peer_rx_entry fields from rx entry.

e88954e

Signed-off-by: Shi Jin <[email protected]>

prov/efa: Share peer_srx between efa and shm provider.

d4709a4

Signed-off-by: Shi Jin <[email protected]>

prov/shm: fix the srx import procedure.

754d709

In smr_srx_context, shm should not modify the peer_ops of the imported srx. It should allocate a new srx and set it there. Signed-off-by: Shi Jin <[email protected]>

shijin-aws added 2 commits March 17, 2023 19:53

Fix bugs in efa_rdm_srx.c

038ae4a

fix rxr_msg.c

3d77c02

shijin-aws force-pushed the peer_devel_sjina branch 2 times, most recently from 410cc9b to 3d77c02 Compare March 17, 2023 20:01

Merge branch 'peer_devel' into peer_devel_sjina

f3c9505

shijin-aws merged commit a719145 into peer_devel Mar 17, 2023

10 participants
