-
Notifications
You must be signed in to change notification settings - Fork 0
Peer API RMA changes + SHM clean up #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
prov/efa/src/rdm/rxr_rma.c
Outdated
|
|
||
| if (peer->is_local && rxr_ep->use_shm_for_tx) { | ||
| use_lower_ep_read = true; | ||
| // msg_clone->addr = peer->shm_fiaddr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fi_rma_bw test passes when I use msg and msg_clone. I don't know how that's possible..
I have to use msg_clone to set the address because the input parameter msg is const. Using msg_clone generates a compiler warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fabtests only has 1 peer that that the shm_fiaddr could be the same with efa addr. We should use shm_fiaddr for any fi_* call for shm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see OK
What's the best way to do this? The msg_clone idea which I have here generates a compiler warning. I could do a memcpy but that would be more expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are rxr_msg_construct and rxr_tmsg_construct in rxr_msg.h. You can do similar things to have a function to construct a new fi_rma_msg
Signed-off-by: Sai Sunku <[email protected]>
Signed-off-by: Sai Sunku <[email protected]>
Signed-off-by: Sai Sunku <[email protected]>
If a posted receive matches with a saved receive, we may need to increment the rx counter. Set the rx counter increment callback to match that of the posted receive. This fixes an assert in xnet_cntr_inc() accessing a NULL cntr_inc function pointer. Program received signal SIGABRT, Aborted. 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #0 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #1 0x0000155552d37db5 in abort () from /lib64/libc.so.6 #2 0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6 #3 0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6 #4 0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347 #5 0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354 #6 0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153 #7 0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188 #8 0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445 #9 0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558 #10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91 #11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212 Signed-off-by: Sean Hefty <[email protected]>
We need such barrier in two transitions: 1. from wqe write to doorbell ring 2. from doorbell ring to wqe write. For #2, we currently have the barrier right after the doorbell ring, which is different from rdma-core implementation that has the doorbell right in the next wr_start, and according to benchmark such choice hurts performance on ARM platform. This patch moves the #2 barrier to the beginning of wr start, before the wqe write, to be consistent with rdma-core and improves the performance. Signed-off-by: Shi Jin <[email protected]>
We need such barrier in two transitions: 1. from wqe write to doorbell ring 2. from doorbell ring to wqe write. For #2, we currently have the barrier right after the doorbell ring, which is different from rdma-core implementation that has the doorbell right in the next wr_start, and according to benchmark such choice hurts performance on ARM platform. This patch moves the #2 barrier to the beginning of wr start, before the wqe write, to be consistent with rdma-core and improves the performance. Signed-off-by: Shi Jin <[email protected]>
We need such barrier in two transitions: 1. from wqe write to doorbell ring 2. from doorbell ring to wqe write. For #2, we currently have the barrier right after the doorbell ring, which is different from rdma-core implementation that has the doorbell right in the next wr_start, and according to benchmark such choice hurts performance on ARM platform. This patch moves the #2 barrier to the beginning of wr start, before the wqe write, to be consistent with rdma-core and improves the performance. Signed-off-by: Shi Jin <[email protected]> (cherry picked from commit c4937ed)
We need such barrier in two transitions: 1. from wqe write to doorbell ring 2. from doorbell ring to wqe write. For #2, we currently have the barrier right after the doorbell ring, which is different from rdma-core implementation that has the doorbell right in the next wr_start, and according to benchmark such choice hurts performance on ARM platform. This patch moves the #2 barrier to the beginning of wr start, before the wqe write, to be consistent with rdma-core and improves the performance. Signed-off-by: Shi Jin <[email protected]> (cherry picked from commit c4937ed)
No description provided.