Skip to content

Conversation

@sunkuamzn
Copy link
Collaborator

No description provided.


if (peer->is_local && rxr_ep->use_shm_for_tx) {
use_lower_ep_read = true;
// msg_clone->addr = peer->shm_fiaddr;
Copy link
Collaborator Author

@sunkuamzn sunkuamzn Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fi_rma_bw test passes when I use msg and msg_clone. I don't know how that's possible..

I have to use msg_clone to set the address because the input parameter msg is const. Using msg_clone generates a compiler warning.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fabtests only has 1 peer that that the shm_fiaddr could be the same with efa addr. We should use shm_fiaddr for any fi_* call for shm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see OK

What's the best way to do this? The msg_clone idea which I have here generates a compiler warning. I could do a memcpy but that would be more expensive?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are rxr_msg_construct and rxr_tmsg_construct in rxr_msg.h. You can do similar things to have a function to construct a new fi_rma_msg

@shijin-aws shijin-aws merged commit f8ea3f9 into shijin-aws:peer_devel Mar 20, 2023
shijin-aws pushed a commit that referenced this pull request Sep 12, 2023
If a posted receive matches with a saved receive, we may need to
increment the rx counter.  Set the rx counter increment callback
to match that of the posted receive.  This fixes an assert in
xnet_cntr_inc() accessing a NULL cntr_inc function pointer.

Program received signal SIGABRT, Aborted.
0x0000155552d4d37f in raise () from /lib64/libc.so.6
#0  0x0000155552d4d37f in raise () from /lib64/libc.so.6
#1  0x0000155552d37db5 in abort () from /lib64/libc.so.6
#2  0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6
#4  0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347
#5  0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354
#6  0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153
#7  0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188
#8  0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445
#9  0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558
#10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91
#11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212

Signed-off-by: Sean Hefty <[email protected]>
shijin-aws added a commit that referenced this pull request Nov 24, 2025
We need such barrier in two transitions:
1. from wqe write to doorbell ring
2. from doorbell ring to wqe write.

For #2, we currently have the barrier right after the doorbell ring,
which is different from rdma-core implementation that has the doorbell
right in the next wr_start, and according to benchmark such choice
hurts performance on ARM platform. This patch moves the #2 barrier
to the beginning of wr start, before the wqe write, to be consistent
with rdma-core and improves the performance.

Signed-off-by: Shi Jin <[email protected]>
shijin-aws added a commit that referenced this pull request Nov 25, 2025
We need such barrier in two transitions:
1. from wqe write to doorbell ring
2. from doorbell ring to wqe write.

For #2, we currently have the barrier right after the doorbell ring,
which is different from rdma-core implementation that has the doorbell
right in the next wr_start, and according to benchmark such choice
hurts performance on ARM platform. This patch moves the #2 barrier
to the beginning of wr start, before the wqe write, to be consistent
with rdma-core and improves the performance.

Signed-off-by: Shi Jin <[email protected]>
shijin-aws added a commit that referenced this pull request Dec 1, 2025
We need such barrier in two transitions:
1. from wqe write to doorbell ring
2. from doorbell ring to wqe write.

For #2, we currently have the barrier right after the doorbell ring,
which is different from rdma-core implementation that has the doorbell
right in the next wr_start, and according to benchmark such choice
hurts performance on ARM platform. This patch moves the #2 barrier
to the beginning of wr start, before the wqe write, to be consistent
with rdma-core and improves the performance.

Signed-off-by: Shi Jin <[email protected]>
(cherry picked from commit c4937ed)
shijin-aws added a commit that referenced this pull request Dec 29, 2025
We need such barrier in two transitions:
1. from wqe write to doorbell ring
2. from doorbell ring to wqe write.

For #2, we currently have the barrier right after the doorbell ring,
which is different from rdma-core implementation that has the doorbell
right in the next wr_start, and according to benchmark such choice
hurts performance on ARM platform. This patch moves the #2 barrier
to the beginning of wr start, before the wqe write, to be consistent
with rdma-core and improves the performance.

Signed-off-by: Shi Jin <[email protected]>
(cherry picked from commit c4937ed)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants