prov/verbs: Add support for IBV_ACCESS_RELAXED_ORDERING #9378
prov/util/src/util_domain.c
@@ -112,6 +112,8 @@ util_domain_init(struct util_domain *domain, const struct fi_info *info,
 domain->info_domain_caps = info->caps | info->domain_attr->caps;
 domain->info_domain_mode = info->mode | info->domain_attr->mode;
 domain->mr_mode = info->domain_attr->mr_mode;
+domain->tx_msg_order = info->tx_attr->msg_order;
+domain->rx_msg_order = info->rx_attr->msg_order;
I was wondering if we could OR in the msg_order values every time we allocate an endpoint. That at least updates the tx/rx_msg_order fields. However, it's possible for the app to register memory prior to creating any endpoints. In that case, we would incorrectly use relaxed ordering.
So, I'm wondering if we have to have a domain that only supports relaxed ordering. If the app calls fi_getinfo with no ordering set, we can return that domain as one of the options. Endpoints allocated on that domain are only unordered.
This brings up that I think most apps need ordering for sends. AFAICT, the relaxed ordering here mostly impacts RMA ordering: WAW, RAR, etc. We may only need to check that those bits are not set.
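The check suggested above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the `EX_*` constants and the `ex_*` functions are hypothetical stand-ins for libfabric's `FI_ORDER_*` bits and the verbs `IBV_ACCESS_RELAXED_ORDERING` flag, and their numeric values are made up so the snippet compiles on its own.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the FI_ORDER_* bits; values are illustrative. */
#define EX_ORDER_RAR (1ULL << 0) /* read after read */
#define EX_ORDER_RAW (1ULL << 1) /* read after write */
#define EX_ORDER_WAR (1ULL << 2) /* write after read */
#define EX_ORDER_WAW (1ULL << 3) /* write after write */
#define EX_ORDER_SAS (1ULL << 4) /* send after send */

/* Only the RMA-related bits matter for this check. */
#define EX_ORDER_RMA_MASK \
	(EX_ORDER_RAR | EX_ORDER_RAW | EX_ORDER_WAR | EX_ORDER_WAW)

/* Hypothetical stand-in for IBV_ACCESS_RELAXED_ORDERING. */
#define EX_ACCESS_RELAXED_ORDERING (1U << 0)

/* Returns 1 when neither TX nor RX requests any RMA ordering. */
static int ex_rma_unordered(uint64_t tx_msg_order, uint64_t rx_msg_order)
{
	return ((tx_msg_order | rx_msg_order) & EX_ORDER_RMA_MASK) == 0;
}

/* Adds the relaxed-ordering access bit only when no RMA ordering is set.
 * Send ordering (EX_ORDER_SAS) is deliberately ignored by the check. */
static unsigned int ex_mr_access_flags(unsigned int base_access,
				       uint64_t tx_msg_order,
				       uint64_t rx_msg_order)
{
	if (ex_rma_unordered(tx_msg_order, rx_msg_order))
		return base_access | EX_ACCESS_RELAXED_ORDERING;
	return base_access;
}
```

With this shape, an application that asks only for send-after-send ordering would still get relaxed ordering on its memory registrations, while one that asks for WAW would not.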
@shefty I have updated the PR based on your suggestion to create one domain for relaxed ordering.
IBV_ACCESS_RELAXED_ORDERING allows the system to reorder Send/Write/Atomic operations to improve performance. The patch enables IBV_ACCESS_RELAXED_ORDERING if the application has requested no ordering in TX/RX attributes. Signed-off-by: Sylvain Didelot <[email protected]>
CI failure is related. Changes break verbs;ofi_rxd. Most or all fabtests fail, but here's a simple one:
A quick scan of the changes didn't point me to anything obvious. AFAICT, dgram endpoints shouldn't have been impacted by this change.
Question: Is the correct behavior to zero the message order flags? Or, is the correct behavior to set max_order_waw_size to zero? The thinking here is that for FI_MSG EPs, the operations would still be ordered in the network. It is target PCIe write operations which will be unordered with respect to each other.
IMO, the feature isn't well enough documented at the verbs layer to figure out what exactly it does. The PR comment indicates that sends could be received out of order. Is that true, or would sends still match receive buffers in order? Hmm... the latter seems reasonable, since matching would be done by the NIC. That implies that WAW ordering, and RAR I guess, is not guaranteed.
Following a recent issue I had with PCIe ordering (#9621), I am worried that the possible reordering of the PCIe writes at the target will break the completion ordering of
I haven't found any information on what IBV_ACCESS_RELAXED_ORDERING actually does. If completions can be written before the message data is received, then its use seems suspect. I will try to find out more details.
Based on initial feedback, IBV_ACCESS_RELAXED_ORDERING should not impact completions. After reading a completion, all data should be present. I'm still investigating what it does beyond that and if it impacts message ordering.
See linux-rdma/rdma-core#1413 for a proposed update to better document the change in verbs API behavior. My understanding is that @iziemba's earlier proposal to disable WAW ordering is correct.
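One way to picture the "disable WAW ordering" option discussed above is the sketch below. It is an assumption about how a provider might adjust its advertised attributes, not what this PR implements: `struct ex_ep_attr`, `EX_ORDER_WAW`, `EX_ORDER_SAS` and `ex_disable_waw()` are hypothetical names that loosely mirror the `msg_order` and `max_order_waw_size` attributes mentioned in this thread, with made-up bit values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ordering bits; values are illustrative. */
#define EX_ORDER_WAW (1ULL << 3) /* write after write */
#define EX_ORDER_SAS (1ULL << 4) /* send after send */

/* Hypothetical, trimmed-down endpoint attributes. */
struct ex_ep_attr {
	uint64_t msg_order;          /* advertised ordering bits */
	size_t   max_order_waw_size; /* largest write that keeps WAW ordering */
};

/* When relaxed ordering is in use, stop advertising WAW ordering:
 * clear the WAW bit and report a zero ordered-write size.
 * Other bits, such as send-after-send, are left untouched. */
static void ex_disable_waw(struct ex_ep_attr *attr, int relaxed_ordering)
{
	if (!relaxed_ordering)
		return;
	attr->msg_order &= ~EX_ORDER_WAW;
	attr->max_order_waw_size = 0;
}
```

Leaving the send-ordering bit alone reflects the point made earlier in the thread that matching of sends to receive buffers is expected to stay ordered, while target-side PCIe writes may not be.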
I honestly still don't know what that flag does. It sets the relaxed ordering bit for PCI transfers, but it's unclear when that bit is set. The flag applies at the receiving side. If WAW ordering is impacted, I would also check/update the msg_order flags.
I just discovered that the new CXI provider has support for Relaxed Ordering: https://github.com/ofiwg/libfabric/blob/main/man/fi_cxi.7.md#pcie-ordering