[doc][rdt] Add the limitations of RDT #58063
Code Review
This pull request adds a documentation note about a limitation of Ray Direct Transport (RDT) with NIXL. The change is clear and helps inform users about a known issue. I have one minor suggestion to improve the wording for better clarity and professionalism.
For NIXL:

* Due to an issue with our implementation of memory deregistration, we currently do not support repeated transfers of tensors that share the same memory space but belong to different objects. We will fix this problem soon.
The phrase "We will fix this problem soon" is informal and vague. It's better to remove it for conciseness and professionalism, especially since the introduction to this section already states that limitations may be addressed in future releases.
Suggested change:
- * Due to an issue with our implementation of memory deregistration, we currently do not support repeated transfers of tensors that share the same memory space but belong to different objects. We will fix this problem soon.
+ * Due to an issue with our implementation of memory deregistration, we currently do not support repeated transfers of tensors that share the same memory space but belong to different objects.
Can you give, like, 2 small examples:
- sending 2 lists of tensors that overlap, e.g. [a, b, c], [c, d, e]
- sending the same tensor twice
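To make the two requested cases concrete, here is a minimal sketch in plain Python. It is not Ray or NIXL code: `bytearray` objects stand in for tensors, and `shared_buffers` is a hypothetical helper that compares buffers by identity. With real PyTorch tensors, one would presumably compare memory addresses instead, for example via `Tensor.data_ptr()`.

```python
def shared_buffers(first, second):
    """Return the buffers from `second` that also appear in `first`.

    Buffers are compared by identity (the same object), not by value.
    """
    first_ids = {id(buf) for buf in first}
    return [buf for buf in second if id(buf) in first_ids]


# Stand-ins for five distinct tensors (assumption: one buffer per tensor).
a, b, c, d, e = (bytearray(4) for _ in range(5))

# Case 1: two lists of tensors that overlap in `c`.
overlap = shared_buffers([a, b, c], [c, d, e])

# Case 2: the same tensor sent twice.
same_twice = shared_buffers([a], [a])

print(len(overlap), overlap[0] is c)
print(len(same_twice), same_twice[0] is a)
```

In both cases exactly one buffer is shared between the two sends, which is the situation the documented limitation describes.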
We might as well do the error detection in the same PR so we can provide a code sample showing what error to expect.
Also, I think this assumes too much system knowledge from the user.

> Due to an issue with our implementation of memory deregistration,

-> "Due to a known issue"

> repeated transfers of tensors that share the same memory space but belong to different objects.

Technically this is possible, but you have to make sure to free the first object before the second. It's probably clearer with code examples, like dhyey said.
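The free-before-resend ordering can be sketched with a toy model. Everything here is hypothetical: `ToyRegistry`, `send`, and `free` are illustrative names, not Ray or NIXL APIs, and the `ValueError` only stands in for whatever error the real check raises.

```python
class ToyRegistry:
    """Toy model: a buffer may be registered under one object at a time."""

    def __init__(self):
        self._owner = {}  # id(buffer) -> name of the object that registered it

    def send(self, obj_name, buffers):
        # Reject the whole send if any buffer is still registered
        # under a different object.
        for buf in buffers:
            owner = self._owner.get(id(buf))
            if owner is not None and owner != obj_name:
                raise ValueError(
                    f"buffer already registered by {owner!r}; free it first"
                )
        for buf in buffers:
            self._owner[id(buf)] = obj_name

    def free(self, obj_name):
        # Deregister every buffer owned by obj_name.
        self._owner = {k: v for k, v in self._owner.items() if v != obj_name}


# Stand-ins for five distinct tensors.
a, b, c, d, e = (bytearray(4) for _ in range(5))

registry = ToyRegistry()
registry.send("obj1", [a, b, c])

try:
    registry.send("obj2", [c, d, e])  # `c` is still registered under obj1
    rejected = False
except ValueError:
    rejected = True

registry.free("obj1")
registry.send("obj2", [c, d, e])  # succeeds once obj1 has been freed
```

The second send is rejected only while `obj1` still holds `c`. After `free("obj1")`, the same call goes through, which mirrors the ordering described in the comment above.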
Signed-off-by: Dhyey Shah <dhyey2019@gmail.com>
Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: Qiaolin Yu <liin1211@outlook.com>
Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Qiaolin-Yu <liin1211@outlook.com> Signed-off-by: Qiaolin Yu <liin1211@outlook.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu>
## Description
Cherry-picking #58063 to throw an exception when trying to send the same ref twice before it has been garbage collected, because doing so can trigger a NIXL error. Also adding documentation for this.

Signed-off-by: Dhyey Shah <dhyey2019@gmail.com>
Signed-off-by: Qiaolin-Yu <liin1211@outlook.com>
Signed-off-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: Stephanie Wang <smwang@cs.washington.edu>
Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Qiaolin-Yu <liin1211@outlook.com> Signed-off-by: Qiaolin Yu <liin1211@outlook.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: xgui <xgui@anyscale.com>
Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Qiaolin-Yu <liin1211@outlook.com> Signed-off-by: Qiaolin Yu <liin1211@outlook.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Qiaolin-Yu <liin1211@outlook.com> Signed-off-by: Qiaolin Yu <liin1211@outlook.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: Future-Outlier <eric901201@gmail.com>
…ct#58159)

## Description
Cherry-picking ray-project#58063 to throw an exception when trying to double send the same ref before gc because it can trigger a NIXL error. Also adding documentation for this.

Signed-off-by: Dhyey Shah <dhyey2019@gmail.com>
Signed-off-by: Qiaolin-Yu <liin1211@outlook.com>
Signed-off-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: Stephanie Wang <smwang@cs.washington.edu>
Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Qiaolin-Yu <liin1211@outlook.com> Signed-off-by: Qiaolin Yu <liin1211@outlook.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: peterxcli <peterxcli@gmail.com>
Description
RDT currently has some limitations. This PR updates the docs to clarify them and disables some tests affected by the new assertion.