[Data] Partial fix for Dataset.context not being sealed after creation#41569
raulchen merged 2 commits into ray-project:master
Conversation
Signed-off-by: Hao Chen <chenh1024@gmail.com>
ds2.take_all()
def test_streaming_split_with_custom_data_context(
Moving this test to test_context_propagation, not changing the test code.
What about the places where we use DataContext.get_current() during planning, e.g., here? Don't we need to propagate the DataContext through to those?
Good point. So this PR fixes the case for training jobs, where different datasets are dispatched to different processes (the SplitCoordinator actors) for execution.
stephanie-wang
left a comment
Looks good, but can you update the PR description to make it clear what cases this does and does not cover?
Why are these changes needed?
Dataset.context should be sealed the first time the Dataset is created. But if a new operator is applied to the dataset, the current global DataContext is saved to the Dataset again. This bug prevents using different DataContexts for training and validation datasets in a training job.
Note that this PR only fixes the issue when multiple datasets are created in the same process but run in different processes. If they all run in the same process, the bug remains; see #41573.
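The intended "seal on creation" behavior can be sketched with a toy model. This is not Ray's actual implementation; DataContext and Dataset below are hypothetical stand-ins that only illustrate the pattern: the context is snapshotted once when the first Dataset is built, and later operators inherit that sealed copy instead of re-reading the mutable global context.

```python
import copy

class DataContext:
    """Hypothetical stand-in for ray.data.DataContext."""
    _current = None

    def __init__(self, target_max_block_size=128 * 1024 * 1024):
        self.target_max_block_size = target_max_block_size

    @classmethod
    def get_current(cls):
        # Lazily create the process-global context, like the real API.
        if cls._current is None:
            cls._current = cls()
        return cls._current

class Dataset:
    def __init__(self, context=None):
        # Seal a snapshot of the global context at creation time.
        if context is None:
            context = copy.copy(DataContext.get_current())
        self.context = context

    def map(self, fn):
        # The buggy behavior re-read DataContext.get_current() here,
        # overwriting the sealed context; the fix propagates the
        # already-sealed context to the derived Dataset instead.
        return Dataset(context=self.context)

ctx = DataContext.get_current()
ctx.target_max_block_size = 1
ds = Dataset()                     # context sealed with value 1
ctx.target_max_block_size = 2      # later mutation of the global context
ds2 = ds.map(lambda x: x)
assert ds2.context.target_max_block_size == 1  # sealed value survives
```

With the buggy behavior, ds2 would instead pick up the mutated global value (2), which is exactly what breaks per-dataset contexts in a training job.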
Related issue number
#41573
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.