[Data] Propagate driver DataContext to RayTrainWorkers #40116
matthewdeng merged 11 commits into ray-project:master
Signed-off-by: Scott Lee <sjl@anyscale.com>
CI run with ML / RL tests passing: https://buildkite.com/ray-project/oss-ci-build-pr/builds/37993. I'm now going to revert the manual enabling of the RL tests trigger.
woshiyyya left a comment
Thanks @scottjlee, this solution is cleaner than the previous one.
Also, can you elaborate more on the RLLib Learner issue?
Yeah, the previous implementation, which added a new parameter into |
# TODO(@justinvyu: fix test and/or deprecate relevant code path)
@pytest.mark.skip("Mocked execute_async doesn't work as intended")
Is this intentional as part of this PR?
Yeah, I paired with @justinvyu on this for some time. We concluded that the mocking inside the test may need to be updated to be compatible with the fix in this PR, but we couldn't figure out how. I think @justinvyu said he can come back later to fix or remove the test; I'll let him elaborate.
Why are these changes needed?
Second attempt on #39698, which was found to be incompatible with RLlib `Learner` classes. In this PR, we move the logic of passing the driver's `DataContext` into the `BackendExecutor` instead of the `RayTrainWorker`, as was done previously.
Related issue number
Closes #39237
Previous PR: #39698
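The approach described under "Why are these changes needed?" can be illustrated with a small, Ray-free sketch. Everything here is a hypothetical stand-in: `FakeDataContext`, `get_current`, `set_current`, and `ToyExecutor` are not Ray classes or APIs, and threads with thread-local state are assumed as a substitute for worker processes that each start with a default context. The point is only the pattern: the executor captures the driver's context once and installs a copy on each worker before the training function runs, so the worker class itself needs no extra parameter.

```python
import copy
import threading
from dataclasses import dataclass


@dataclass
class FakeDataContext:
    """Hypothetical stand-in for a per-process data context (not Ray's class)."""
    target_max_block_size: int = 128


_local = threading.local()


def get_current() -> FakeDataContext:
    # Each thread lazily creates its own default context,
    # mimicking a fresh worker process.
    if not hasattr(_local, "ctx"):
        _local.ctx = FakeDataContext()
    return _local.ctx


def set_current(ctx: FakeDataContext) -> None:
    _local.ctx = ctx


class ToyExecutor:
    """Hypothetical executor that owns the context propagation step."""

    def __init__(self, num_workers: int):
        self.num_workers = num_workers

    def run(self, train_fn, propagate: bool = True):
        driver_ctx = get_current()  # captured on the driver (calling) thread
        results = [None] * self.num_workers

        def worker_entry(rank: int) -> None:
            if propagate:
                # Install a copy of the driver's context before user code runs.
                set_current(copy.copy(driver_ctx))
            results[rank] = train_fn()

        threads = [
            threading.Thread(target=worker_entry, args=(rank,))
            for rank in range(self.num_workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results


def read_block_size() -> int:
    # Stand-in for a training function that reads the current context.
    return get_current().target_max_block_size
```

With `propagate=False`, each simulated worker falls back to its own default context, which is the behavior the linked issue describes.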
Checks
- … `git commit -s`) in this PR.
- … `scripts/format.sh` to lint the changes in this PR.
- … method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.