[Train][Tune] Support reading train result from cloud storage #40622
justinvyu merged 12 commits into ray-project:master
This adds support for reading a Result object from remote storage systems such as S3 and GCS.
@woshiyyya, would you take a look, please?
Signed-off-by: Ahmed Mahran <ahmahran@gmail.com>
Thanks, this is looking really good! I'll do a review sometime later today.
justinvyu left a comment:
Thanks for the great PR 🤩
I have a few comments, and I can help you out with it.
Thanks @justinvyu for the thorough review! I'll try to address your comments early next week.
7c15e5c to 36ce2cd
@justinvyu, I think I've addressed your feedback.
justinvyu left a comment:
Thanks! Apart from needing to fix the lint errors, I just have some small comments.
def _read_file_as_str(
    storage_filesystem: pyarrow.fs.FileSystem,
    storage_path: str,
    compression: Optional[str] = "detect",
    buffer_size: Optional[int] = None,
    encoding: Optional[str] = "utf-8",
) -> str:
I think I'm okay with adding this helper, but it might be clearer to just do the filesystem operations directly. We can add this in the future if it turns out we need to read a file as text directly from cloud storage very often.
I thought it would be cleaner and clearer to do it this way, especially since the logic is used twice (for JSON and CSV). I was going to make it an inner function of the from_path function. I can move it to a static method in Result, like the old _validate_trial_dir.
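For reference, here is a minimal sketch of how one helper like this could serve both the JSON and CSV reads. It is not the code in this PR. It assumes only that the filesystem object exposes `open_input_stream(path, compression=..., buffer_size=...)` returning a stream with `readall()`, as `pyarrow.fs.FileSystem` does. `LocalStandInFileSystem`, `load_json_and_csv`, and the file names `metrics.json` and `progress.csv` are hypothetical and exist only so the sketch runs without pyarrow installed.

```python
import csv
import io
import json
import os
import tempfile


class LocalStandInFileSystem:
    """Hypothetical stand-in for the one pyarrow.fs.FileSystem method used below."""

    def open_input_stream(self, path, compression="detect", buffer_size=None):
        # An unbuffered binary open returns io.FileIO, which has readall()
        # and is a context manager. compression and buffer_size are
        # accepted for signature compatibility but ignored here.
        return open(path, "rb", buffering=0)


def read_file_as_str(storage_filesystem, storage_path, compression="detect",
                     buffer_size=None, encoding="utf-8"):
    """Read the whole file at storage_path and decode it to text."""
    with storage_filesystem.open_input_stream(
        storage_path, compression=compression, buffer_size=buffer_size
    ) as f:
        return f.readall().decode(encoding)


def load_json_and_csv(fs, json_path, csv_path):
    """Use the same helper for both formats (hypothetical caller)."""
    metrics = json.loads(read_file_as_str(fs, json_path))
    rows = list(csv.DictReader(io.StringIO(read_file_as_str(fs, csv_path))))
    return metrics, rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        json_path = os.path.join(tmp, "metrics.json")
        csv_path = os.path.join(tmp, "progress.csv")
        with open(json_path, "w", encoding="utf-8") as f:
            json.dump({"loss": 0.25}, f)
        with open(csv_path, "w", encoding="utf-8", newline="") as f:
            f.write("step,loss\n1,0.5\n2,0.25\n")
        print(load_json_and_csv(LocalStandInFileSystem(), json_path, csv_path))
```

With pyarrow installed, a real filesystem such as `pyarrow.fs.LocalFileSystem()`, or the filesystem returned by `pyarrow.fs.FileSystem.from_uri(...)`, could be passed in place of the stand-in.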
justinvyu left a comment:
Good work! Approved and pending tests passing. 🙌
Ok, the
py_test(
    name = "test_result",
    size = "medium",
    srcs = ["tests/test_result.py"],
    tags = ["team:ml", "exclusive"],
    deps = [":ml_lib", ":conftest"],
)
ray-project#40622 (comment): The test requires the mock_s3_bucket_uri fixture, which is defined in the conftest for ray.train.tests.
9b052a4 to 7198c11
Sorry, it is
This adds support for reading a Result object from remote/cloud storage systems such as S3 and GCS.
Why are these changes needed?
A Result can already be stored in remote/cloud storage, so there should be a way to read it back.
Related issue number
Checks
git commit -s) in this PR.
scripts/format.sh to lint the changes in this PR.
method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.