
[Data] Elevate num_cpus/gpus and memory as top-level params in most APIs#56419

Merged
richardliaw merged 9 commits into ray-project:master from owenowenisme:data/make-ray-remote-resource-args-top-level-args
Sep 18, 2025

Conversation

@owenowenisme (Member) commented Sep 10, 2025

Why are these changes needed?

This PR:

  • Adds a utility method merge_resources_to_ray_remote_args that merges the resource args num_cpus, num_gpus, and memory into ray_remote_args, along with a test for it.
  • Updates read_api.py and dataset.py to elevate num_cpus, num_gpus, and memory to top-level params.
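As a rough illustration of what such a helper might look like, here is a minimal sketch. Only the function name comes from this PR; the signature, the parameter order, the copy-instead-of-mutate behavior, and the rule that an explicitly set top-level value overrides an entry already in ray_remote_args are all assumptions made for this example, not the merged implementation.

```python
from typing import Any, Dict, Optional


def merge_resources_to_ray_remote_args(
    num_cpus: Optional[float],
    num_gpus: Optional[float],
    memory: Optional[float],
    ray_remote_args: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of ray_remote_args with the provided resource args merged in.

    Assumed signature and semantics; the real helper may differ.
    """
    # Copy so the caller's dictionary is never mutated.
    merged = dict(ray_remote_args or {})
    # Only copy arguments the caller actually set, so a None value never
    # overwrites an entry that is already present in ray_remote_args.
    for key, value in (
        ("num_cpus", num_cpus),
        ("num_gpus", num_gpus),
        ("memory", memory),
    ):
        if value is not None:
            merged[key] = value
    return merged


merged = merge_resources_to_ray_remote_args(
    num_cpus=2,
    num_gpus=None,
    memory=1024**3,  # 1 GiB expressed in bytes
    ray_remote_args={"max_retries": 3},
)
print(merged)
```

With a helper of this shape, each API only needs to forward its top-level num_cpus, num_gpus, and memory arguments once, and everything downstream keeps reading a single ray_remote_args dictionary.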

Related issue number

Closes #54708

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
@owenowenisme owenowenisme force-pushed the data/make-ray-remote-resource-args-top-level-args branch from 24afeca to a1ad9d6 Compare September 11, 2025 03:22
Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
@owenowenisme owenowenisme marked this pull request as ready for review September 11, 2025 04:57
@owenowenisme owenowenisme requested a review from a team as a code owner September 11, 2025 04:57
@owenowenisme (Member Author)

@richardliaw @gvspraveen @alexeykudinkin PTAL, Thanks! 🙏🙏🙏

@ray-gardener (bot) added the labels data (Ray Data-related issues) and community-contribution (Contributed by the community) Sep 11, 2025
@gvspraveen added the label go (add ONLY when ready to merge, run all tests) Sep 11, 2025
Comment on lines 1047 to 1051
num_cpus: The number of CPUs to reserve for each parallel map worker.
num_gpus: The number of GPUs to reserve for each parallel map worker. For
example, specify `num_gpus=1` to request 1 GPU for each parallel map
worker.
memory: The heap memory in bytes to reserve for each parallel map worker.
Contributor

hmm, do we actually want to expose this here?

Member Author

@owenowenisme, Sep 12, 2025

Yeah, users can set these attributes here, so I think it's reasonable to expose them here?

Contributor

hmm, will let @alexeykudinkin decide, but I think for things like drop_columns and add_columns it's probably not that necessary. But let's wait to hear from others.

Member

Discussed with @alexeykudinkin offline -- to avoid adding unnecessary complexity, let's avoid exposing these as top-level parameters for non-UDF APIs:

  • drop_columns
  • select_columns
  • rename_columns

You can specify a UDF for add_columns, so I think it's okay to keep there.

Member Author

Just removed all the changes in the non-UDF functions, thanks!
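The docstring quoted at the top of this thread describes each value as a reservation per parallel map worker, with memory given in bytes. The arithmetic below is a small sketch of what that implies for the total reservation; the helper total_reservation and the worker count are hypothetical names and values chosen for illustration and are not part of Ray.

```python
from typing import Dict, Optional

GIB = 1024**3  # bytes in one gibibyte


def total_reservation(
    num_workers: int,
    num_cpus: Optional[float] = None,
    num_gpus: Optional[float] = None,
    memory: Optional[float] = None,
) -> Dict[str, float]:
    """Scale per-worker reservations by the number of parallel workers."""
    per_worker = {"num_cpus": num_cpus, "num_gpus": num_gpus, "memory": memory}
    # Resources left as None are not reserved explicitly, so they are skipped.
    return {
        name: value * num_workers
        for name, value in per_worker.items()
        if value is not None
    }


# Four workers, each reserving 2 CPUs and 2 GiB of heap memory.
total = total_reservation(num_workers=4, num_cpus=2, memory=2 * GIB)
print(total)
```

Because the reservation is per worker, raising parallelism raises the total requested resources proportionally.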

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
@richardliaw richardliaw merged commit 9977ce9 into ray-project:master Sep 18, 2025
5 checks passed
@owenowenisme owenowenisme deleted the data/make-ray-remote-resource-args-top-level-args branch September 19, 2025 00:44
zma2 pushed a commit to zma2/ray that referenced this pull request Sep 23, 2025
…PIs (ray-project#56419)

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
Signed-off-by: Zhiqiang Ma <zhiqiang.ma@intel.com>
ZacAttack pushed a commit to ZacAttack/ray that referenced this pull request Sep 24, 2025
…PIs (ray-project#56419)

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
Signed-off-by: zac <zac@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Sep 24, 2025
…PIs (#56419)

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
marcostephan pushed a commit to marcostephan/ray that referenced this pull request Sep 24, 2025
…PIs (ray-project#56419)

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
Signed-off-by: Marco Stephan <marco@magic.dev>
elliot-barn pushed a commit that referenced this pull request Sep 27, 2025
…PIs (#56419)

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
dstrodtman pushed a commit to dstrodtman/ray that referenced this pull request Oct 6, 2025
…PIs (ray-project#56419)

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
jpatra72 added a commit to jpatra72/ray that referenced this pull request Oct 16, 2025
was added in PR ray-project#56588, but got lost again in a later PR ray-project#56419

Signed-off-by: jpatra72 <jyotirmaya72@gmail.com>
jpatra72 added a commit to jpatra72/ray that referenced this pull request Oct 17, 2025
Signed-off-by: jpatra72 <jyotirmaya72@gmail.com>
bveeramani pushed a commit that referenced this pull request Oct 17, 2025
Signed-off-by: jpatra72 <jyotirmaya72@gmail.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…PIs (ray-project#56419)

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
Signed-off-by: jpatra72 <jyotirmaya72@gmail.com>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
Signed-off-by: jpatra72 <jyotirmaya72@gmail.com>
Signed-off-by: xgui <xgui@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
Signed-off-by: jpatra72 <jyotirmaya72@gmail.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…PIs (ray-project#56419)

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Signed-off-by: jpatra72 <jyotirmaya72@gmail.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Signed-off-by: jpatra72 <jyotirmaya72@gmail.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…PIs (ray-project#56419)

Signed-off-by: You-Cheng Lin (Owen) <mses010108@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
Signed-off-by: jpatra72 <jyotirmaya72@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Labels

community-contribution (Contributed by the community), data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

4 participants