[Data] Streamline concurrency parameter semantic #57035
```diff
@@ -574,13 +574,6 @@ def get_compute_strategy(
     )

     if compute is not None:
-        # Legacy code path to support `compute` argument.
-        logger.warning(
-            "The argument ``compute`` is deprecated in Ray 2.9. Please specify "
-            "argument ``concurrency`` instead. For more information, see "
-            "https://docs.ray.io/en/master/data/transforming-data.html#"
-            "stateful-transforms."
-        )
         if is_callable_class and (
             compute == "tasks" or isinstance(compute, TaskPoolStrategy)
         ):
```
```diff
@@ -599,6 +592,13 @@ def get_compute_strategy(
         )
         return compute
     elif concurrency is not None:
+        # Legacy code path to support `concurrency` argument.
+        logger.warning(
+            "The argument ``concurrency`` is deprecated in Ray 2.51. Please specify "
+            "argument ``compute`` instead. For more information, see "
+            "https://docs.ray.io/en/master/data/transforming-data.html#"
+            "stateful-transforms."
+        )
         if isinstance(concurrency, tuple):
             # Validate tuple length and that all elements are integers
             if len(concurrency) not in (2, 3) or not all(
```

Review exchange on the version number in the warning:

- **Author:** Not sure if this will be included in 2.50, so set to 2.51 for now.
- **Contributor:** Yes, this will be 2.51.
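The resolution path above can be sketched in isolation. The following is a minimal, hypothetical reconstruction: the stand-in classes and the simplified `get_compute_strategy` signature are assumptions for illustration, not Ray's actual API, which takes more parameters and performs more validation.

```python
import warnings


class TaskPoolStrategy:
    """Stand-in for ray.data.TaskPoolStrategy (sketch only)."""

    def __init__(self, size=None):
        self.size = size


class ActorPoolStrategy:
    """Stand-in for ray.data.ActorPoolStrategy (sketch only)."""

    def __init__(self, size=None, min_size=None, max_size=None, initial_size=None):
        self.size = size
        self.min_size = min_size
        self.max_size = max_size
        self.initial_size = initial_size


def get_compute_strategy(compute=None, concurrency=None, is_callable_class=False):
    """Simplified resolution of `compute` vs. the deprecated `concurrency`."""
    if compute is not None:
        # Preferred path: an explicit compute strategy wins.
        return compute
    if concurrency is not None:
        # Legacy path mirroring the diff above: warn, then translate.
        warnings.warn(
            "The argument `concurrency` is deprecated in Ray 2.51. "
            "Please specify argument `compute` instead.",
            DeprecationWarning,
        )
        if isinstance(concurrency, tuple):
            if len(concurrency) not in (2, 3) or not all(
                isinstance(x, int) for x in concurrency
            ):
                raise ValueError(f"Invalid `concurrency` tuple: {concurrency!r}")
            initial = concurrency[2] if len(concurrency) == 3 else None
            return ActorPoolStrategy(
                min_size=concurrency[0], max_size=concurrency[1], initial_size=initial
            )
        if is_callable_class:
            return ActorPoolStrategy(size=concurrency)
        return TaskPoolStrategy(size=concurrency)
    # Default: a task pool sized by available resources and input blocks.
    return TaskPoolStrategy()


# Example: a legacy tuple argument resolves to an autoscaling actor pool.
strategy = get_compute_strategy(concurrency=(2, 8), is_callable_class=True)
print(type(strategy).__name__, strategy.min_size, strategy.max_size)
```

Note the ordering: an explicit `compute` short-circuits before the deprecation warning ever fires, so migrated call sites are silent.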
```diff
@@ -341,7 +341,12 @@ def parse_filename(row: Dict[str, Any]) -> Dict[str, Any]:
         Args:
             fn: The function to apply to each row, or a class type
                 that can be instantiated to create such a callable.
-            compute: This argument is deprecated. Use ``concurrency`` argument.
+            compute: The compute strategy to use for the map operation.
+                * If ``compute`` is not specified, will use ``ray.data.TaskPoolStrategy()`` to launch concurrent tasks based on the available resources and number of input blocks.
+                * Use ``ray.data.TaskPoolStrategy(size=n)`` to launch ``n`` concurrent Ray tasks.
+                * Use ``ray.data.ActorPoolStrategy(size=n)`` to use a fixed-size actor pool of ``n`` workers.
+                * Use ``ray.data.ActorPoolStrategy(min_size=m, max_size=n)`` to use an autoscaling actor pool from ``m`` to ``n`` workers.
+                * Use ``ray.data.ActorPoolStrategy(min_size=m, max_size=n, initial_size=initial)`` to use an autoscaling actor pool from ``m`` to ``n`` workers, with an initial size of ``initial``.
             fn_args: Positional arguments to pass to ``fn`` after the first argument.
                 These arguments are top-level arguments to the underlying Ray task.
             fn_kwargs: Keyword arguments to pass to ``fn``. These arguments are
```
```diff
@@ -357,27 +362,7 @@ def parse_filename(row: Dict[str, Any]) -> Dict[str, Any]:
                 example, specify `num_gpus=1` to request 1 GPU for each parallel map
                 worker.
             memory: The heap memory in bytes to reserve for each parallel map worker.
-            concurrency: The semantics of this argument depend on the type of ``fn``:
-
-                * If ``fn`` is a function and ``concurrency`` isn't set (default), the
-                  actual concurrency is implicitly determined by the available
-                  resources and number of input blocks.
-
-                * If ``fn`` is a function and ``concurrency`` is an int ``n``, Ray Data
-                  launches *at most* ``n`` concurrent tasks.
-
-                * If ``fn`` is a class and ``concurrency`` is an int ``n``, Ray Data
-                  uses an actor pool with *exactly* ``n`` workers.
-
-                * If ``fn`` is a class and ``concurrency`` is a tuple ``(m, n)``, Ray
-                  Data uses an autoscaling actor pool from ``m`` to ``n`` workers.
-
-                * If ``fn`` is a class and ``concurrency`` is a tuple ``(m, n, initial)``, Ray
-                  Data uses an autoscaling actor pool from ``m`` to ``n`` workers, with an initial size of ``initial``.
-
-                * If ``fn`` is a class and ``concurrency`` isn't set (default), this
-                  method raises an error.
-
+            concurrency: This argument is deprecated. Use ``compute`` argument.
             ray_remote_args_fn: A function that returns a dictionary of remote args
                 passed to each map worker. The purpose of this argument is to generate
                 dynamic arguments for each actor/task, and will be called each time prior
```
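For call sites being migrated, the removed `concurrency` semantics correspond directly to the new `compute` strategies. A hypothetical helper spells out the mapping; it is not part of Ray, and the returned strings are only the suggested replacement expressions:

```python
def migrate_concurrency(concurrency, fn_is_class):
    """Return the `compute=` expression equivalent to a legacy `concurrency` value.

    Based on the old docstring semantics: for a function, an int meant
    "at most n concurrent tasks"; for a class, a fixed-size actor pool.
    """
    if concurrency is None:
        if fn_is_class:
            # Old behavior: a callable class without `concurrency` raised an error.
            raise ValueError("A callable class required `concurrency` to be set")
        return "ray.data.TaskPoolStrategy()"
    if isinstance(concurrency, int):
        if fn_is_class:
            return f"ray.data.ActorPoolStrategy(size={concurrency})"
        return f"ray.data.TaskPoolStrategy(size={concurrency})"
    if len(concurrency) == 2:
        m, n = concurrency
        return f"ray.data.ActorPoolStrategy(min_size={m}, max_size={n})"
    m, n, initial = concurrency
    return (
        f"ray.data.ActorPoolStrategy(min_size={m}, max_size={n}, "
        f"initial_size={initial})"
    )


print(migrate_concurrency(4, fn_is_class=True))
# ray.data.ActorPoolStrategy(size=4)
```

For example, a legacy `ds.map(MyFn, concurrency=(2, 8))` would become `ds.map(MyFn, compute=ray.data.ActorPoolStrategy(min_size=2, max_size=8))`.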
```diff
@@ -590,7 +575,12 @@ def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
                 The actual size of the batch provided to ``fn`` may be smaller than
                 ``batch_size`` if ``batch_size`` doesn't evenly divide the block(s) sent
                 to a given map task. Default ``batch_size`` is ``None``.
-            compute: This argument is deprecated. Use ``concurrency`` argument.
+            compute: The compute strategy to use for the map operation.
+                * If ``compute`` is not specified, will use ``ray.data.TaskPoolStrategy()`` to launch concurrent tasks based on the available resources and number of input blocks.
+                * Use ``ray.data.TaskPoolStrategy(size=n)`` to launch ``n`` concurrent Ray tasks.
+                * Use ``ray.data.ActorPoolStrategy(size=n)`` to use a fixed-size actor pool of ``n`` workers.
+                * Use ``ray.data.ActorPoolStrategy(min_size=m, max_size=n)`` to use an autoscaling actor pool from ``m`` to ``n`` workers.
+                * Use ``ray.data.ActorPoolStrategy(min_size=m, max_size=n, initial_size=initial)`` to use an autoscaling actor pool from ``m`` to ``n`` workers, with an initial size of ``initial``.
             batch_format: If ``"default"`` or ``"numpy"``, batches are
                 ``Dict[str, numpy.ndarray]``. If ``"pandas"``, batches are
                 ``pandas.DataFrame``. If ``"pyarrow"``, batches are
```
```diff
@@ -620,27 +610,7 @@ def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
                 example, specify `num_gpus=1` to request 1 GPU for each parallel map
                 worker.
             memory: The heap memory in bytes to reserve for each parallel map worker.
-            concurrency: The semantics of this argument depend on the type of ``fn``:
-
-                * If ``fn`` is a function and ``concurrency`` isn't set (default), the
-                  actual concurrency is implicitly determined by the available
-                  resources and number of input blocks.
-
-                * If ``fn`` is a function and ``concurrency`` is an int ``n``, Ray Data
-                  launches *at most* ``n`` concurrent tasks.
-
-                * If ``fn`` is a class and ``concurrency`` is an int ``n``, Ray Data
-                  uses an actor pool with *exactly* ``n`` workers.
-
-                * If ``fn`` is a class and ``concurrency`` is a tuple ``(m, n)``, Ray
-                  Data uses an autoscaling actor pool from ``m`` to ``n`` workers.
-
-                * If ``fn`` is a class and ``concurrency`` is a tuple ``(m, n, initial)``, Ray
-                  Data uses an autoscaling actor pool from ``m`` to ``n`` workers, with an initial size of ``initial``.
-
-                * If ``fn`` is a class and ``concurrency`` isn't set (default), this
-                  method raises an error.
-
+            concurrency: This argument is deprecated. Use ``compute`` argument.
             ray_remote_args_fn: A function that returns a dictionary of remote args
                 passed to each map worker. The purpose of this argument is to generate
                 dynamic arguments for each actor/task, and will be called each time prior
```
```diff
@@ -1304,7 +1274,12 @@ def duplicate_row(row: Dict[str, Any]) -> List[Dict[str, Any]]:
         Args:
             fn: The function or generator to apply to each record, or a class type
                 that can be instantiated to create such a callable.
-            compute: This argument is deprecated. Use ``concurrency`` argument.
+            compute: The compute strategy to use for the map operation.
+                * If ``compute`` is not specified, will use ``ray.data.TaskPoolStrategy()`` to launch concurrent tasks based on the available resources and number of input blocks.
+                * Use ``ray.data.TaskPoolStrategy(size=n)`` to launch ``n`` concurrent Ray tasks.
+                * Use ``ray.data.ActorPoolStrategy(size=n)`` to use a fixed-size actor pool of ``n`` workers.
+                * Use ``ray.data.ActorPoolStrategy(min_size=m, max_size=n)`` to use an autoscaling actor pool from ``m`` to ``n`` workers.
+                * Use ``ray.data.ActorPoolStrategy(min_size=m, max_size=n, initial_size=initial)`` to use an autoscaling actor pool from ``m`` to ``n`` workers, with an initial size of ``initial``.
             fn_args: Positional arguments to pass to ``fn`` after the first argument.
                 These arguments are top-level arguments to the underlying Ray task.
             fn_kwargs: Keyword arguments to pass to ``fn``. These arguments are
```
```diff
@@ -1320,27 +1295,7 @@ def duplicate_row(row: Dict[str, Any]) -> List[Dict[str, Any]]:
                 example, specify `num_gpus=1` to request 1 GPU for each parallel map
                 worker.
             memory: The heap memory in bytes to reserve for each parallel map worker.
-            concurrency: The semantics of this argument depend on the type of ``fn``:
-
-                * If ``fn`` is a function and ``concurrency`` isn't set (default), the
-                  actual concurrency is implicitly determined by the available
-                  resources and number of input blocks.
-
-                * If ``fn`` is a function and ``concurrency`` is an int ``n``, Ray Data
-                  launches *at most* ``n`` concurrent tasks.
-
-                * If ``fn`` is a class and ``concurrency`` is an int ``n``, Ray Data
-                  uses an actor pool with *exactly* ``n`` workers.
-
-                * If ``fn`` is a class and ``concurrency`` is a tuple ``(m, n)``, Ray
-                  Data uses an autoscaling actor pool from ``m`` to ``n`` workers.
-
-                * If ``fn`` is a class and ``concurrency`` is a tuple ``(m, n, initial)``, Ray
-                  Data uses an autoscaling actor pool from ``m`` to ``n`` workers, with an initial size of ``initial``.
-
-                * If ``fn`` is a class and ``concurrency`` isn't set (default), this
-                  method raises an error.
-
+            concurrency: This argument is deprecated. Use ``compute`` argument.
             ray_remote_args_fn: A function that returns a dictionary of remote args
                 passed to each map worker. The purpose of this argument is to generate
                 dynamic arguments for each actor/task, and will be called each time
```
```diff
@@ -1451,33 +1406,18 @@ def filter(
             fn_constructor_kwargs: Keyword arguments to pass to ``fn``'s constructor.
                 This can only be provided if ``fn`` is a callable class. These arguments
                 are top-level arguments in the underlying Ray actor construction task.
-            compute: This argument is deprecated. Use ``concurrency`` argument.
+            compute: The compute strategy to use for the map operation.
+                * If ``compute`` is not specified, will use ``ray.data.TaskPoolStrategy()`` to launch concurrent tasks based on the available resources and number of input blocks.
+                * Use ``ray.data.TaskPoolStrategy(size=n)`` to launch ``n`` concurrent Ray tasks.
+                * Use ``ray.data.ActorPoolStrategy(size=n)`` to use a fixed-size actor pool of ``n`` workers.
+                * Use ``ray.data.ActorPoolStrategy(min_size=m, max_size=n)`` to use an autoscaling actor pool from ``m`` to ``n`` workers.
+                * Use ``ray.data.ActorPoolStrategy(min_size=m, max_size=n, initial_size=initial)`` to use an autoscaling actor pool from ``m`` to ``n`` workers, with an initial size of ``initial``.
             num_cpus: The number of CPUs to reserve for each parallel map worker.
             num_gpus: The number of GPUs to reserve for each parallel map worker. For
                 example, specify `num_gpus=1` to request 1 GPU for each parallel map
                 worker.
             memory: The heap memory in bytes to reserve for each parallel map worker.
-            concurrency: The semantics of this argument depend on the type of ``fn``:
-
-                * If ``fn`` is a function and ``concurrency`` isn't set (default), the
-                  actual concurrency is implicitly determined by the available
-                  resources and number of input blocks.
-
-                * If ``fn`` is a function and ``concurrency`` is an int ``n``, Ray Data
-                  launches *at most* ``n`` concurrent tasks.
-
-                * If ``fn`` is a class and ``concurrency`` is an int ``n``, Ray Data
-                  uses an actor pool with *exactly* ``n`` workers.
-
-                * If ``fn`` is a class and ``concurrency`` is a tuple ``(m, n)``, Ray
-                  Data uses an autoscaling actor pool from ``m`` to ``n`` workers.
-
-                * If ``fn`` is a class and ``concurrency`` is a tuple ``(m, n, initial)``, Ray
-                  Data uses an autoscaling actor pool from ``m`` to ``n`` workers, with an initial size of ``initial``.
-
-                * If ``fn`` is a class and ``concurrency`` isn't set (default), this
-                  method raises an error.
-
+            concurrency: This argument is deprecated. Use ``compute`` argument.
             ray_remote_args_fn: A function that returns a dictionary of remote args
                 passed to each map worker. The purpose of this argument is to generate
                 dynamic arguments for each actor/task, and will be called each time
```
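The docstrings above distinguish a fixed pool (`size=n`) from an autoscaling pool (`min_size`/`max_size`/`initial_size`). A minimal sketch of the sizing invariants one would expect such a configuration to enforce — assuming `size` is mutually exclusive with the autoscaling parameters and that `min_size <= initial_size <= max_size`; these checks are hypothetical, not Ray's actual validation:

```python
def validate_actor_pool(size=None, min_size=None, max_size=None, initial_size=None):
    """Sketch of plausible sizing checks for an actor pool configuration.

    Returns the effective (min, max) bounds of the pool.
    """
    if size is not None:
        # A fixed-size pool: no autoscaling parameters allowed.
        if min_size is not None or max_size is not None or initial_size is not None:
            raise ValueError("`size` is mutually exclusive with min/max/initial_size")
        if size < 1:
            raise ValueError("`size` must be >= 1")
        return size, size
    if min_size is None or max_size is None:
        raise ValueError("An autoscaling pool needs both `min_size` and `max_size`")
    if not 0 <= min_size <= max_size:
        raise ValueError(f"Need 0 <= min_size <= max_size, got ({min_size}, {max_size})")
    if initial_size is not None and not min_size <= initial_size <= max_size:
        raise ValueError("`initial_size` must lie within [min_size, max_size]")
    return min_size, max_size


# Example: an autoscaling pool from 2 to 8 workers starting at 4.
print(validate_actor_pool(min_size=2, max_size=8, initial_size=4))
# (2, 8)
```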

Review comments:

- If users are required to use this API, this should be a `PublicAPI`. Same with `ActorPoolStrategy`.
- Also, can we add both of these to the API reference?