Commit 0217e03

Consider task_arena and task_group combination in detail (#1662)
1 parent 0881802 commit 0217e03

2 files changed

+128
-19
lines changed

rfcs/proposed/task_arena_waiting/readme.md

Lines changed: 5 additions & 19 deletions
@@ -106,18 +106,12 @@ The proposal consists of ideas that are not mutually exclusive and can be implem
106106

107107
### 1. Simplify the use of a task group
108108

109-
To address the existing shortcomings of the `task_arena` and `task_group` combination, we could add
110-
a new overload of the `enqueue` method that takes task_group as an argument, and add a `task_arena::wait_for`
111-
method that also takes a task group.
112-
113-
In the simplest possible implementation, it would be just header-based "syntax sugar" for the code
114-
described above:
109+
To address the existing shortcomings of the `task_arena` and `task_group` combination, the `enqueue` method
110+
could take `task_group` as the second argument, and a new method could wait for a task group:
115111
```cpp
116-
ta.enqueue( []{ foo(); }, tg ); // => ta.enqueue(tg.defer([]{ foo(); }));
117-
ta.wait_for( tg ); // => ta.execute([&tg]{ tg.wait(); });
112+
ta.enqueue([]{ foo(); }, tg); // corresponds to: ta.enqueue(tg.defer([]{ foo(); }));
113+
ta.wait_for(tg); // corresponds to: ta.execute([&tg]{ tg.wait(); });
118114
```
119-
If justified performance-wise, a more elaborated implementation could perhaps shave off some overheads.
120-
It would likely require new library entry points, though.
121115
122116
The example code to split work across NUMA-bound task arenas could then look like this (assuming also
123117
a special function that creates and initializes a vector of arenas):
@@ -135,15 +129,7 @@ for(unsigned j = 0; j < numa_arenas.size(); j++) {
135129
}
136130
```
137131

138-
It makes sense to also consider work isolation for this API. While waiting for task group completion,
139-
the thread can take unrelated tasks for execution, and that can potentially result in a delayed return
140-
and in latency increase. To prevent that, the tasks in the group should carry a unique tag that is
141-
also specified for the waiting call. The `isolated_task_group` preview class provides this functionality,
142-
but not the regular `task_group`. We can consider the following options for supporting isolation
143-
in `task_arena::wait_for(task_group&)`:
144-
- keep the `isolated_task_group` class and support it in the proposed `task_arena` extensions;
145-
- somehow extend the `task_group` class to optionally support work isolation (might require incompatible changes);
146-
- add an isolation tag (automatically or on demand) only when a `task_group` is used with `task_arena`.
132+
See [Improve interoperability with task groups](task_group_interop.md) for more details.
147133

148134
### 2. Reconsider waiting for all tasks
149135

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
1+
# Improve interoperability with task groups
2+
3+
## Motivation
4+
5+
As described in the [overarching proposal](readme.md#motivation), combined use of a `task_arena`
6+
and a `task_group` is helpful when there is a need to execute some tasks asynchronously,
7+
but is non-trivial to do properly. Here we propose specific APIs to make it easier.
8+
9+
## Proposed API
10+
11+
We suggest new overloads for `enqueue` which additionally take `task_group` as an argument,
12+
and the new `task_arena::wait_for` method that also takes `task_group`.
13+
14+
```cpp
15+
// Defined in header <oneapi/tbb/task_arena.h>
16+
17+
namespace oneapi::tbb {
18+
class task_arena {
19+
public:
20+
... // public types and members of class task_arena
21+
22+
// Proposed new methods
23+
template<typename F> void enqueue(F&& f, task_group& tg);
24+
task_group_status wait_for(task_group& tg);
25+
};
26+
27+
namespace this_task_arena {
28+
template<typename F> void enqueue(F&& f, task_group& tg);
29+
}
30+
} // namespace oneapi::tbb
31+
```
32+
33+
## Design discussion
34+
35+
### Enqueue a function as a part of a task group
36+
37+
There are two existing methods to submit a task for asynchronous execution in a task arena:
38+
```cpp
39+
template<typename F> void task_arena::enqueue(F&& f); // (1)
40+
void task_arena::enqueue(task_handle&& h); // (2)
41+
```
42+
The `this_task_arena` namespace also has two functions with the same signatures.
43+
44+
The proposed new overload for `enqueue` is similar to (1) but also takes `task_group` as the second argument:
45+
```cpp
46+
template<typename F> void task_arena::enqueue(F&& f, task_group& tg);
47+
```
48+
Semantically, it is equivalent to (2) with the argument created by `tg.defer(std::forward<F>(f))`.
49+
Implementation-wise, it is just a header-based wrapper over (2); a more elaborate implementation
50+
does not appear necessary. An analogous function should also be added to the `this_task_arena` namespace.
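
The wrapper relationship described above can be sketched with a single-threaded toy model. All `toy_*` names and `demo_enqueue_with_group` are hypothetical stand-ins written for this illustration; they are not oneTBB classes and involve no scheduler or threads. The sketch only shows that the two-argument `enqueue` can forward to the handle-taking overload, and that a group's pending count covers tasks submitted either way.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

class toy_task_group;

// Toy stand-in for task_handle: owns a body and one reference on its group.
class toy_task_handle {
public:
    toy_task_handle(toy_task_group& g, std::function<void()> body);
    toy_task_handle(toy_task_handle&& other) noexcept
        : group_(std::exchange(other.group_, nullptr)), body_(std::move(other.body_)) {}
    toy_task_handle(const toy_task_handle&) = delete;
    toy_task_handle& operator=(const toy_task_handle&) = delete;
    ~toy_task_handle();
    void run() { body_(); }
private:
    toy_task_group* group_;
    std::function<void()> body_;
};

// Toy stand-in for task_group: only counts tasks that are not yet finished.
class toy_task_group {
public:
    template <typename F>
    toy_task_handle defer(F&& f) { return toy_task_handle(*this, std::forward<F>(f)); }
    int pending() const { return pending_; }
private:
    friend class toy_task_handle;
    int pending_ = 0;
};

inline toy_task_handle::toy_task_handle(toy_task_group& g, std::function<void()> body)
    : group_(&g), body_(std::move(body)) { ++group_->pending_; }
inline toy_task_handle::~toy_task_handle() { if (group_) --group_->pending_; }

// Toy stand-in for task_arena: a plain FIFO queue drained by run_all().
class toy_task_arena {
public:
    // Models overload (2): takes ownership of a handle.
    void enqueue(toy_task_handle&& h) { queue_.push_back(std::move(h)); }
    // Models the proposed overload: a header-only wrapper over (2).
    template <typename F>
    void enqueue(F&& f, toy_task_group& tg) { enqueue(tg.defer(std::forward<F>(f))); }
    void run_all() {
        while (!queue_.empty()) {
            toy_task_handle h = std::move(queue_.front());
            queue_.pop_front();
            h.run();
        }  // destroying h releases its reference on the group
    }
private:
    std::deque<toy_task_handle> queue_;
};

// Submits one task through each path and returns the accumulated sum.
int demo_enqueue_with_group() {
    toy_task_arena ta;
    toy_task_group tg;
    int sum = 0;
    ta.enqueue([&sum] { sum += 1; }, tg);        // proposed form
    ta.enqueue(tg.defer([&sum] { sum += 2; }));  // existing form (2)
    assert(tg.pending() == 2);
    ta.run_all();
    assert(tg.pending() == 0);
    return sum;
}
```

In this model both submission paths leave the group in the same state, which is the sense in which the proposed overload adds convenience rather than new behavior.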
51+
52+
### Wait for completion of a task_group
53+
54+
The proposed new method of `task_arena` takes a `task_group` argument and does not return until
55+
all tasks in that task group are complete or cancelled:
56+
```cpp
57+
task_group_status task_arena::wait_for(task_group& tg);
58+
```
59+
Note that the scope of waiting includes not only the tasks submitted via the methods of `task_arena`
60+
but all tasks in the task group, independent of the way they were created and added as well as of
61+
task arenas they were submitted to.
62+
63+
The returned value indicates the [completion status](
64+
https://oneapi-spec.uxlfoundation.org/specifications/oneapi/v1.4-rev-1/elements/onetbb/source/task_scheduler/task_group/task_group_status_enum)
65+
of the task group.
66+
67+
The method is semantically equivalent to `execute([&tg]{ return tg.wait(); })`, and can be implemented
68+
that way. However, a better implementation for the current code base should instead use the `wait_delegate`
69+
class (see `oneapi/tbb/task_group.h`) and directly call the `execute` library entry point with this delegate.
70+
71+
There is no need to have a similar function in the `this_task_arena` namespace, as it would be
72+
no different from calling `tg.wait()`.
73+
74+
### Should `execute` be extended as well?
75+
76+
Another method, `task_arena::execute`, appears similar to `enqueue` in the sense that it also takes a callable
77+
and executes it in the arena. Should it also interoperate with a task group, and in which way?
78+
79+
The purpose of `execute` is to make sure that the provided callable is executed in a certain task arena,
80+
so that any work created by the callable is shared within the arena. To achieve that, the calling thread
81+
attempts to join the arena; if successful, it executes the callable and returns; otherwise (which means
82+
that the arena is already full of other threads), the callable is wrapped into a task and delegated to those threads,
83+
and the calling thread blocks until the task is complete.
84+
85+
A reasonable interoperability semantics could be that the callable, while executed in the given arena,
86+
also counts as a task in the given group. It would be roughly equivalent to the following code:
87+
```cpp
88+
// auto res = ta.execute(f, tg) could mean:
89+
{
90+
auto th = tg.defer([]{}); // an empty "proxy" task for counting
91+
ta.execute(f);
92+
} // th is destroyed when the thread leaves the scope
93+
```
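
The counting effect of the empty "proxy" task can be illustrated without oneTBB. In the sketch below, `toy_group`, its nested `proxy` class, and `execute_in_group` are hypothetical names invented for this example; the arena itself is left out, and only the lifetime of the proxy reference is modelled. It assumes, as the snippet above does, that an unsubmitted handle keeps the group from being considered complete until the handle is destroyed.

```cpp
#include <cassert>
#include <utility>

// Toy stand-in, NOT oneTBB: tracks outstanding references to the group.
class toy_group {
public:
    // RAII reference playing the role of the unsubmitted handle `th`.
    class proxy {
    public:
        explicit proxy(toy_group& g) : group_(&g) { ++group_->pending_; }
        proxy(const proxy&) = delete;
        proxy& operator=(const proxy&) = delete;
        ~proxy() { --group_->pending_; }
    private:
        toy_group* group_;
    };
    // True when a wait on the group would have nothing left to wait for.
    bool idle() const { return pending_ == 0; }
private:
    int pending_ = 0;
};

// Hypothetical model of `ta.execute(f, tg)`: run f while it counts in tg.
template <typename F>
auto execute_in_group(F&& f, toy_group& tg) -> decltype(std::forward<F>(f)()) {
    toy_group::proxy th(tg);       // the group is now "busy"
    return std::forward<F>(f)();   // th is released after the result is formed
}

// Returns true if the group was busy during the call and idle afterwards.
bool demo_execute_counts_in_group() {
    toy_group tg;
    bool idle_inside = true;
    int res = execute_in_group([&] { idle_inside = tg.idle(); return 42; }, tg);
    return !idle_inside && tg.idle() && res == 42;
}
```

Unlike the snippet above, this helper also forwards the callable's result to the caller, which a real `execute(f, tg)` overload would presumably need to do as well.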
94+
95+
Note that `ta.execute([&]{ tg.run(f); })` is not suitable because it submits `f` into the arena
96+
but does not ensure its completion, and `ta.execute([&]{ tg.run_and_wait(f); })` does not work either
97+
because it waits for all tasks in the group, not only for `f`.
98+
99+
Overall, it is not obvious if adding a task group parameter to `execute` is a useful extension.
100+
101+
### Thoughts on work isolation
102+
103+
It makes sense to also consider work isolation for this API. While waiting for task group completion,
104+
the thread can take unrelated tasks for execution, and that can potentially result in a delayed return
105+
and in latency increase. To prevent that, tasks in the group should carry a unique tag that is
106+
also specified for the waiting call. The `isolated_task_group` preview class provides the desired
107+
functionality, but not the regular `task_group`.
108+
109+
Note that extending `this_task_arena::isolate` with a task group argument would not help. `isolate`
110+
uses a unique isolation scope for a given callable; its purpose is to isolate the work, which the callable
111+
produces and then waits for, from every other task, and specifically from stealing "outermost" tasks which
112+
interfere with the callable.
113+
114+
We can consider the following options for providing isolation in `task_arena::wait_for(task_group&)`:
115+
- keep the `isolated_task_group` class and support it in the proposed `task_arena` extensions;
116+
- somehow extend the `task_group` class to optionally support work isolation (might require incompatible changes);
117+
- add an isolation tag (automatically or on demand) only when a `task_group` is used with `task_arena`.
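
To make the latency concern and the tag idea concrete, here is a single-threaded toy model. `toy_task`, `toy_arena`, `wait_for_tag`, and `demo_wait` are hypothetical names for this sketch only and do not reflect how oneTBB implements isolation. The model shows a waiting thread that, without isolation, runs an unrelated task before reaching the group's task, and with a matching tag skips it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model, NOT oneTBB: every queued task carries an integer tag.
struct toy_task {
    int tag;                     // 0 stands for "no isolation tag"
    std::function<void()> body;
};

class toy_arena {
public:
    void enqueue(int tag, std::function<void()> body) {
        queue_.push_back(toy_task{tag, std::move(body)});
    }
    // Runs queued tasks until none tagged `wanted_tag` remain.
    // isolated == false: take whatever is at the front, related or not.
    // isolated == true:  take only tasks carrying `wanted_tag`.
    void wait_for_tag(int wanted_tag, bool isolated) {
        while (count(wanted_tag) > 0) {
            auto it = queue_.begin();
            if (isolated) {
                while (it->tag != wanted_tag) ++it;  // a match exists: count > 0
            }
            toy_task t = std::move(*it);
            queue_.erase(it);
            t.body();
        }
    }
private:
    std::size_t count(int tag) const {
        std::size_t n = 0;
        for (const toy_task& t : queue_) {
            if (t.tag == tag) ++n;
        }
        return n;
    }
    std::deque<toy_task> queue_;
};

// Returns the names of the tasks the waiting thread executed, in order.
std::vector<std::string> demo_wait(bool isolated) {
    toy_arena ta;
    std::vector<std::string> log;
    const int group_tag = 1;
    ta.enqueue(0, [&log] { log.push_back("unrelated"); });
    ta.enqueue(group_tag, [&log] { log.push_back("group"); });
    ta.wait_for_tag(group_tag, isolated);
    return log;
}
```

In this model `demo_wait(false)` executes the unrelated task first, which is the delayed return described above, while `demo_wait(true)` executes only the group's task and leaves the unrelated one queued for other threads.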
118+
119+
## Open Questions
120+
121+
- Is there any value in implementing this proposal first as experimental/preview API?
122+
- Should a new overload for `execute` be added, that takes a task group argument?
123+
- Whether/how work isolation is supported needs to be decided
