[RFC] Advanced Core Type Selection #1917
base: master
Changes from 2 commits
d9af5f4
100d932
2cdad3b
a2c6f9c
f29be27
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,14 +4,17 @@ | |
|
|
||
| ### Motivation | ||
|
|
||
| By default, oneTBB includes all available core types in a task arena unless explicitly constrained. | ||
| The current oneTBB API allows users to constrain task execution to a single core type using | ||
| `task_arena::constraints::set_core_type(core_type_id)`. While this provides control, it creates limitations for | ||
| real-world applications running on processors with more than two core types (e.g., on a system with performance (P), | ||
| efficient (E), and low power efficient (LP E) cores): | ||
|
|
||
| #### 1. **Flexibility and Resource Utilization** | ||
|
|
||
| Many parallel workloads can execute efficiently on multiple core types. For example: | ||
| While it is often best to allow the OS to use all core types and flexibly schedule threads, some advanced users may find it necessary to constrain scheduling. | ||
| When there are more than two core types, it may be desirable to constrain execution to several core types rather than to a single one. | ||
| Many parallel workloads can execute efficiently on multiple core types that make up a subset of the available core types. For example: | ||
| - A parallel algorithm with good scalability works well on both P-cores and E-cores | ||
| - Background processing can run on E-cores or LP E-cores depending on availability | ||
| - Mixed workloads benefit from utilizing any available performance-class cores (P or E) | ||
|
|
@@ -27,7 +30,9 @@ Applications often have workloads that don't fit neatly into a single core type | |
|
|
||
| #### 3. **Avoiding Inappropriate Core Selection** | ||
|
|
||
| Without the ability to specify "P-cores OR E-cores (but not LP E-cores)", applications face a dilemma: | ||
| Without the ability to specify "P-cores OR E-cores (but not LP E-cores)" or | ||
| "LP E-cores OR E-cores (but not P-cores)", applications face dilemmas. | ||
| For example, without being able to specify "P-cores OR E-cores (but not LP E-cores)": | ||
| - **No constraint**: Work might be scheduled on LP E-cores, causing significant performance degradation | ||
| - **P-cores only**: Leaves E-cores idle, reducing parallelism | ||
| - **E-cores only**: Misses opportunities to use faster P-cores when available | ||
|
|
@@ -51,12 +56,21 @@ This forces applications to choose one of these suboptimal strategies: | |
| |----------|------|------| | ||
| | **P-cores only** | Maximum single-threaded performance | Leaves E-cores idle; limited parallelism; higher power | | ||
| | **E-cores only** | Good for parallel workloads | Doesn't utilize P-core performance; excludes LP E-cores | | ||
| | **LP E-cores only** | Minimal power consumption | Severe performance impact for most workloads | | ||
| | **LP E-cores only** | Minimal power consumption | Severe performance impact for some workloads that require large shared caches | | ||
| | **No constraint** | Maximum flexibility | May schedule on inappropriate cores (e.g., LP E-cores for compute) | | ||
|
||
|
|
||
| None of these options provide the desired behavior: **"Use P-cores or E-cores, but avoid LP E-cores"** or **"Use any | ||
| efficiency cores (E-core or LP E-core)"**. | ||
|
|
||
| ### Compatibility Requirements | ||
|
|
||
| This proposal must maintain compatibility with previous oneTBB library versions: | ||
| - **API and Backward Compatibility (Old Application + New Library)**: Existing code using the current | ||
| `set_core_type(core_type_id)` API must compile and behave identically with newer oneTBB binaries. | ||
| - **Binary Compatibility (ABI)**: The `task_arena::constraints` struct layout must remain unchanged. | ||
| - **Forward Compatibility (New Application + Old Library)**: Applications compiled with the proposed new functionality | ||
| must be able to handle execution against older oneTBB binaries gracefully, without crashes or undefined behavior. | ||
|
|
||
| ## Proposal | ||
|
|
||
| We propose extending the `task_arena::constraints` API to support specifying multiple acceptable core types, enabling | ||
|
|
@@ -182,11 +196,27 @@ The design ensures full backward compatibility: | |
| | **Behavior** | All existing code paths would preserve exact semantics | | ||
| | **ABI** | No changes to struct size or layout | | ||
|
|
||
| ### Forward Compatibility | ||
|
|
||
| Because the `constraints` API is header-only, the ABI is unmodified, and no new library entry points are added, applications | ||
| compiled with the proposed new functionality can handle execution against older oneTBB binaries through runtime | ||
| detection and fallback mechanisms. Runtime detection is achieved using `TBB_runtime_interface_version()`, which allows | ||
| applications to verify that the loaded oneTBB binary supports the new API before attempting to use it. When the runtime | ||
| check indicates an older library version, applications can gracefully fall back to alternative strategies: either using | ||
| all available core types (no constraint) or constraining to a single core type using the existing `set_core_type()` | ||
| API. This approach satisfies the forward compatibility requirement stated in the "Compatibility Requirements" section. | ||
|
|
||
| ### Usage Examples | ||
|
|
||
| Core type capabilities vary by hardware platform, and the benefits of constraining execution are highly | ||
| application-dependent. In most cases, systems with hybrid CPU architecture show reasonable performance without | ||
| additional API calls. However, in some exceptional scenarios, performance may be tuned by specifying preferred | ||
| core types. The following examples demonstrate these advanced use cases. | ||
|
|
||
| #### Example 1: Performance-Class Cores (P or E, not LP E) | ||
|
|
||
| Most compute workloads should avoid LP E-cores but can use either P-cores or E-cores: | ||
| In rare cases, compute-intensive tasks may be scheduled to LP E-cores. To fully prevent this, execution can be | ||
| constrained to P-cores and E-cores. The example shows how to set multiple preferred core types: | ||
|
|
||
| ```cpp | ||
| auto core_types = tbb::info::core_types(); | ||
|
|
@@ -204,7 +234,8 @@ arena.execute([] { | |
|
|
||
| #### Example 2: Adaptive Core Selection | ||
|
|
||
| Different arenas for different workload priorities: | ||
| For applications with well-understood workload characteristics, different arenas may be configured with different core | ||
| type constraints. The example shows how to create arenas for different workload priorities: | ||
|
|
||
| ```cpp | ||
| auto core_types = tbb::info::core_types(); | ||
|
|
@@ -253,6 +284,40 @@ The test infrastructure could generate all possible core type combinations using | |
| | **Runtime impact** | Negligible compared to task scheduling overhead | | ||
| | **Affinity operations** | Linear in number of core types, performed once at arena creation | | ||
|
|
||
| ## Alternatives Considered | ||
|
|
||
| ### Alternative 1: Accept Multiple Constraints Instances | ||
|
|
||
| Instead of modifying the `constraints` struct, introduce a new `task_arena` constructor that accepts a vector of | ||
| `constraints` instances. The arena would compute the union of affinity masks from all provided constraints, enabling | ||
| specification of multiple NUMA nodes and core types in a single arena. | ||
|
|
||
| ```cpp | ||
| // Example usage | ||
| tbb::task_arena arena({ | ||
| tbb::task_arena::constraints{}.set_core_type(core_types[1]), | ||
| tbb::task_arena::constraints{}.set_core_type(core_types[2]) | ||
| }); | ||
| ``` | ||
|
|
||
| **Pros:** | ||
| - More scalable: can extend to any other constraint type and specify multiple platform portions as a unified constraint | ||
| - Reuses existing `constraints` struct without modification | ||
| - Avoids bit-packing, format markers, and special value handling | ||
| - No risk of misinterpretation of existing single core type constraints | ||
|
|
||
| **Cons:** | ||
| - Requires creating multiple `constraints` objects for simple core type combinations | ||
| - A vector of `constraints` instances incurs more memory overhead than a single integer field with bit-packing | ||
| - Unclear how to handle conflicting `max_concurrency` or `max_threads_per_core` across instances | ||
|
||
| - Library entry points `constraints_default_concurrency()` and `constraints_threads_per_core()` accept single | ||
| `constraints`; would require new overloads or replacement APIs, affecting ABI | ||
|
||
|
|
||
| **Future Extensibility Consideration:** This approach naturally extends to other constraint types—if `set_core_types` | ||
| is added, a corresponding `set_numa_ids` function would likely follow. The choice between a vector of `constraints` | ||
| instances versus dedicated multi-value setters affects API consistency and usability: the former provides a unified | ||
| pattern for combining any constraints, while the latter offers more intuitive, type-specific methods. | ||
|
|
||
| ## Open Questions | ||
|
|
||
| 1. **API Naming**: Is `set_core_types` (plural) sufficiently distinct from `set_core_type` (singular)? | ||
|
|
@@ -282,4 +347,5 @@ void clear_core_types(); | |
| - Pros: Simpler logic, easier to extend | ||
| - Cons: Increases struct size, breaks ABI compatibility | ||
|
|
||
| 6. **Info API**: Should `info::core_types()` be extended to return a count instead of/in addition to a vector? | ||
| 6. **Info API**: Should `info::core_types()` be complemented by a function that returns a count rather than a vector, e.g., | ||
| `info::num_core_types()`? | ||