
Releases: tzcnt/TooManyCooks

v1.4.0 - the toolbox edition

02 Feb 03:36


This release includes several features to enhance the flexibility of the library.

Pre-Announcement: async stack traces

I'm going on a side quest to get async stack traces working in the debugger. This is still an area of active research, and was only made possible recently by the addition of ScriptedFrameProviders in the LLDB 22 pre-release. You can follow the development and read more in the backtrace branch of tmc-examples. Currently, there is a "fully working" version of the debugger script, including IDE integration, for at least one configuration (VS Code + LLDB DAP + LLDB 22 + Clang/GCC + x86). Contributions are welcome.

New example: pipeline.cpp

This shows a configurable generic parallel processing pipeline. Processing stage functions can be coroutines or regular functions. It makes use of the traits and zero-copy channel functionality introduced in v1.4. pipeline.cpp demonstrates the usage, and there are two generic header implementations: pipeline.hpp and pipeline_fifo.hpp.

(Docs Link) New header: tmc/traits.hpp

This header contains 2 logical groups of concepts / type traits:

  • traits to allow users to distinguish between values, awaitables, and packaged tasks
  • traits to allow users to distinguish between tmc::task and other kinds of functors

This enables users to write generic handlers for various scenarios, similar to those employed internally within the library. For example, here is a generic handler that can accept any value, awaitable, or packaged task:

template <typename T>
tmc::task<void> handler(T&& t) {
  // The body contains co_await, so this must be a coroutine;
  // it returns tmc::task<void> rather than void.
  if constexpr (tmc::traits::is_awaitable<T>) {
    process(co_await std::forward<T>(t));
  } else if constexpr (tmc::traits::is_callable<T>) {
    process(std::forward<T>(t)());
  } else {
    process(std::forward<T>(t));
  }
}

(Docs Link) New capability: tmc::channel zero-copy

A new awaitable function tmc::chan_tok::pull_zc() has been added. This behaves similarly to pull() but returns a scoped object that holds a reference to the storage slot directly in the channel. When the scoped object's lifetime ends, the referenced object is destroyed and the channel slot can be reused.

Additionally, the tmc::chan_tok::post() and tmc::chan_tok::push() functions have been rewritten to use forwarding construction rather than copy construction.

This combination of features makes it possible to use tmc::channel in a completely zero-copy, zero-move manner (even with a type that has no default constructor, copy constructor, or move constructor).
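To illustrate the pattern (with hypothetical names - this is not the actual tmc::channel implementation), here is a minimal single-slot sketch: the value is emplaced via forwarding construction, the consumer receives a scoped guard referencing the slot, and the guard's destructor destroys the value in place and frees the slot:

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical single-slot "channel" illustrating the zero-copy pattern:
// the consumer receives a guard referencing the slot, not a copy of the value.
template <typename T> class slot {
  alignas(T) unsigned char storage_[sizeof(T)];
  bool full_ = false;

public:
  // Forwarding construction: builds T directly in the slot storage,
  // analogous to the rewritten push()/post() - no copy or move of T.
  template <typename... Args> void emplace(Args&&... args) {
    assert(!full_);
    ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
    full_ = true;
  }

  // Scoped guard returned to the consumer, analogous to pull_zc()'s result.
  class guard {
    slot* s_;

  public:
    explicit guard(slot* s) : s_(s) {}
    guard(const guard&) = delete;
    guard& operator=(const guard&) = delete;
    T& operator*() { return *std::launder(reinterpret_cast<T*>(s_->storage_)); }
    ~guard() {
      // Destroy in place; the slot becomes reusable.
      (**this).~T();
      s_->full_ = false;
    }
  };

  guard pull_zc() { assert(full_); return guard(this); }
  bool empty() const { return !full_; }
};

// A type with no default, copy, or move constructor - usable here only
// because the slot never copies or moves it.
struct pinned {
  explicit pinned(int v) : v(v) {}
  pinned(const pinned&) = delete;
  pinned& operator=(const pinned&) = delete;
  int v;
};
```

The pinned type flows through the slot untouched - the same property the zero-copy channel API provides for non-copyable, non-movable element types.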

(Docs Link) New free function: tmc::spawn_clang()

This is the same as tmc::spawn() but it enables HALO so that you can customize the executor and priority without incurring an allocation. To facilitate this, it takes the executor and priority as optional arguments rather than using the fluent builder pattern.

(Docs Link) New compile option: TMC_NODISCARD_AWAIT

If the user defines TMC_NODISCARD_AWAIT then every await_resume() that returns non-void will also be marked [[nodiscard]].

This has to be done in the library since there's no way to specify it otherwise. Setting [[nodiscard]] on the return type doesn't work and neither does setting it on the coroutine task function declaration.

As a side note, all of the awaitable types themselves are already marked nodiscard. This option additionally makes the result of the co_await expression nodiscard.
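The mechanism can be sketched like this (an illustrative macro, not the actual TMC implementation): the attribute must sit on await_resume() itself for the co_await result to become nodiscard.

```cpp
#include <cassert>

// Defining this (as a user would define TMC_NODISCARD_AWAIT) turns on the
// attribute; the names below are illustrative stand-ins.
#define TMC_NODISCARD_AWAIT

#ifdef TMC_NODISCARD_AWAIT
#define AWAIT_RESULT [[nodiscard]]
#else
#define AWAIT_RESULT
#endif

struct int_awaiter {
  bool await_ready() const noexcept { return true; }
  // await_suspend(std::coroutine_handle<>) omitted; never called when ready.
  AWAIT_RESULT int await_resume() const noexcept { return 7; }
};
```

With the macro defined, discarding the result of `co_await int_awaiter{}` (which is the result of await_resume()) produces a compiler warning.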

New header: tmc/version.hpp

This defines the TMC_VERSION, TMC_VERSION_MAJOR, TMC_VERSION_MINOR, and TMC_VERSION_PATCH macros. This has also been backported to prior version branches and can be used to check for feature availability.
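For example, a feature-availability check might look like this (the macro values below are hypothetical stand-ins - consult tmc/version.hpp for the actual definitions):

```cpp
#include <cassert>

// Stand-ins for the macros defined by tmc/version.hpp (values hypothetical).
#define TMC_VERSION_MAJOR 1
#define TMC_VERSION_MINOR 4
#define TMC_VERSION_PATCH 0

// Gate usage of a feature introduced in v1.4 (such as tmc/traits.hpp).
#if TMC_VERSION_MAJOR > 1 || (TMC_VERSION_MAJOR == 1 && TMC_VERSION_MINOR >= 4)
constexpr bool has_traits_header = true;
#else
constexpr bool has_traits_header = false;
#endif
```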

Breaking changes

tmc::coro_functor has been made trivially copyable/destructible. Rather than destroying any owned functor in its destructor, it now destroys the owned functor after it is executed. This makes it a delegate that must be executed exactly once. Since this behavior is unusual and possibly unsuitable for general purpose consumption, it has been moved into the tmc::detail namespace.

This is the work item type used internally when TMC_WORK_ITEM=FUNCORO is defined. However, it's not directly required in any public API, so this is only a breaking change if you were using this type directly in your code. Since the namespace has changed, if you are affected by this, it will be a compile-time error.

The purpose of this change is to prepare for a future transition to a Chase-Lev deque, which avoids reclamation synchronization when resizing by leaving older task buffers alive. This allows multiple redundant copies to point to the same functor, as long as only one of them is executed.
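The execute-exactly-once idea can be sketched in plain C++ (a hypothetical layout, not the actual tmc::detail::coro_functor):

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Illustrative execute-exactly-once delegate. The type is trivially copyable
// and trivially destructible: invoking it runs the functor and then destroys
// it, so redundant copies are harmless as long as exactly one copy is
// executed.
struct once_delegate {
  void (*run_and_destroy)(void*);
  void* target;

  void operator()() { run_and_destroy(target); }

  template <typename F> static once_delegate make(F&& f) {
    using Fn = std::decay_t<F>;
    Fn* heap = new Fn(std::forward<F>(f));
    return once_delegate{
        [](void* p) {
          Fn* fn = static_cast<Fn*>(p);
          (*fn)();   // execute the functor...
          delete fn; // ...then destroy it - the "exactly once" contract
        },
        heap};
  }
};

static_assert(std::is_trivially_copyable_v<once_delegate>);
static_assert(std::is_trivially_destructible_v<once_delegate>);
```

Because the destructor does nothing, stale copies left behind in a resized deque buffer are safe to abandon - exactly the property the Chase-Lev transition needs.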

v1.3.2 - the bugfix backports edition

02 Feb 02:17


The objective of this release is to establish a versioning strategy that allows users to choose a particular minor version (e.g. v1.3) and receive patch fixes to that version without any potential regressions that might be introduced by new features. To this end:

  • a full review was completed on all files in the repository
  • fixes were applied to main
  • branches were created for all major versions released so far: v1.0, v1.1, v1.2, v1.3
  • fixes were backported to each branch where they apply
  • new tags v1.0.1, v1.1.1, v1.2.1, v1.3.2 were created which include the appropriate fixes

Users now have the option to pin their references to the branch (v1.3) or choose the latest patch tag (v1.3.2). In either case, only non-breaking fixes will be applied to these tagged branches after their initial release. When a new minor release is created, the branch will be v1.4 and the tag will be v1.4.0 so they don't conflict.

New Feature: version.hpp

A version.hpp file which defines TMC_VERSION_ macros has been added (backported to all branches) for users to identify the availability of certain features. The development branch (main) will identify as the next unreleased version (currently v1.4).

Fixes Applied

Issues have been categorized by severity:

  • 🚩 major bug fix
  • 🐞 minor bug fix
  • 🕵️ theoretical correctness fix
  • 📝 documentation improvements

Fixes are cumulative, so any fix that was applied to v1.0 is also in v1.1 and higher.

In v1.0.1:

  • 🚩 [v1.0 backport] strengthen atomics in mutex/semaphore (#176)
  • 🐞 [v1.0 backport] Add check for Count == 0 in post_bulk() / post_bulk_waitable() (#198)
  • 🐞 [v1.0 backport] fix missing return with channel::set_reuse_blocks and EmbedFirstBlock = true
  • 🐞 [v1.0 backport] fix: missing comparison in manual_reset_event::co_set (#180)
  • 🕵️ [v1.0 backport] preload waiter list to avoid potential race condition (#182)
  • 🕵️ [v1.0 backport] strengthen atomic loads in atomic_condvar (#181)
  • 🕵️ [v1.0 backport] tweak barrier/latch constructor cast order (#179)
  • 🕵️ [v1.0 backport] remove unused coroutine_traits from task_unsafe (#193)
  • 🕵️ [v1.0 backport] add missing include to work_item.hpp
  • 📝 [v1.0 backport] document that executor teardown() and destructor must be called from external thread
  • 📝 [v1.0 backport] update iter_adapter doc comment
  • 📝 [v1.0 backport] fix resume_on() awaitable customizer doc comment
  • 📝 [v1.0 backport] document braid cannot be held across a suspend point
  • 📝 [v1.0 backport] better document spawn_many invariants
  • 📝 [v1.0 backport] better document barrier invariants

Also in v1.1.1:

  • 🚩 [v1.1 backport] fix potential use-after-free in tmc::task_promise::final_suspend with HALO (#201)
  • 🚩 [v1.1 backport] add missing executor/priority dispatch to spawn_group (#171)
  • 🐞 [v1.1 backport] fix awaitable_traits::result_type for spawn_group with dynamic size (#185)
  • 🐞 [v1.1 backport] use mixins with resume_on / enter / exit (#196)
  • 📝 [v1.1 backport] update spawn_group / fork_group::reset() doc comments
  • 📝 [v1.1 backport] document that spawn_group::add*() and fork_group::fork*() are not thread-safe (#186)

Also in v1.2.1:

  • 🕵️ [v1.2 backport] strengthen atomic loads in bitmap_object_pool

Also in v1.3.2:

  • 🚩 [v1.3 backport] fix: remove pr_empty flag in ex_cpu queue which was causing lost wakeups (#175)
  • 🚩 [v1.3 backport] fix potential leak in coro_functor move assignment (#191)
  • 🐞 [v1.3 backport] add cpu_kind operator| (#199)
  • 🐞 [v1.3 backport] fix: Split inboxes for threads in the same group that have different allowed priorities (#177)
  • 📝 [v1.3 backport] topology: make the methods of tmc::topology::cpu_topology const (#187)

v1.3.1

12 Jan 06:42
8d65e6c


Two hotfixes:

  • #170 - Fix TMC_DEBUG_THREAD_CREATION interaction with hwloc DLL on Windows.
  • #169 - Increased cacheline padding / alignas on Apple M series CPUs to 128 bytes to reduce false sharing. Results from my M2 Macbook Air have been added to the benchmarks chart.

v1.3.0 - the hwloc edition

08 Jan 05:55


This release dramatically enhances the runtime hardware detection and thread configuration capabilities of TooManyCooks. This makes it possible to write applications that will scale effortlessly on a variety of systems, including bare-metal monolithic, hybrid, or chiplet architecture CPUs, many-core/NUMA machines, or containers/virtualized environments.

There are several new examples demonstrating these capabilities, located here:

>> Examples Link <<

Enhancements to hwloc integration (with TMC_USE_HWLOC)

Prior State in v1.2 - ex_cpu Work Stealing Groups (Automatic)

The following was used internally to optimize work stealing, but was not directly visible to the user:
The number of shared L3 caches on the system is detected and thread groups are created according to those L3 caches. If an executor contains multiple such groups, threads prefer to steal work from other threads in their group before looking outside their group to steal. This is most effective on AMD Zen chiplet architectures, which may have many such caches (one per CCD/chiplet) and high latency for inter-chiplet access.

Thread affinity is set so that each thread may run on any core inside its cache group. This prevents expensive cross-cache thread migrations, while allowing some flexibility in scheduling, if there are other threads running on the same system.

(Docs Link) ex_cpu Work Stealing Groups Improvements

  • Different CPU kinds (Performance or Efficiency cores on hybrid CPUs) are detected and will also be treated as independent groups. If an executor contains multiple CPU kinds, threads prefer to steal from other threads with the same CPU kind before stealing from threads running on a different CPU kind.
  • Caches of any level that are shared among multiple cores can create a group. For example, Apple M processors only expose L2 caches.
  • Irregular cache hierarchies are handled as well. For example, Intel 13th-gen CPUs have an L3 cache group for the P-cores, and an L2 cache group for each cluster of 4 E-cores.

(Docs Link) New Header tmc/topology.hpp

  • tmc::topology::query() can be called to query the CPU topology and return a view optimized for TMC usage that includes NUMA nodes, cache groups and core counts. It also exposes information about CPU kinds, the number of CPUs of each kind, and the SMT level of each cache group (since often only P-cores have SMT). This disambiguates between P-cores, E-cores, and Low-Power E-cores (as seen on latest gen Intel laptop chips).
  • New types cpu_topology, core_group, cpu_kind exposed by the topology object
  • New types thread_pinning_level, thread_packing_strategy, thread_info used by ex_cpu
  • New type topology_filter used by multiple executors to control where threads are allocated
  • New function pin_thread to allow users to match external thread affinity to executor affinity

(Docs Link) New Method on ex_cpu/ex_cpu_st/ex_asio

  • add_partition() allows you to specify which physical cores an executor is allowed to run on (like the taskset command for a single executor). The input to this function is a tmc::topology::topology_filter which can be constructed with information retrieved from the topology query, to specify a specific set of cores, cache groups, or NUMA nodes.

(Docs Link) New Methods on ex_cpu

  • fill_thread_occupancy() fills SMT levels individually for all cores, with awareness of their CPU kind
  • set_thread_init_hook() / set_thread_teardown_hook() have new overloads that receive a tmc::topology::thread_info struct with info about the thread's group and CPU kind.
  • set_thread_pinning_level() defaults to GROUP, but allows pinning to CORE (for benchmarks)
  • set_thread_packing_strategy() controls how threads should be allocated when set_thread_count() is less than the whole system
  • set_work_stealing_strategy() controls the work stealing matrix type

(Docs Link) New Killer Feature: Hybrid Work Steering

For ex_cpu only, add_partition() can be called multiple times to split work between multiple partitions at different priority levels. This can be called with any partition type, but is probably most useful when used to split P- and E-cores. These priority ranges can be overlapping (as shown below) or non-overlapping.

    tmc::topology::topology_filter p_cores;
    p_cores.set_cpu_kinds(tmc::topology::cpu_kind::PERFORMANCE);
    tmc::topology::topology_filter e_cores;
    e_cores.set_cpu_kinds(tmc::topology::cpu_kind::EFFICIENCY1);

    // P-cores handle high (priority 0) and medium (priority 1) work
    // E-cores handle medium (priority 1) and low (priority 2) work
    // Work stealing between core types can happen for priority 1 work
    tmc::cpu_executor()
      .add_partition(p_cores, 0, 2)
      .add_partition(e_cores, 1, 3)
      .set_priority_count(3)
      .init();

(Docs Link) New Debug Compile Flag

If you define the preprocessor macro TMC_DEBUG_THREAD_CREATION, executors will print information about thread groups, affinities, and work stealing matrices when init() is called.

(Docs Link) Container (cgroups) CPU Quota Detection for ex_cpu

  • ex_cpu will automatically detect whether Linux cgroups (v1 or v2) CPU quotas have been configured for the application. If so, it will create a default number of threads equal to the quota, rounded down, with a minimum of 1. This means that if you run with docker run --cpus=2 then 2 threads will be created. This feature does not require TMC_USE_HWLOC and is always active.
  • If TMC_USE_HWLOC is enabled, hwloc can detect if a specific cpuset was allocated. That is, if you run with docker run --cpuset-cpus=0,1 then 2 threads will be created, and the usual optimizations based on CPU cache groupings will apply.
  • These features also work with Kubernetes, as long as the underlying containerization is implemented using Linux cgroups.
  • This can be overridden by calling set_thread_count().
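The cgroup v2 quota math can be sketched as follows (an illustrative helper, not the library's actual detection code):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Illustrative sketch of the cgroup v2 quota math. The real detection reads
// /sys/fs/cgroup/cpu.max, whose contents are "<quota> <period>" in
// microseconds, or "max <period>" when no quota is set.
int threads_from_cpu_max(const std::string& cpu_max, int hw_threads) {
  std::istringstream in(cpu_max);
  std::string quota;
  long period = 0;
  in >> quota >> period;
  if (quota == "max" || period <= 0) {
    return hw_threads; // no quota configured: use the hardware thread count
  }
  // Round the quota down, with a minimum of 1 thread.
  long n = std::stol(quota) / period;
  return n < 1 ? 1 : static_cast<int>(n);
}
```

For example, `docker run --cpus=2` yields a cpu.max of "200000 100000", which resolves to 2 threads.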

(Docs Link) Unlimited Threads Support

By default, ex_cpu uses machine-word-sized bitmaps for thread state tracking. This is highly efficient, but imposes a limit of 32 or 64 threads, based on system word size.

In v1.3, if you define the preprocessor macro TMC_MORE_THREADS, ex_cpu will support an unlimited number of threads. This uses a dynamic bitmap, which does have a small additional performance cost.
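A dynamic bitmap of this kind can be sketched as follows (illustrative only, not the actual ex_cpu implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Thread-state bitmap with no fixed size limit: one bit per thread, spread
// across as many machine words as needed.
class dyn_bitmap {
  std::vector<uint64_t> words_;

public:
  explicit dyn_bitmap(std::size_t bits) : words_((bits + 63) / 64, 0) {}

  void set(std::size_t i) { words_[i / 64] |= uint64_t(1) << (i % 64); }
  void clear(std::size_t i) { words_[i / 64] &= ~(uint64_t(1) << (i % 64)); }
  bool test(std::size_t i) const { return (words_[i / 64] >> (i % 64)) & 1; }

  // Count of set bits, e.g. "how many threads are awake".
  std::size_t count() const {
    std::size_t n = 0;
    for (uint64_t w : words_) {
      for (; w; w &= w - 1) { ++n; } // clear lowest set bit each step
    }
    return n;
  }
};
```

The extra cost relative to a single-word bitmap comes from the indirection through the vector and the per-word loop, which is why the single-word version remains the default.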

(Docs Link) New Executor: ex_manual_st

This is an executor that doesn't own any threads. Work posted to this executor will be queued, but not executed until it is polled by calling a run_*() function, during which the calling thread will execute work on behalf of the executor. This can be used to integrate with an external event loop, e.g. a game engine's main loop, and to poll for continuations at a specific time, without needing synchronization with other elements of the loop. Although any number of threads can post work to it simultaneously, only one thread should call run_*() at any given time.
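The polling model can be sketched with a plain function queue (illustrative only - the real ex_manual_st is a full executor that integrates with TMC awaitables):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>

// Executor that owns no threads: producers post() from any thread, and a
// single consumer drains the queue by calling run_all() from its own loop,
// e.g. once per game-engine frame.
class manual_executor {
  std::mutex mtx_;
  std::queue<std::function<void()>> work_;

public:
  void post(std::function<void()> f) {
    std::lock_guard<std::mutex> lk(mtx_);
    work_.push(std::move(f));
  }

  // Call from exactly one thread at a time.
  std::size_t run_all() {
    std::size_t ran = 0;
    for (;;) {
      std::function<void()> f;
      {
        std::lock_guard<std::mutex> lk(mtx_);
        if (work_.empty()) return ran;
        f = std::move(work_.front());
        work_.pop();
      }
      f(); // execute outside the lock so work items may post more work
      ++ran;
    }
  }
};
```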

(Docs Link) New Method on ex_cpu/ex_cpu_st

  • set_spins() controls how many times executor threads spin looking for work before going to sleep

Optimizations

  • Optimized ex_cpu thread sleep/wake logic for the most common scenario - when all threads are working and submitting work within their own priority group.
  • Optimized ex_cpu enqueue/dequeue logic for the most common scenario - when a thread is pushing and popping to its own highest priority queue.
  • Simplified thread sleep/wake logic overall by preferring to wake threads starting at index 0. In addition to reducing the latency of the thread waking calculation, this also has the effect of reducing data migrations, and making it easier for the OS to schedule external threads efficiently.
  • Dynamically size the std::atomic::wait() type used to notify threads that there is more work, based on the target platform. On Linux this type should optimally be 4 bytes, but on Windows it should be 8. On MacOS it doesn't matter.

Removed Footguns

  • Made tmc::task::operator bool() explicit. Having this be implicit allowed a task to be converted to an int...
  • Removed tmc::task::done(). This was originally provided for compatibility with the std::coroutine_handle API, but was never useful. Since tmc::task destroys itself on completion, this could never safely return true. The only way (currently) to check if a tmc::task is done is by making use of an awaitable.

Breaking Changes

Executor member functions named task_enter_context() and the traits concept requirement tmc::executor_traits::task_enter_context() have been renamed to dispatch() instead. This makes the naming consistent with asio::dispatch which has the same functionality - resume the work inline if running on the same executor, or post it if coming from a different executor.

This is only a breaking change ...


v1.2.0 - the warning-free edition

23 Nov 18:15


New Features

  • tmc::ex_cpu_st, an explicitly single-threaded executor. Behaves the same as ex_cpu with .set_thread_count(1), but has better round-trip latency, and better internal execution performance, since it doesn't need internal synchronization like a multi-threaded executor. 📄 ex_cpu_st documentation
  • tmc::channel::try_pull(), a new function that allows you to poll a channel for data without suspending if it is empty. 📄 try_pull documentation

Enhancements

The library, examples, and tests now build without warnings using pedantic warnings settings on all supported compilers and OSes.

The CI builds now enable pedantic warnings and set -Werror to ensure the project remains clean going forward.

v1.1.0 - the HALO edition

23 Oct 15:48
62ab8d8


Clang HALO attributes

Heap Allocation eLision Optimization (HALO) is an optimization technique applied by the compiler that allows child task allocations to be combined into the parent’s allocation, eliminating per-task heap allocations for improved performance.

In practice, HALO is often not applied automatically by compilers. However, the Clang compiler, starting with Clang 20, offers additional attributes [[clang::coro_await_elidable]] and [[clang::coro_await_elidable_argument]] which can be used as a hint to the compiler that it should apply this optimization. TMC now provides these attributes for several types and functions. On non-Clang compilers or Clang versions prior to 20, these functions are safe to use, but provide no additional optimization.

See the documentation for a full primer: https://fleetcode.com/oss/tmc/docs/v1.1/halo/index.html

I will say that I'm not super happy with the final API; the restrictions imposed by Clang's requirements to actually apply HALO via these attributes resulted in a kind of janky solution for forking. And having to introduce additional functions with special rules on a per-compiler basis is not a great experience for users. But, this is an emerging field of compiler development, so it's the best we have access to at the moment.

tmc::fork_group

fork_group is a new type that allows you to initiate individual awaitables immediately, and then await them all at a later time. Multiple different types of awaitables can be dispatched, on different executors/priorities, as long as they all share the same Result type. It offers substantial new flexibility that was not possible previously. For example, the asio_http_server example is now able to track any number of concurrent handler invocations (happening at different times as web requests come in), and ensure they are all finished before exiting, with no memory overhead - only an atomic inc/dec pair per operation.

See the documentation for a full primer: https://fleetcode.com/oss/tmc/docs/v1.1/awaitables/fork_group.html
It also offers a HALO-attributed fork_clang() function: https://fleetcode.com/oss/tmc/docs/v1.1/halo/fork_group.html

tmc::spawn_group

spawn_group is a new type that functions similarly to spawn_many(), but rather than requiring an iterator parameter, you can just call add() on the group to append awaitables and then dispatch + await all of them using operator co_await. If you are familiar with Intel TBB, it behaves similarly to tbb::task_group.

See the documentation for a full primer: https://fleetcode.com/oss/tmc/docs/v1.1/awaitables/spawn_group.html
It also offers a HALO-attributed add_clang() function: https://fleetcode.com/oss/tmc/docs/v1.1/halo/spawn_group.html

Debug Allocation Counter

It can be tricky to tell if HALO is actually working, since the compiler doesn't give any feedback. For this purpose, an optional counter has been added which you can use to profile the number of tmc::task allocations in your own code.

See the documentation for a full primer: https://fleetcode.com/oss/tmc/docs/v1.1/halo/index.html#how-can-you-tell-if-halo-is-working

On AI

This is the first release that was developed with the assistance of AI tools. Rest assured, I don't plan to start shipping AI slop any time soon; I remain committed to delivering quality software, with the cleanest possible API, clear documentation, and optimal performance. And I will never ship code that I don't 100% understand. But I also want to ship more features, and faster.

The AI is tremendously fast at generating code, and I already have prototypes for most of the key features of v1.2. At this point, the bottleneck is for me to carefully review and/or rewrite every single line, and obsessively benchmark and profile as always.

I also used AI to generate initial drafts of all the documentation for this release. Although I rewrote everything afterward, I left some of the initial AI-generated structure in - for example the tmc::spawn_group Key Differences, Template Parameters, Result Storage, Imperative Add Interface - that's all AI. It seems like a reasonable inclusion, but there's also the risk of this becoming excessively verbose, as most of this information is documented on the template specializations themselves in the API reference - which is the approach I previously relied on for tmc::spawn_many. Feel free to let me know in the discussions how you feel about this documentation style.

v1.0.0

02 Oct 03:12


Announcing v1.0.0!

This is the first stable release of TooManyCooks. It offers an excellent foundation of performance and features to build on. However, I have quite a few exciting developments planned for the next round of releases, so stay tuned for more.

New Features

  • Added channel::post_bulk() with 3 overloads that accept iterator/count, begin/end iterator pair, or range types. This function is more efficient than posting in a loop when multiple elements need to be submitted.
  • Added channel::new_token() which is identical to the token copy constructor, but can be used to more explicitly indicate that a new handle / hazard pointer is being created.
  • ex_any now implements tmc::detail::executor_traits. You can now use a variable of type tmc::ex_any* to dynamically select the executor that is passed to some functions.
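The three post_bulk() overload shapes can be illustrated with a generic sink (hypothetical code, not the actual tmc::channel signatures):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Generic illustration of the three overload shapes described above:
// iterator + count, begin/end iterator pair, and a range type.
struct bulk_sink {
  std::vector<int> received;

  // 1. iterator + count
  template <typename It> void post_bulk(It it, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i, ++it) { received.push_back(*it); }
  }
  // 2. begin/end iterator pair (delegates to the count overload)
  template <typename It> void post_bulk(It begin, It end) {
    post_bulk(begin, static_cast<std::size_t>(std::distance(begin, end)));
  }
  // 3. range type exposing begin()/end()
  template <typename Range> void post_bulk(Range&& r) {
    post_bulk(std::begin(r), std::end(r));
  }
};
```

Template argument deduction keeps the overloads unambiguous: a count argument cannot deduce as an iterator, and a pair of iterators cannot deduce as iterator + count.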

Enhancements

  • When building with TMC_USE_HWLOC, hwloc.h is now only required by the implementation (where TMC_IMPL is defined) and not by the public headers. Thus, you only need to make this file available on the include path for the implementation compilation unit.
  • When calling post_waitable() and passing an unevaluated coroutine ramp function, a static_assert will disallow this probably erroneous behavior. See #113 for more detail.
  • Pointer tagging was removed from the implementations of channel and coro_functor.
  • The entire library is now 32-bit compatible (as a result of the pointer tagging changes).
  • Implemented ex_braid on top of channel instead of on qu_lockfree. This simplifies the implementation and makes the task list linearizable.

Fixes

  • Fixed a race condition when shutting down ex_cpu immediately after posting the last task from an external thread.
  • Fixed a race condition when shutting down ex_braid immediately after posting the last task from an external thread.
  • Fixed an issue with hwloc CPU grouping on systems that expose no L3 caches (such as Apple M processors).

Compatibility

The library now builds and runs without issues on Visual Studio 2026 Insiders (MSVC Build Tools v145) or newer, as this bug was finally fixed by the MSVC team.

v0.0.11

22 May 04:35


New Features: Async Control Structures

This release adds a suite of async control structures. They are documented in the new Control Structures section of the documentation site.

| TMC async type | Equivalent blocking type |
| --- | --- |
| tmc::mutex | std::mutex |
| tmc::semaphore | std::counting_semaphore |
| tmc::atomic_condvar\<T\> | std::atomic\<T\>::wait() |
| tmc::barrier | std::barrier |
| tmc::latch | std::latch |
| tmc::manual_reset_event | Windows ManualResetEvent |
| tmc::auto_reset_event | Windows AutoResetEvent |

Fix: TMC_WORK_ITEM=FUNC

std::function<void()> is now fully supported as the underlying TMC work item type by setting the preprocessor definition TMC_WORK_ITEM=FUNC. The only caveat is that you must also define TMC_TRIVIAL_TASK, as std::function requires its held types to be copyable.

Fix: Always Track Priority

Previously there were some conditions in which a task could end up with an incorrect priority value after transitioning to and from an executor, queue, or event loop that didn't implement priority (such as ex_braid or ex_asio). This has been resolved, and tasks will now always track their priority correctly. This means that even if a task is running on an executor that doesn't actually implement priority:

  • tmc::current_priority() will always report the correct assigned priority of the task
  • newly created child tasks will always inherit the correct assigned priority of their parents
  • when exiting the non-priority executor back to a priority executor, they will be assigned to the correct priority queue

Also implemented in tmc-asio v0.0.11.

As part of this implementation, the number of possible priority levels is now limited to 16. If a task is submitted with priority higher than the maximum value for its executor, its priority will be clamped to the appropriate maximum value.

Enhancement: Awaitable Value Category Propagation

The value category (lvalue or rvalue) of an awaitable is now propagated through to the co_await or async_initiate expression, even if that awaitable is wrapped with spawn() or spawn_tuple().

The rvalue_only_awaitable type already existed to indicate awaitables that should be awaited exactly once, and are consumed afterward. The majority of TMC awaitables fall into this category - and their operator co_await() is decorated with && to enforce it.

A new lvalue_only_awaitable type has been created to indicate the opposite - awaitables that should be awaited multiple times. Currently this is used for the .result_each() awaitable customizer, which produces a sequence of values. These types have their operator co_await() decorated with & to enforce that they are lvalues.
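The enforcement mechanism is ordinary ref-qualification; here is a minimal sketch outside of coroutines (hypothetical types, not the actual TMC awaitables):

```cpp
#include <cassert>
#include <utility>

// Rvalue-only: consume() is &&-qualified, so it can only be called on a
// temporary or an explicitly std::move()d object. Calling it on a plain
// lvalue fails to compile - the "await exactly once" pattern.
struct rvalue_only {
  int value;
  int consume() && { return value; }
};

// Lvalue-only: consume() is &-qualified, so it must be called on a named
// object and may be called repeatedly - the result_each() pattern.
struct lvalue_only {
  int next = 0;
  int consume() & { return ++next; }
};
```

In TMC the same qualifiers sit on operator co_await(), so the compiler rejects the invalid usage patterns shown in the examples below.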

The combination of the above changes means that many invalid behaviors will now simply fail to compile. For example, awaiting an rvalue awaitable twice, or awaiting an lvalue awaitable as a temporary, previously caused an error at runtime. Now that the compiler can detect and block these errors, you get feedback sooner and avoid painful debugging sessions.

spawn_many() was not touched as part of this - it still always moves-from its iterator. Resolving this will be more difficult as it involves deducing the value category of the pointed-to object (if the iterator is a pointer).

Valid rvalue awaitable usage:

// temporary rvalue
co_await expr();

// explicit rvalue cast required
auto t = expr();
co_await std::move(t);

// wrappers have the same rules - temporary rvalue
co_await spawn_tuple(expr());

// explicit rvalue cast required
auto t = expr();
co_await tmc::spawn_tuple(std::move(t));

Valid lvalue awaitable usage:

// temporary rvalue not allowed, an lvalue must be created
auto t = expr();
co_await t;
co_await t;

// wrappers have the same rules
auto t = expr();
co_await tmc::spawn_tuple(t);
co_await tmc::spawn_tuple(t);

Enhancement: ex_cpu performance

Some improvements were made to the core worker loop and queue functions which reduce code footprint and improve instruction cache locality. This results in a ~1-3% performance improvement on synthetic benchmarks.

Breaking Changes

The rvalue enforcement rules will break code that depended on the previously incorrect behavior that spawn_tuple would accept lvalues (of rvalue awaitables). Now that this behavior has been fixed, you can resolve any compilation issues by applying std::move() to your parameters at the call site.

v0.0.10

28 Apr 15:15


New Features:
spawn_many() / spawn_func_many() now accept range types as a single argument. This should work for any type that exposes begin() and end() member functions.

// old (still available)
co_await tmc::spawn_many(tasks.begin(), tasks.end());

// new
co_await tmc::spawn_many(tasks);

Fixes:
The primary focus of this release was implementing comprehensive test coverage. During this process, a number of minor issues were identified and fixed. Additionally, a fair bit of legacy code was removed or streamlined.

Breaking Changes:

  • The compile-time option TMC_CUSTOM_CORO_ALLOC was removed, and has been made the default behavior. I have a desire to reduce the number of configuration parameters, and this leaves open the door to future allocation improvements for tmc::task.
  • Awaitable type names have been standardized to match the function that produces them, with the aw_ prefix. For example, spawn_many() produces an awaitable type named aw_spawn_many.
  • "spawn_task.hpp" has been renamed to "spawn.hpp" as it contains the spawn() function which can customize more than just tasks.

v0.0.9

10 Apr 15:33


Improvements:

  • Some performance improvements in tmc::ex_cpu
  • Added tmc::channel config option EmbedFirstBlock which stores the first channel data block as a member of the channel. This bypasses one allocation and would be most useful in the case of a short-lived channel that handles a small number of elements.
  • Implemented exception handling in tmc::wrapper_task which is used internally to wrap unknown awaitables.

Fixes:

  • Fixed some race conditions in tmc::channel

There are an unusually large number of breaking changes in this release as I work to stabilize the API for the first major release.
Breaking Changes:

  • run_early() has been renamed to fork() - this is a well-known name for this concept, and carries the same connotation that a subsequent join is required
  • rename each() to result_each() - this is to align with future functions that modify the result type of an awaitable, such as result_share() / result_ref()
  • tmc::channel::reopen() has been removed. The current implementation was unsound. I'd like to bring this feature back in the future, but it will require a redesign of the close logic.
  • split out some public API functions that were in detail headers into their own public headers: "current.hpp", "ex_any.hpp", "work_item.hpp"
  • cleaned up "task.hpp" by moving parts of its implementation into detail headers
  • tmc::post() has been moved from "task.hpp" to "sync.hpp" where the other post functions live
  • tmc::external::set_default_executor() has been moved into the root namespace tmc::set_default_executor()
  • the outdated debug configs TMC_USE_MUTEXQ and TMC_QUEUE_NO_LIFO have been removed

Full Changelog: v0.0.8...v0.0.9