Sanitize initialization and finalization#402
Merged
ggeorgakoudis merged 23 commits intomainfrom Feb 12, 2026
Merged
Conversation
- Modify check line order due to reordering initialization
- Remove proteus::init() / proteus::finalize()
- Use JitEngineInfoRegistry at construction of device jit engine - Remove singleton from CompilerAsync to simplify lifetime management and cleanup - Use ShutdownGuard in CompilerAsync for correct cleanup of the thread pool
- Avoids cleanup issues now that finalize is not explicit
- Previous finalization callback could have race at teardown when rank 0 attempts to drain messages from other already finalized ranks. This implementation uses sentinel sends and a shutdown tag to create a teardown protocol using point-to-point communication that is safe during the finalization callback.
- The comm thread on rank 0 needs to synchronize with ALL clients at finalization to avoid teadown of MPI transports until the comm thread exits. This commit introduces an ack tag and a two-phase process akin to a barrier. This ensure that finalize() invoked by the cleanup callback at MPI finalization proceeds synchronously with per-rank finalization.
- The ack approach is not race proof. A barrier to the duped communicator should be safe, so this commit uses a barrier to ensure all ranks and the rank 0 comm thread have finished MPI processing. The sentinel message are still necessary to ensure the rank 0 comm thread has drained the message queues before the barrier.
- Remove shutdown cond var and rely on the sentinel protocol to exit the thread - Repurpose stop() to a join() method - Use blocking MPI_Probe to receive shutdown or data messages - Define Running as a non-atomic variable, it is used only by the main thread - Error out if MPI has finalized before finalize(), this should not be possible - Shutdown now drains all messages from ranks in the comm thread and exits the comm thread after sentinels are received. Rank 0 main threads join on the comm thread. All ranks block on a barrier to ensure finalization is done.
ZwFink
requested changes
Feb 12, 2026
- Avoids deadlock when using the MPI cache deploying with 1 rank
ZwFink
approved these changes
Feb 12, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The previous init/fini is very unclean with static objects and a loosely communicated requirement to use
proteus::init,proteus::finalizeto avoid issues with async compilation or MPI-based caching.This PR significantly improves init/fini and does not require explicit routines. The main changes are:
proteus::initandproteus::finalizefrom tests.The high number of changed files is due to the updated tests. The change is mostly mechanical for removing the deprecated init/fini calls.
Closes #398