Sanitize initialization and finalization #402

Merged
ggeorgakoudis merged 23 commits into main from sanitize-init
Feb 12, 2026
@ggeorgakoudis ggeorgakoudis commented Feb 11, 2026

The previous init/fini is messy: it relies on static objects and on a loosely communicated requirement to call proteus::init and proteus::finalize to avoid issues with async compilation or MPI-based caching.

This PR significantly improves init/fini and no longer requires explicit init/finalize routines. The main changes are:

  1. Adding JitEngineInfoRegistry, a lightweight static object that holds registration information for fat binaries, linked binaries, functions, and global variables used by device JIT engines. This decouples the construction of JIT engines from the global ctors that register this information.
  2. Removing ObjectCacheRegistry and using std::optional to construct ObjectCacheChain objects.
  3. Implementing an MPI finalization callback to avoid issues in the MPI cache.
  4. Removing proteus::init and proteus::finalize from tests.
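
To illustrate items 1 and 2, here is a minimal sketch of the general pattern, not the code in this PR. All names (RegistrySketch, EngineSketch, CacheChainSketch, RegisterAtStartup) are hypothetical stand-ins. Global constructors write only to a lightweight registry, the engine is built lazily on first use and reads from that registry, and an optional member replaces a separate cache registry.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the Proteus API.

// Plain record describing a function that a global constructor registers.
struct FunctionInfo {
  std::string Name;
  const void *HostAddr;
};

// Lightweight registry: it only stores registration data and does not
// construct or depend on the (expensive) engine.
class RegistrySketch {
public:
  static RegistrySketch &instance() {
    static RegistrySketch R; // function-local static, built on first use
    return R;
  }
  void registerFunction(std::string Name, const void *HostAddr) {
    Functions.push_back({std::move(Name), HostAddr});
  }
  const std::vector<FunctionInfo> &functions() const { return Functions; }

private:
  std::vector<FunctionInfo> Functions;
};

// Stand-in for a cache chain that exists only when caching is configured.
struct CacheChainSketch {
  explicit CacheChainSketch(std::string Cfg) : Config(std::move(Cfg)) {}
  std::string Config;
};

// The engine is constructed lazily and pulls what it needs from the registry.
class EngineSketch {
public:
  static EngineSketch &instance() {
    static EngineSketch E;
    return E;
  }
  std::size_t numFunctions() const { return NumFunctions; }
  bool hasCache() const { return Cache.has_value(); }
  void enableCache(const std::string &Cfg) { Cache.emplace(Cfg); }

private:
  EngineSketch()
      : NumFunctions(RegistrySketch::instance().functions().size()) {}
  std::size_t NumFunctions;
  std::optional<CacheChainSketch> Cache; // empty until explicitly enabled
};

// Mimics a compiler-emitted global constructor: it touches only the registry.
struct RegisterAtStartup {
  RegisterAtStartup(const char *Name, const void *Addr) {
    RegistrySketch::instance().registerFunction(Name, Addr);
  }
};

static int DummyKernelA, DummyKernelB; // placeholders for real host symbols
static RegisterAtStartup RegA("kernelA", &DummyKernelA);
static RegisterAtStartup RegB("kernelB", &DummyKernelB);
```

In this sketch, the first call to EngineSketch::instance() from main sees both registered functions, and the cache stays empty until enableCache is called. Because the registry is constructed before the engine, it is also destroyed after it, which keeps teardown order predictable.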

The large number of changed files comes from the updated tests. Those changes are mostly mechanical: they remove the deprecated init/fini calls.

Closes #398

@ggeorgakoudis ggeorgakoudis requested a review from ZwFink February 11, 2026 16:28
ggeorgakoudis and others added 6 commits February 11, 2026 09:30
- The previous finalization callback could race at teardown when rank 0
attempts to drain messages from other, already finalized ranks. This
implementation uses sentinel sends and a shutdown tag to create a
teardown protocol using point-to-point communication that is safe during
the finalization callback.
- The comm thread on rank 0 needs to synchronize with ALL clients at
finalization to avoid teardown of MPI transports until the comm thread
exits. This commit introduces an ack tag and a two-phase process akin to
a barrier. This ensures that finalize() invoked by the cleanup callback
at MPI finalization proceeds synchronously with per-rank finalization.
- The ack approach is not race-proof. A barrier on the duped
communicator should be safe, so this commit uses a barrier to ensure all
ranks and the rank 0 comm thread have finished MPI processing.
The sentinel messages are still necessary to ensure the rank 0 comm
thread has drained the message queues before the barrier.
- Remove shutdown cond var and rely on the sentinel protocol to exit the
thread
- Repurpose stop() to a join() method
- Use blocking MPI_Probe to receive shutdown or data messages
- Define Running as a non-atomic variable, it is used only by the main
thread
- Error out if MPI has finalized before finalize(), this should not be
possible
- Shutdown now drains all messages from ranks in the comm thread and
exits the comm thread after sentinels are received. Rank 0 main threads
join on the comm thread. All ranks block on a barrier to ensure
finalization is done.
- Avoid deadlock when using the MPI cache in a deployment with 1 rank
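
The sentinel-based shutdown described in these commits can be modeled without MPI. The following is a minimal sketch under stated assumptions: threads stand in for ranks, an in-process FIFO mailbox stands in for point-to-point messages, and the blocking recv() plays the role of a blocking probe. Mailbox, Tag, Message, and runShutdownSketch are hypothetical names, and the final barrier on the duplicated communicator is not modeled.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model: Data carries cache traffic, Shutdown is the sentinel.
enum class Tag { Data, Shutdown };

struct Message {
  Tag T;
  int Source;
  int Payload;
};

// FIFO mailbox with a blocking receive.
class Mailbox {
public:
  void send(Message M) {
    {
      std::lock_guard<std::mutex> L(Mu);
      Queue.push_back(M);
    }
    Cv.notify_one();
  }
  Message recv() {
    std::unique_lock<std::mutex> L(Mu);
    Cv.wait(L, [this] { return !Queue.empty(); });
    Message M = Queue.front();
    Queue.pop_front();
    return M;
  }

private:
  std::mutex Mu;
  std::condition_variable Cv;
  std::deque<Message> Queue;
};

// Runs the protocol and returns how many data messages the comm thread drained.
int runShutdownSketch(int NumRanks, int MessagesPerRank) {
  Mailbox Box;
  int Drained = 0;

  // Comm thread: keeps receiving until one sentinel per rank has arrived.
  // Each rank sends its sentinel last and the mailbox is FIFO, so all of
  // that rank's data has been drained by the time its sentinel is seen.
  std::thread Comm([&] {
    int Sentinels = 0;
    while (Sentinels < NumRanks) {
      Message M = Box.recv();
      if (M.T == Tag::Shutdown)
        ++Sentinels;
      else
        ++Drained;
    }
  });

  // Client "ranks": send data, then exactly one sentinel.
  std::vector<std::thread> Ranks;
  for (int R = 0; R < NumRanks; ++R)
    Ranks.emplace_back([&Box, R, MessagesPerRank] {
      for (int I = 0; I < MessagesPerRank; ++I)
        Box.send({Tag::Data, R, I});
      Box.send({Tag::Shutdown, R, 0});
    });

  for (auto &T : Ranks)
    T.join();
  Comm.join(); // the main thread joins the comm thread, as in the commits
  return Drained; // safe to read: the comm thread has exited
}
```

In this model, the comm thread exits only because it has counted every sentinel, so no separate shutdown condition variable is needed, and the single-rank case terminates the same way as the multi-rank case.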
@ggeorgakoudis ggeorgakoudis requested a review from ZwFink February 12, 2026 18:34
@ggeorgakoudis ggeorgakoudis merged commit 4759e0d into main Feb 12, 2026
27 checks passed
@ggeorgakoudis ggeorgakoudis deleted the sanitize-init branch February 12, 2026 20:50
Development

Successfully merging this pull request may close these issues.

Decouple static-init registration state from JitEngine construction

2 participants