Highlights
Deployment
- Link only needed LLVM/Clang libraries
- Avoid static initialization errors when a consumer separately links with LLVM by not exporting LLVM symbols from the Proteus runtime library
- Do not install internal implementation API headers by default; add the CMake option PROTEUS_INSTALL_IMPL_HEADERS (default off) to optionally enable installation
- Update Spack package recipe
User interface
- Deprecate proteus::init() and proteus::finalize() (now no-ops). Proteus automatically handles initialization and finalization within the runtime library
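One common way for a library to retire explicit lifecycle calls while keeping old code compiling is to mark them deprecated, turn them into no-ops, and let a runtime-owned object do the setup. The sketch below illustrates that pattern in a hypothetical `demo` namespace. `demo::Runtime`, `demo::runtime()`, `demo::isInitialized()`, and `demo::doWork()` are illustrative names, not Proteus APIs, and the actual Proteus mechanism may differ.

```cpp
#include <cassert>

namespace demo {

// Hypothetical runtime state: setup and teardown live in the constructor
// and destructor instead of in functions the user has to call.
class Runtime {
public:
  Runtime() : Initialized(true) {}
  ~Runtime() { Initialized = false; }
  bool initialized() const { return Initialized; }

private:
  bool Initialized;
};

// Function-local static: constructed on first use (thread-safe since C++11)
// and destroyed automatically at program exit.
inline Runtime &runtime() {
  static Runtime R;
  return R;
}

inline bool isInitialized() { return runtime().initialized(); }

// The old entry points remain so existing callers still compile, but they
// do nothing and produce a compiler warning at each call site.
[[deprecated("initialization is automatic; this call is a no-op")]]
inline void init() {}

[[deprecated("finalization is automatic; this call is a no-op")]]
inline void finalize() {}

// Hypothetical library entry point that relies on the runtime without
// requiring a prior init() call.
inline int doWork() {
  assert(isInitialized());
  return 42;
}

} // namespace demo
```

With this shape, a caller can use `demo::doWork()` directly. Code that still calls `demo::init()` or `demo::finalize()` keeps building and only sees a deprecation warning.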
Caching
- Support partially centralized MPI cache with a server-client protocol for writing cache entries and distributed reading from a networked FS
- Support fully centralized MPI cache with server-client protocol for both reading/writing cache entries
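When several processes may write cache entries to a shared directory, a common technique is to write each entry to a writer-unique temporary file and then rename it into place, so readers never observe a partially written file. The changelog below mentions making saveToFileAtomic safe for multiple MPI cache writers. The following is a minimal sketch of the general write-then-rename idea, not the Proteus implementation. `demo::saveAtomic`, `demo::readAll`, and the `WriterId` parameter (standing in for something like an MPI rank) are hypothetical names.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

namespace demo {
namespace fs = std::filesystem;

// Write Data to a temporary file next to Target, then rename it into place.
// The temporary name includes WriterId so concurrent writers do not clobber
// each other's partial output.
inline bool saveAtomic(const fs::path &Target, const std::string &Data,
                       int WriterId) {
  assert(!Target.empty());
  fs::path Tmp = Target;
  Tmp += ".tmp." + std::to_string(WriterId);

  std::error_code EC;
  {
    std::ofstream Out(Tmp, std::ios::binary | std::ios::trunc);
    if (!Out)
      return false;
    Out.write(Data.data(), static_cast<std::streamsize>(Data.size()));
    Out.flush();
    if (!Out) {
      Out.close();
      fs::remove(Tmp, EC);
      return false;
    }
  } // The stream is closed here, before the rename.

  // Publish the complete file under its final name, replacing any older one.
  fs::rename(Tmp, Target, EC);
  if (EC) {
    std::error_code Ignored;
    fs::remove(Tmp, Ignored);
    return false;
  }
  return true;
}

// Read a whole file into a string (returns an empty string if it is missing).
inline std::string readAll(const fs::path &P) {
  std::ifstream In(P, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(In),
                     std::istreambuf_iterator<char>());
}

} // namespace demo
```

If two writers save the same entry, the last rename wins and the file always holds one complete payload. Whether a rename is truly atomic depends on the filesystem, which matters for networked storage.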
DSL
- Support launching using Dim3
- Support references
- Support defining/declaring multiple variables
- Support functional constructs
- Introduce LLVMCodeBuilder component for backend code generation
CPP
- Support launching using Dim3
- Propagate ExtraArgs for CPP compilation to runtime template instantiation
Bugfixes
- Fix registration for CUDA RDC linked binaries in runtime library
What's Changed
- DSL/CPP: Kernel Launch Using Dim3 by @ZwFink in #380
- PJ-DSL: Reference Semantics by @ZwFink in #368
- Link only needed from LLVM and privatize by @ggeorgakoudis in #384
- Centralized MPI Storage Cache by @ZwFink in #354
- PJ-DSL: Define/Declare Multiple Vars by @ZwFink in #386
- Fix LLVM static initialization with shared proteus by @koparasy in #374
- Use pdebug for matrix ci by @ggeorgakoudis in #390
- [Refactor][NFC] Rename src/lib to src/runtime by @ggeorgakoudis in #396
- Separate headers by @ggeorgakoudis in #397
- Replace lazy ensureInitialized patterns with explicit initialization by @ZwFink in #394
- Show linter output only in step summary by @ggeorgakoudis in #399
- Add install prefix and hostname to CUDA setup script by @ZwFink in #401
- Sanitize initialization and finalization by @ggeorgakoudis in #402
- [NFC] Separate linked libraries in the cmake for the runtime by @ggeorgakoudis in #404
- Completely Centralized MPI Cache by @ZwFink in #400
- Kernel JIT Tracing by @ZwFink in #403
- Make saveToFileAtomic safe for multiple MPI cache writers by @ZwFink in #406
- PJ-DSL: Functional Constructs by @ZwFink in #405
- Ensure instrumentation of all kernel function registrations by @koparasy in #407
- CPPJitModule: Make Instantiate Propagate ExtraArgs by @ZwFink in #412
- Fix registration for CUDA RDC linked binaries by @ggeorgakoudis in #413
- Remove CUDA runtime dep and fix proteus linkage issues by @koparasy in #385
- MPI Caches: Make Comm Thread Sleep Between Probes, Simplify Finalization by @ZwFink in #414
- Update spack installation by @ggeorgakoudis in #417
- Fix gitlab CI on matrix by @ggeorgakoudis in #420
- [Refactor][DSL] Introduce LLVMCodeBuilder by @ggeorgakoudis in #419
Full Changelog: v2026.01.0...v2026.03.0