This release introduces memory profiling capabilities, enhanced distributed training orchestration, and support for Granite 4 and Mamba models. Backend implementations have been updated to instructlab-training v0.12.1 and mini-trainer v0.3.0.
## What's New
### Memory Profiling API (Experimental)
- New memory estimation tool for fine-tuning workloads
- Reports per-GPU VRAM requirements (parameters, optimizer state, gradients, activations, outputs)
- Supports both SFT and OSFT algorithms
- Returns low/expected/high memory bounds for better resource planning
- Includes Liger-kernel-aware adjustments
- Example notebook and documentation included
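The quantities the profiler reports can be illustrated with a back-of-envelope calculation. The function below is a hedged sketch, not the library's actual API: the name `estimate_vram_gib`, the per-parameter byte counts, and the activation-overhead factor are all illustrative assumptions chosen for a bf16 + Adam full fine-tuning setup.

```python
# Illustrative sketch of a per-GPU VRAM estimate with low/expected/high bounds.
# All constants are assumptions, not values from the training_hub profiler.

def estimate_vram_gib(num_params: float, num_gpus: int,
                      bytes_per_param: int = 2,             # bf16 weights
                      optimizer_bytes_per_param: int = 8,   # Adam: two fp32 moments
                      grad_bytes_per_param: int = 2,        # bf16 gradients
                      activation_overhead: float = 0.3) -> dict:
    """Return rough low/expected/high per-GPU VRAM bounds in GiB."""
    # Fixed cost: weights + optimizer state + gradients, sharded across GPUs.
    fixed = num_params * (bytes_per_param + optimizer_bytes_per_param
                          + grad_bytes_per_param)
    # Expected adds an activation/output overhead; high pads for fragmentation.
    expected = fixed * (1 + activation_overhead) / num_gpus
    gib = 1024 ** 3
    return {
        "low": round(fixed / num_gpus / gib, 1),
        "expected": round(expected / gib, 1),
        "high": round(expected * 1.25 / gib, 1),
    }

# Example: an 8B-parameter model fully sharded across 8 GPUs.
print(estimate_vram_gib(8e9, 8))
```

The low/expected/high spread mirrors the bounds the profiling API returns; real estimates also account for sequence length, batch size, and Liger-kernel savings.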
### Enhanced Distributed Training
- Automatic torchrun configuration from environment variables
- Full compatibility with Kubeflow and other orchestration systems
- Support for auto and gpu process-count specifications (matching torchrun's --nproc-per-node values)
- Centralized launch parameter handling with hierarchical priority
- Improved validation with clear conflict warnings and error messages
- Flexible argument types (string or integer) for multi-node parameters
- Explicit master address and port configuration options
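The configuration flow described above can be sketched as follows. The environment variables (RANK, WORLD_SIZE, LOCAL_WORLD_SIZE, MASTER_ADDR, MASTER_PORT) are the standard ones torchrun and Kubeflow-style launchers set; the helper name `resolve_launch_config` and its priority scheme are illustrative assumptions, not training_hub's actual interface.

```python
# Hedged sketch: derive default launch parameters from torchrun-style
# environment variables, with explicit arguments taking priority
# (hypothetical helper, not the library's API).
import os

def resolve_launch_config(nproc_per_node=None, master_addr=None, master_port=None):
    env = os.environ
    local = max(int(env.get("LOCAL_WORLD_SIZE", 1)), 1)
    return {
        # Explicit argument wins; otherwise fall back to the environment.
        "nproc_per_node": nproc_per_node or env.get("LOCAL_WORLD_SIZE", "auto"),
        # Node count inferred from total vs. per-node process counts.
        "nnodes": int(env.get("WORLD_SIZE", 1)) // local,
        "master_addr": master_addr or env.get("MASTER_ADDR", "127.0.0.1"),
        "master_port": int(master_port or env.get("MASTER_PORT", 29500)),
    }
```

Letting explicit arguments override the environment keeps single-node local runs simple while remaining fully driven by the orchestrator's environment under Kubeflow.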
### Model Support Expansion
- Granite 4 support (transformers>=4.57.0)
- Mamba model support with optional CUDA acceleration (mamba-ssm[causal-conv1d]>=2.2.5)
- Enhanced compatibility through dependency updates
### Infrastructure Improvements
- Uncapped NumPy for better forward compatibility
- Minimum Numba version raised to 0.62.0
- Liger kernel pinned to >=0.5.10 for stability
- Updated backend implementations (instructlab-training>=0.12.1, rhai-innovation-mini-trainer>=0.3.0)
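Taken together, the new dependency floors correspond to pyproject.toml constraints along these lines (an illustrative fragment; the actual file's layout, extras names, and environment markers may differ):

```toml
[project]
dependencies = [
    "numpy",                                # upper cap removed
    "numba>=0.62.0",
    "liger-kernel>=0.5.10",
    "transformers>=4.57.0",                 # Granite 4 support
    "instructlab-training>=0.12.1",
    "rhai-innovation-mini-trainer>=0.3.0",
]

[project.optional-dependencies]
# "cuda" is an assumed extras name for the optional Mamba acceleration path.
cuda = ["mamba-ssm[causal-conv1d]>=2.2.5"]
```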
## What's Changed
- Pinning liger-kernel version by @Fiona-Waters in #9
- Adding min dependencies for Granite 4 / Mamba support by @Maxusmusti in #14
- uncap numpy and raise minimum numba version by @RobotSail in #15
- Adding basic API for memory profiling (src/training_hub/profiling) by @mazam-lab in #11
- feat(traininghub): Use torchrun environment variables for default configuration by @szaher in #13
- Update backend implementation dep versions in pyproject.toml by @Maxusmusti in #19
## New Contributors
- @Fiona-Waters made their first contribution in #9
- @mazam-lab made their first contribution in #11
- @szaher made their first contribution in #13
**Full Changelog**: v0.2.0...v0.3.0