Releases: Red-Hat-AI-Innovation-Team/training_hub
v0.3.0 - Granite 4, Mamba, Env var support, and Memory Estimation
This release introduces an experimental memory profiling API, enhanced distributed training orchestration, and support for Granite 4 and Mamba models. Backend implementations have been updated to instructlab-training v0.12.1 and mini-trainer v0.3.0.
What's New
Memory Profiling API (Experimental)
- New memory estimation tool for fine-tuning workloads
- Reports per-GPU VRAM requirements (parameters, optimizer state, gradients, activations, outputs)
- Supports both SFT and OSFT algorithms
- Returns low/expected/high memory bounds for better resource planning
- Includes Liger-kernel-aware adjustments
- Example notebook and documentation included
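To make the resource-planning story concrete, here is a minimal sketch of how the estimator might be invoked. The entry point and every parameter name below are illustrative assumptions, not the confirmed API; the bundled example notebook documents the real interface under `training_hub.profiling`.

```python
# Hypothetical usage sketch: the import and the parameter names below are
# illustrative assumptions; see the bundled example notebook for the actual
# memory-profiling API exposed under training_hub.profiling.
from training_hub.profiling import estimate_memory  # hypothetical entry point

estimate = estimate_memory(
    model_path="ibm-granite/granite-4.0-h-small",  # any HF id or local path
    algorithm="osft",             # both "sft" and "osft" are supported
    effective_batch_size=128,
    max_seq_len=4096,
    num_gpus=8,
    use_liger=True,               # enables the Liger-kernel-aware adjustment
)

# Per-GPU VRAM is reported as low/expected/high bounds, broken down into
# parameters, optimizer state, gradients, activations, and outputs.
print(estimate)
```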
Enhanced Distributed Training
- Automatic torchrun configuration from environment variables
- Full compatibility with Kubeflow and other orchestration systems
- Support for `auto` and `gpu` process-count specifications
- Centralized launch parameter handling with hierarchical priority
- Improved validation with clear conflict warnings and error messages
- Flexible argument types (string or integer) for multi-node parameters
- Explicit master address and port configuration options
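In practice this means the launch topology can come entirely from the environment when running under Kubeflow or a similar orchestrator. A minimal sketch, assuming torchrun-style variable names (the exact variables training_hub reads, and the `sft()` parameter names, follow the project docs and may differ):

```python
import os

from training_hub import sft

# Under a Kubeflow PyTorchJob (or similar), the operator injects the launch
# topology as environment variables; they are set inline here purely for
# illustration, and the exact names training_hub reads are an assumption
# based on torchrun's conventions.
os.environ.setdefault("NNODES", "2")
os.environ.setdefault("NODE_RANK", "0")
os.environ.setdefault("NPROC_PER_NODE", "gpu")  # "auto" and "gpu" both accepted
os.environ.setdefault("MASTER_ADDR", "trainer-master.default.svc")  # explicit master address
os.environ.setdefault("MASTER_PORT", "29500")   # explicit master port

# With the environment populated, no torchrun arguments need to be passed;
# explicit keyword arguments, when given, sit above the environment in the
# hierarchical priority order.
sft(
    model_path="/path/to/model",
    data_path="/path/to/train.jsonl",
    ckpt_output_dir="/path/to/checkpoints",
)
```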
Model Support Expansion
- Granite 4 support (transformers>=4.57.0)
- Mamba model support with optional CUDA acceleration (mamba-ssm[causal-conv1d]>=2.2.5)
- Enhanced compatibility through dependency updates
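With the new minimums installed (`transformers>=4.57.0`, plus `mamba-ssm[causal-conv1d]>=2.2.5` for CUDA-accelerated Mamba), the new architectures flow through the existing entry points. A sketch, assuming the top-level `sft()` signature from the project README and an illustrative Granite 4 model id:

```python
# Sketch: fine-tuning a Granite 4 checkpoint through the existing SFT entry
# point. The model id is illustrative; any local path or Hugging Face id
# should work once the updated dependencies are installed.
from training_hub import sft

sft(
    model_path="ibm-granite/granite-4.0-h-small",  # illustrative Granite 4 id
    data_path="/path/to/train.jsonl",
    ckpt_output_dir="/path/to/checkpoints",
    num_epochs=1,
)
```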
Infrastructure Improvements
- Uncapped NumPy for better forward compatibility
- Minimum Numba version raised to 0.62.0
- Liger kernel pinned to >=0.5.10 for stability
- Updated backend implementations (instructlab-training>=0.12.1, rhai-innovation-mini-trainer>=0.3.0)
What's Changed
- Pinning liger-kernel version by @Fiona-Waters in #9
- Adding min dependencies for Granite 4 / Mamba support by @Maxusmusti in #14
- uncap numpy and raise minimum numba version by @RobotSail in #15
- Adding basic API for memory profiling (src/training_hub/profiling) by @mazam-lab in #11
- feat(traininghub): Use torchrun environment variables for default configuration by @szaher in #13
- Update backend implementation dep versions in pyproject.toml by @Maxusmusti in #19
New Contributors
- @Fiona-Waters made their first contribution in #9
- @mazam-lab made their first contribution in #11
- @szaher made their first contribution in #13
Full Changelog: v0.2.0...v0.3.0
v0.2.0 - GPT-OSS Support
Both SFT and OSFT now support gpt-oss models. This release also brings new example scripts, documentation updates, and dependency version adjustments.
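As a minimal sketch of what this unlocks, assuming the top-level `sft()` and `osft()` entry points with parameter names as in the project README (the new example scripts show the exact invocations):

```python
# Sketch: both algorithms now accept gpt-oss checkpoints. Paths and the
# OSFT-specific parameter below are illustrative; see the new example
# scripts for the exact invocations.
from training_hub import osft, sft

sft(
    model_path="openai/gpt-oss-20b",
    data_path="/path/to/train.jsonl",
    ckpt_output_dir="/path/to/sft-checkpoints",
)

osft(
    model_path="openai/gpt-oss-20b",
    data_path="/path/to/train.jsonl",
    ckpt_output_dir="/path/to/osft-checkpoints",
    unfreeze_rank_ratio=0.25,  # OSFT-specific knob (illustrative value)
)
```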
What's Changed
- Update dependencies, examples, and docs for GPT-OSS by @Maxusmusti in #6
Full Changelog: v0.1.0...v0.2.0
v0.1.0 - SFT, OSFT (Continual Learning), and Examples
This release adds example notebooks for OSFT, alongside minor bug fixes and documentation amendments.
What's Changed
- Adds notebooks for OSFT by @RobotSail in #3
Full Changelog: v0.1.0a3...v0.1.0
v0.1.0 Alpha 3 - OSFT Param/README updates
What's Changed
- update main README to include OSFT by @RobotSail in #2
Full Changelog: v0.1.0a2...v0.1.0a3
v0.1.0 Alpha 2 - OSFT (Continual Learning) Functionality
What's Changed
- Add OSFT implementation through mini-trainer by @RobotSail in #1
New Contributors
- @RobotSail made their first contribution in #1
Full Changelog: v0.1.0a1...v0.1.0a2
v0.1.0 Alpha 1 - Initial Release for Basic SFT Functionality
Cutting the first Training Hub alpha release, available on PyPI!
```shell
pip install training-hub
# or, with the CUDA extras:
pip install training-hub[cuda]
```
Full Changelog: https://github.com/Red-Hat-AI-Innovation-Team/training_hub/commits/v0.1.0a1