Releases: Red-Hat-AI-Innovation-Team/training_hub

v0.3.0 - Granite 4, Mamba, Env var support, and Memory Estimation

14 Oct 21:56
e6c8cca

This release introduces memory profiling capabilities, enhanced distributed training orchestration, and support for Granite 4 and Mamba models. Backend implementations have been updated to instructlab-training v0.12.1 and mini-trainer v0.3.0.

What's New

Memory Profiling API (Experimental)

  • New memory estimation tool for fine-tuning workloads
  • Reports per-GPU VRAM requirements (parameters, optimizer state, gradients, activations, outputs)
  • Supports both SFT and OSFT algorithms
  • Returns low/expected/high memory bounds for better resource planning
  • Includes Liger-kernel-aware adjustments
  • Example notebook and documentation included (see the sketch below)
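
As a rough sketch of how the estimator might be called: the import path follows the new src/training_hub/profiling package from this release, but the function name, keyword names, and return shape below are illustrative assumptions, not the published interface.

```python
# Hypothetical usage sketch of the experimental memory-profiling API.
# The package location (src/training_hub/profiling) comes from this
# release; the function name, parameters, and return shape are guesses.
from training_hub.profiling import estimate_memory  # assumed entry point

bounds = estimate_memory(
    model_path="ibm-granite/granite-4.0-h-tiny",  # any HF checkpoint id
    algorithm="osft",      # both "sft" and "osft" are supported
    num_gpus=8,            # figures are reported per GPU
    use_liger=True,        # applies the Liger-kernel-aware adjustments
)

# Per-GPU VRAM comes back as low/expected/high bounds, broken down by
# parameters, optimizer state, gradients, activations, and outputs.
print(bounds["expected"])
```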

Enhanced Distributed Training

  • Automatic torchrun configuration from environment variables (sketched after this list)
  • Full compatibility with Kubeflow and other orchestration systems
  • Support for "auto" and "gpu" process count specifications
  • Centralized launch parameter handling with hierarchical priority
  • Improved validation with clear conflict warnings and error messages
  • Flexible argument types (string or integer) for multi-node parameters
  • Explicit master address and port configuration options
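
To show the environment-variable flow end to end, here is a minimal sketch of an orchestrator-launched run. MASTER_ADDR and MASTER_PORT follow standard torchrun conventions, but the other variable names, the sft entry point, and its keywords are assumptions based on the release notes rather than the exact interface.

```python
# Minimal sketch: torchrun settings picked up from the environment, as
# an orchestrator such as Kubeflow would export them. Treat the variable
# names (beyond MASTER_ADDR/MASTER_PORT) and sft() keywords as placeholders.
import os

os.environ.setdefault("MASTER_ADDR", "trainer-master-0")  # rendezvous host
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("NNODES", "2")               # string or int accepted
os.environ.setdefault("NPROC_PER_NODE", "gpu")     # "auto" and "gpu" work

from training_hub import sft  # assumed top-level entry point

# With no explicit launch overrides, the environment supplies the torchrun
# configuration; explicit arguments take priority over these variables.
sft(
    model_path="ibm-granite/granite-4.0-h-tiny",
    data_path="train.jsonl",
    ckpt_output_dir="./checkpoints",
)
```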

Model Support Expansion

  • Granite 4 support (transformers>=4.57.0; see the loading sketch after this list)
  • Mamba model support with optional CUDA acceleration (mamba-ssm[causal-conv1d]>=2.2.5)
  • Enhanced compatibility through dependency updates
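
With the new floor in place, loading a Granite 4 checkpoint goes through the usual transformers path. The model id below is an illustrative public checkpoint, not something this release pins.

```python
# Sketch: loading a Granite 4 model, which needs the transformers>=4.57.0
# floor introduced in this release. The checkpoint id is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-h-tiny"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```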

Infrastructure Improvements

  • Uncapped NumPy for better forward compatibility
  • Minimum Numba version raised to 0.62.0
  • Liger kernel pinned to >=0.5.10 for stability
  • Updated backend implementations (instructlab-training>=0.12.1, rhai-innovation-mini-trainer>=0.3.0); the full set of version floors is summarized below
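
For reference, the floors above written out as pip requirement strings; this listing is illustrative, and pyproject.toml remains authoritative.

```python
# Dependency floors from this release as pip requirement strings.
requirements = [
    "numpy",                                # upper bound removed
    "numba>=0.62.0",
    "liger-kernel>=0.5.10",
    "instructlab-training>=0.12.1",
    "rhai-innovation-mini-trainer>=0.3.0",
]
```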

What's Changed

  • Pinning liger-kernel version by @Fiona-Waters in #9
  • Adding min dependencies for Granite 4 / Mamba support by @Maxusmusti in #14
  • uncap numpy and raise minimum numba version by @RobotSail in #15
  • Adding basic API for memory profiling (src/training_hub/profiling) by @mazam-lab in #11
  • feat(traininghub): Use torchrun environment variables for default configuration by @szaher in #13
  • Update backend implementation dep versions in pyproject.toml by @Maxusmusti in #19

New Contributors

  • @Fiona-Waters made their first contribution in #9
  • @mazam-lab made their first contribution in #11
  • @szaher made their first contribution in #13

Full Changelog: v0.2.0...v0.3.0

v0.2.0 - GPT-OSS Support

17 Sep 19:39
8164824

Both SFT and OSFT now support gpt-oss models. This release also adds new example scripts, documentation updates, and dependency version adjustments.
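
A quick sketch of what that enables: sft() and its keyword names follow the project's examples and may differ in detail, and the checkpoint id is illustrative.

```python
# Sketch: pointing the SFT entry point at a gpt-oss checkpoint.
# sft() and its keywords are assumed from the project's examples.
from training_hub import sft

sft(
    model_path="openai/gpt-oss-20b",   # illustrative gpt-oss checkpoint
    data_path="train.jsonl",
    ckpt_output_dir="./checkpoints",
)
```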

What's Changed

  • Update dependencies, examples, and docs for GPT-OSS by @Maxusmusti in #6

Full Changelog: v0.1.0...v0.2.0

v0.1.0 - SFT, OSFT (Continual Learning), and Examples

03 Sep 10:42

This release adds new documentation for OSFT, along with minor bug fixes and other documentation amendments.

What's Changed

Full Changelog: v0.1.0a3...v0.1.0

v0.1.0 Alpha 3 - OSFT Param/README updates

25 Aug 15:45
28e52df

What's Changed

Full Changelog: v0.1.0a2...v0.1.0a3

v0.1.0 Alpha 2 - OSFT (Continual Learning) Functionality

25 Aug 14:31

What's Changed

  • Add OSFT implementation through mini-trainer by @RobotSail in #1
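
The PR above lands the OSFT (continual learning) entry point, backed by mini-trainer. A minimal sketch follows, assuming the osft function and an unfreeze_rank_ratio keyword as described in the project's docs; treat the exact signature as illustrative.

```python
# Sketch: continual learning via OSFT, backed by mini-trainer.
# osft() and its keywords are assumptions based on the project's docs.
from training_hub import osft

osft(
    model_path="ibm-granite/granite-3.1-8b-instruct",  # illustrative
    data_path="train.jsonl",
    ckpt_output_dir="./checkpoints",
    unfreeze_rank_ratio=0.25,  # fraction of ranks left trainable per layer
)
```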

New Contributors

  • @RobotSail made their first contribution in #1

Full Changelog: v0.1.0a1...v0.1.0a2

v0.1.0 Alpha 1 - Initial Release for Basic SFT Functionality

15 Aug 20:39

Cutting the first Training Hub alpha release, available on PyPI!

```
pip install training-hub
pip install training-hub[cuda]
```

Full Changelog: https://github.com/Red-Hat-AI-Innovation-Team/training_hub/commits/v0.1.0a1