
Mini Trainer

Requires Python 3.11+. Licensed under Apache 2.0.

A lightweight, high-performance training library for efficient fine-tuning of large language models up to 70B parameters.


Built for speed, simplicity, and scalability 🚀


✨ Features

  • 🔥 Liger Kernels - Minimized memory footprint through chunked loss computation
  • Smart Batch Packing - Automatic minibatching with a numba-optimized LPT algorithm for balanced GPU loads (see the sketch after this list)
  • 🎯 FSDP2 Support - Native PyTorch distributed training with FullyShardedDataParallel
  • 🚫 Padding-Free - Leverages Flash Attention for efficient computation without padding overhead
  • ♾️ Infinite Sampling - Continuous data streaming without manual epoch configuration
  • 🔬 Orthogonal Subspace Fine-Tuning (OSFT) - Advanced continual learning technique for parameter-efficient training
  • 📊 Flexible Logging - JSONL metrics logging with optional Weights & Biases integration
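
As context for the batch-packing feature above, here is a minimal, framework-free sketch of the LPT (Longest Processing Time) greedy heuristic: sort sequences by length and always place the next one on the least-loaded GPU. This is a conceptual illustration only, not Mini Trainer's numba-optimized implementation.

def lpt_pack(seq_lens, num_gpus):
    """Conceptual LPT sketch: longest sequences first, each assigned to the
    currently least-loaded bucket (GPU). Illustrative only."""
    buckets = [[] for _ in range(num_gpus)]
    loads = [0] * num_gpus
    for idx in sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True):
        target = loads.index(min(loads))     # least-loaded GPU so far
        buckets[target].append(idx)
        loads[target] += seq_lens[idx]
    return buckets, loads

# Example: pack 6 sequences onto 2 GPUs by token count
print(lpt_pack([1024, 768, 512, 256, 128, 64], num_gpus=2))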

🔬 Orthogonal Subspace Fine-Tuning (OSFT)


Mini Trainer implements Orthogonal Subspace Fine-Tuning (OSFT), a breakthrough continual learning technique that enables models to learn new tasks without catastrophic forgetting. OSFT uses adaptive SVD-based decomposition to intelligently update models in unused parameter subspaces while preserving crucial prior knowledge.

🎥 Learn More


Watch our technical deep-dive on Orthogonal Subspace Learning


🚀 Using OSFT

Enable OSFT in your training runs with the --osft flag:

torchrun --nnodes=1 --nproc-per-node=8 -m mini_trainer.train \
    --model-name-or-path meta-llama/Llama-3.1-8B-Instruct \
    --data-path ./data.jsonl \
    --output-dir ./checkpoints \
    --osft \
    --osft-unfreeze-rank-ratio 0.25  # train the 25% least important parameters

The --osft-unfreeze-rank-ratio parameter controls how much of the model to update (0.0 = everything frozen, 1.0 = full training).
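
To make the rank ratio concrete, here is a rough, self-contained sketch of the general idea behind SVD-based subspace splitting: decompose a weight matrix, treat the top singular directions as the frozen high-importance subspace, and expose only the lowest-importance fraction (the unfreeze ratio) to gradient updates. This is an illustration of the concept under simplifying assumptions, not Mini Trainer's OSFT implementation, whose adaptive decomposition differs.

import torch

def split_by_rank_ratio(weight: torch.Tensor, unfreeze_rank_ratio: float):
    """Conceptual sketch only: split a weight matrix into a frozen
    high-importance component and a trainable low-importance component
    using a plain SVD. Not the library's actual OSFT code."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    k = int((1.0 - unfreeze_rank_ratio) * S.numel())       # top-k directions stay frozen
    frozen = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]       # preserved prior knowledge
    trainable = U[:, k:] @ torch.diag(S[k:]) @ Vh[k:, :]    # lowest-importance directions
    return frozen, trainable

W = torch.randn(64, 64)
frozen, trainable = split_by_rank_ratio(W, unfreeze_rank_ratio=0.25)
# frozen + trainable reconstructs W (up to numerical error); with a ratio of
# 0.25, only the 25% least significant singular directions would be updated.
print(torch.allclose(frozen + trainable, W, atol=1e-4))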


📦 Installation

From PyPI

# Install base package
pip install rhai-innovation-mini-trainer

# Install CUDA dependencies (required for GPU training)
pip install rhai-innovation-mini-trainer[cuda] --no-build-isolation

From Source (Editable)

# Clone the repository
git clone https://github.com/Red-Hat-AI-Innovation-Team/mini_trainer.git
cd mini_trainer

# Install in editable mode
pip install -e .

# Install CUDA dependencies
pip install -e .[cuda] --no-build-isolation

🎯 Usage

Training is orchestrated through the api_train.py module, which provides a programmatic interface for launching training jobs. You can run training using torchrun for distributed setups:

torchrun --nnodes=1 --nproc-per-node=8 -m mini_trainer.train \
    --output-dir ./checkpoints \
    --data-path ./data.jsonl \
    --model-name-or-path meta-llama/Llama-3.1-8B-Instruct \
    --batch-size 128 \
    --max-tokens-per-gpu 128000 \
    --learning-rate 5e-6 \
    --use-liger-kernels

Key Parameters

  • --model-name-or-path - HuggingFace model identifier or local path
  • --data-path - Path to tokenized training data (JSONL format)
  • --batch-size - Target batch size for training
  • --max-tokens-per-gpu - Maximum tokens per GPU (auto-balances minibatches)
  • --output-dir - Directory for checkpoints and logs
  • --use-liger-kernels - Enable memory-efficient Liger kernels
  • --osft - Enable Orthogonal Subspace Fine-Tuning mode
  • --osft-unfreeze-rank-ratio - Ratio of model parameters to train with OSFT (0.0-1.0)

For the complete list of arguments and advanced configuration options, see src/mini_trainer/api_train.py.
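
If you prefer kicking off runs from a Python script rather than a shell, one generic option is to wrap the same torchrun command in a subprocess call, as sketched below. This simply shells out to the CLI shown above; it is not the api_train.py programmatic interface, whose actual functions should be taken from the source file referenced above.

import subprocess

# Generic launcher sketch: runs the torchrun command from the Usage section.
# It does not call mini_trainer's Python API directly.
cmd = [
    "torchrun", "--nnodes=1", "--nproc-per-node=8",
    "-m", "mini_trainer.train",
    "--output-dir", "./checkpoints",
    "--data-path", "./data.jsonl",
    "--model-name-or-path", "meta-llama/Llama-3.1-8B-Instruct",
    "--batch-size", "128",
    "--max-tokens-per-gpu", "128000",
    "--learning-rate", "5e-6",
    "--use-liger-kernels",
]
subprocess.run(cmd, check=True)   # raises CalledProcessError on failure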


📊 Data Format

Mini Trainer expects pre-tokenized data in JSONL format with the following structure:

{"input_ids": [1, 2, 3, ...], "labels": [1, 2, 3, ...], "len": 128}
{"input_ids": [4, 5, 6, ...], "labels": [-100, -100, 6, ...], "len": 256}

Each line should contain the following fields (a rough sketch of producing this format follows the list):

  • input_ids - Tokenized input sequence
  • labels - Target labels (use -100 for tokens to ignore in loss computation)
  • len - Sequence length (optional, computed automatically if missing)
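
As an illustration of how such records can be produced, the sketch below tokenizes simple prompt/response pairs with a Hugging Face tokenizer and masks the prompt tokens with -100. The model name and the prompt/response schema are assumptions made for the example (chat templates are ignored); for production data preparation, use the tooling recommended in the next section.

import json
from transformers import AutoTokenizer

# Assumption for this example: swap in the tokenizer matching your target model.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

samples = [{"prompt": "What is 2 + 2?", "response": " 4"}]

with open("data.jsonl", "w") as f:
    for s in samples:
        prompt_ids = tok(s["prompt"], add_special_tokens=False)["input_ids"]
        response_ids = tok(s["response"], add_special_tokens=False)["input_ids"]
        input_ids = prompt_ids + response_ids
        # -100 masks the prompt so only response tokens contribute to the loss
        labels = [-100] * len(prompt_ids) + response_ids
        f.write(json.dumps({"input_ids": input_ids,
                            "labels": labels,
                            "len": len(input_ids)}) + "\n")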

🔄 Data Processing

Mini Trainer does not include data processing utilities. For tokenization and data preparation, please use the instructlab-training APIs, which provide robust data processing pipelines compatible with Mini Trainer's input format.


🐛 Bug Reports & Issues

Found a bug or have a feature request? We'd love to hear from you! Please open an issue on GitHub with:

  • A clear description of the problem
  • Steps to reproduce
  • Expected vs. actual behavior
  • Environment details (Python version, GPU type, etc.)

📝 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


🙏 Acknowledgments

Built with ❤️ by the Red Hat AI Innovation Team.

Mini Trainer is part of a broader ecosystem of LLM tools developed by the AI Innovation Team. Check out our other projects:

  • training_hub - Post-training algorithms for LLMs
  • its_hub - Inference-time scaling for LLMs
  • sdg_hub - Synthetic data generation pipelines
  • reward_hub - State-of-the-art reward models

Visit ai-innovation.team to explore all our open-source tools and research.

Special thanks to the open-source community for contributions and feedback!
