Commit 28e52df
Merge pull request #2 from RobotSail/update-docs
update main README to include OSFT
2 parents: 29790be + bdc1ff2

2 files changed: +39 -9 lines


README.md

Lines changed: 35 additions & 8 deletions
@@ -3,13 +3,13 @@ An algorithm-focused interface for common llm training, continual learning, and
 
 ## Support Matrix
 
-| Algorithm | InstructLab-Training | PEFT | VERL | Status |
-|-----------|---------------------|------|------|--------|
-| **Supervised Fine-tuning (SFT)** | ✅ | - | - | Implemented |
-| Continual Learning (OSFT) | 🔄 | 🔄 | - | Planned |
-| Direct Preference Optimization (DPO) | - | - | 🔄 | Planned |
-| Low-Rank Adaptation (LoRA) | 🔄 | 🔄 | - | Planned |
-| Group Relative Policy Optimization (GRPO) | - | - | 🔄 | Planned |
+| Algorithm | InstructLab-Training | RHAI Innovation Mini-Trainer | PEFT | VERL | Status |
+|-----------|---------------------|---------------|------|------|--------|
+| **Supervised Fine-tuning (SFT)** | ✅ | - | - | - | Implemented |
+| Continual Learning (OSFT) | 🔄 | ✅ | 🔄 | - | Planned |
+| Direct Preference Optimization (DPO) | - | - | - | 🔄 | Planned |
+| Low-Rank Adaptation (LoRA) | 🔄 | - | 🔄 | - | Planned |
+| Group Relative Policy Optimization (GRPO) | - | - | - | 🔄 | Planned |
 
 **Legend:**
 - ✅ Implemented and tested
@@ -18,7 +18,8 @@ An algorithm-focused interface for common llm training, continual learning, and
 
 ## Implemented Algorithms
 
-### [Supervised Fine-tuning (SFT)](examples/sft_usage.md)
+### [Supervised Fine-tuning (SFT)](examples/docs/sft_usage.md)
+
 Fine-tune language models on supervised datasets with support for:
 - Single-node and multi-node distributed training
 - Configurable training parameters (epochs, batch size, learning rate, etc.)
@@ -36,6 +37,32 @@ result = sft(
 )
 ```
 
+### [Orthogonal Subspace Fine-Tuning (OSFT)](examples/docs/osft_usage.md)
+
+OSFT allows you to fine-tune a model while controlling how much of its
+existing behavior to preserve. Currently we have support for:
+
+- Single-node and multi-node distributed training
+- Configurable training parameters (epochs, batch size, learning rate, etc.)
+- RHAI Innovation Mini-Trainer backend integration
+
+Here's a quick and minimal way to get started with OSFT:
+
+```python
+from training_hub import osft
+
+result = osft(
+    model_path="/path/to/model",
+    data_path="/path/to/data.jsonl",
+    ckpt_output_dir="/path/to/outputs",
+    unfreeze_rank_ratio=0.25,
+    effective_batch_size=16,
+    max_tokens_per_gpu=2048,
+    max_seq_len=1024,
+    learning_rate=5e-6,
+)
+```
+
 ## Installation
 
 ### Basic Installation
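A note on the new example: `unfreeze_rank_ratio` is the knob behind the trade-off the prose describes, i.e. how much of the model is opened up for updates versus kept as-is. As a minimal, hypothetical sketch (the loop and the ratio values below are illustrative and not part of this commit; only the `osft()` call and its parameters come from the diff), one way to explore that trade-off is to sweep the ratio into separate checkpoint directories:

```python
# Hypothetical sweep, not from this commit: reuse the documented osft()
# call with a few unfreeze_rank_ratio values. Lower ratios should keep
# more of the base model's existing behavior intact.
from training_hub import osft

for ratio in (0.1, 0.25, 0.5):
    osft(
        model_path="/path/to/model",
        data_path="/path/to/data.jsonl",
        ckpt_output_dir=f"/path/to/outputs/ratio_{ratio}",  # one output dir per run
        unfreeze_rank_ratio=ratio,
        effective_batch_size=16,
        max_tokens_per_gpu=2048,
        max_seq_len=1024,
        learning_rate=5e-6,
    )
```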

src/training_hub/algorithms/osft.py

Lines changed: 4 additions & 1 deletion
@@ -339,7 +339,7 @@ def execute_training(self, algorithm_params: dict[str, any]) -> any:
         # parameter for performance gains.
         data_output_dir = algorithm_params.get('data_output_dir', None)
         if data_output_dir is None:
-            data_output_dir = os.path.join(algorithm_params['ckpt_output_dir'], '_internal_data_processing')
+            data_output_dir = os.path.join(algorithm_params['output_dir'], '_internal_data_processing')
 
         # since mini trainer itself does not process data, we delegate this to
         # a separate backend, and expect to receive the correct data path
@@ -373,6 +373,9 @@ def execute_training(self, algorithm_params: dict[str, any]) -> any:
         training_args_pre['osft'] = training_args_pre.get('osft', True)
 
         torchrun_args_pre = {k: v for k, v in algorithm_params.items() if k in torchrun_args_fields and v is not None}
+        # TODO: update this default in mini-trainer
+        torchrun_args_pre['rdzv_endpoint'] = torchrun_args_pre.get('rdzv_endpoint', 'localhost:1738')
+
 
         # now we run training
         return run_training(
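Two defaults in this diff are easy to miss: the internal data-processing directory is now derived from the output directory when the caller passes no `data_output_dir`, and `rdzv_endpoint` is pinned to `localhost:1738` unless the caller supplies one (the name suggests it mirrors torchrun's `--rdzv-endpoint` option, as the TODO about mini-trainer implies). A standalone sketch of both fallback patterns, with hypothetical input values:

```python
# Standalone sketch of the two dict-based fallbacks above; the
# algorithm_params contents here are hypothetical.
import os

algorithm_params = {"output_dir": "/path/to/outputs"}  # caller gave no data_output_dir

# Fallback 1: derive the internal data-processing directory.
data_output_dir = algorithm_params.get("data_output_dir")
if data_output_dir is None:
    data_output_dir = os.path.join(algorithm_params["output_dir"], "_internal_data_processing")
print(data_output_dir)  # -> /path/to/outputs/_internal_data_processing

# Fallback 2: pin a rendezvous endpoint only when none was provided.
torchrun_args = {}
torchrun_args["rdzv_endpoint"] = torchrun_args.get("rdzv_endpoint", "localhost:1738")
print(torchrun_args["rdzv_endpoint"])  # -> localhost:1738
```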
