Commit 40a85f3

Add docs for memory efficient init for OSFT
Signed-off-by: Mustafa Eyceoz <[email protected]>
1 parent 9b8505b commit 40a85f3

2 files changed: 4 additions, 1 deletion

examples/docs/osft_usage.md

Lines changed: 3 additions & 1 deletion
@@ -90,6 +90,7 @@ result = osft(
 num_epochs=3,
 warmup_steps=100,
 use_liger=True,
+osft_memory_efficient_init=True, # Recommended for OOMs at model load time
 seed=42
 )
```
@@ -168,6 +169,7 @@ OSFTAlgorithm = AlgorithmRegistry.get_algorithm('osft')
 - `num_epochs` (int): Number of epochs to train for
 - `seed` (int): Random seed for training
 - `use_liger` (bool): Whether to use Liger kernels for training
+- `osft_memory_efficient_init` (bool): Enable memory-efficient initialization to reduce memory usage during model loading (recommended for OOMs)

 **Learning Rate Scheduler:**
 - `lr_scheduler` (str): Name of the PyTorch learning rate scheduler to use
@@ -266,7 +268,7 @@ result = osft(

 1. **unfreeze_rank_ratio**: Start with values between 0.1-0.5. Values >0.5 are rarely needed for general continual-learning regimes.

-2. **Memory Management**: OSFT doesn't reduce memory requirements compared to SFT, so adjust `max_tokens_per_gpu` accordingly.
+2. **Memory Management**: OSFT doesn't reduce memory requirements compared to SFT, so adjust `max_tokens_per_gpu` accordingly. For memory-constrained environments or OOMs during model loading, set `osft_memory_efficient_init=True`.

3. **Data Processing**: The algorithm handles data processing automatically. Use `use_processed_dataset=True` only if you have pre-tokenized data.
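
The Memory Management note above suggests turning on `osft_memory_efficient_init` when model loading runs out of memory. A minimal sketch of that fallback follows, under stated assumptions: `run_with_memory_fallback` and `fake_osft` are hypothetical names that are not part of this commit, `fake_osft` only stands in for the `osft(...)` entry point shown in the docs, and `MemoryError` is a placeholder for whatever exception your stack raises on OOM. Whether an in-process retry is feasible at all depends on how your launcher surfaces OOMs, so in many setups it is simpler to set the flag up front.

```python
from typing import Any, Callable, Dict, Tuple, Type


def run_with_memory_fallback(
    train_fn: Callable[..., Any],
    kwargs: Dict[str, Any],
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> Any:
    """Call train_fn(**kwargs); on an OOM error, retry once with
    osft_memory_efficient_init=True.

    train_fn is a stand-in for the osft(...) call from the docs.
    oom_errors is an assumption: pass the exception type(s) your
    environment actually raises when it runs out of memory.
    """
    try:
        return train_fn(**kwargs)
    except oom_errors:
        if kwargs.get("osft_memory_efficient_init"):
            # The flag was already enabled, so this helper has nothing left to try.
            raise
        # Copy the arguments and enable the flag for a single retry.
        retry_kwargs = {**kwargs, "osft_memory_efficient_init": True}
        return train_fn(**retry_kwargs)


def fake_osft(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical trainer used only to exercise the helper: it simulates
    an OOM at load time unless the memory-efficient flag is set."""
    if not kwargs.get("osft_memory_efficient_init", False):
        raise MemoryError("simulated OOM during model load")
    return kwargs


result = run_with_memory_fallback(
    fake_osft,
    {"num_epochs": 3, "warmup_steps": 100, "use_liger": True, "seed": 42},
)
```

Here the first call fails with the simulated OOM and the retry succeeds with the flag enabled; the other arguments are passed through unchanged.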

examples/scripts/osft_gpt_oss_example.py

Lines changed: 1 addition & 0 deletions
@@ -122,6 +122,7 @@ def main():

 # Optimization
 'use_liger': True, # Enable Liger kernels for efficiency
+'osft_memory_efficient_init': True, # Recommended for OOMs at model load time
 'seed': 42,
 'lr_scheduler': 'cosine', # Cosine scheduler works well with OSFT
