Commit 28e52df
Merge pull request #2 from RobotSail/update-docs
update main README to include OSFT
2 parents: 29790be + bdc1ff2

2 files changed: +39 -9 lines


README.md

Lines changed: 35 additions & 8 deletions
@@ -3,13 +3,13 @@ An algorithm-focused interface for common llm training, continual learning, and
 
 ## Support Matrix
 
-| Algorithm | InstructLab-Training | PEFT | VERL | Status |
-|-----------|---------------------|------|------|--------|
-| **Supervised Fine-tuning (SFT)** | ✅ | - | - | Implemented |
-| Continual Learning (OSFT) | 🔄 | 🔄 | - | Planned |
-| Direct Preference Optimization (DPO) | - | - | 🔄 | Planned |
-| Low-Rank Adaptation (LoRA) | 🔄 | 🔄 | - | Planned |
-| Group Relative Policy Optimization (GRPO) | - | - | 🔄 | Planned |
+| Algorithm | InstructLab-Training | RHAI Innovation Mini-Trainer | PEFT | VERL | Status |
+|-----------|---------------------|---------------|------|------|--------|
+| **Supervised Fine-tuning (SFT)** | ✅ | - | - | - | Implemented |
+| Continual Learning (OSFT) | 🔄 | ✅ | 🔄 | - | Planned |
+| Direct Preference Optimization (DPO) | - | - | - | 🔄 | Planned |
+| Low-Rank Adaptation (LoRA) | 🔄 | - | 🔄 | - | Planned |
+| Group Relative Policy Optimization (GRPO) | - | - | - | 🔄 | Planned |
 
 **Legend:**
 - ✅ Implemented and tested
@@ -18,7 +18,8 @@ An algorithm-focused interface for common llm training, continual learning, and
 
 ## Implemented Algorithms
 
-### [Supervised Fine-tuning (SFT)](examples/sft_usage.md)
+### [Supervised Fine-tuning (SFT)](examples/docs/sft_usage.md)
+
 Fine-tune language models on supervised datasets with support for:
 - Single-node and multi-node distributed training
 - Configurable training parameters (epochs, batch size, learning rate, etc.)
@@ -36,6 +37,32 @@ result = sft(
 )
 ```
 
+### [Orthogonal Subspace Fine-Tuning (OSFT)](examples/docs/osft_usage.md)
+
+OSFT allows you to fine-tune a model while controlling how much of its
+existing behavior to preserve. Currently we have support for:
+
+- Single-node and multi-node distributed training
+- Configurable training parameters (epochs, batch size, learning rate, etc.)
+- RHAI Innovation Mini-Trainer backend integration
+
+Here's a quick and minimal way to get started with OSFT:
+
+```python
+from training_hub import osft
+
+result = osft(
+    model_path="/path/to/model",
+    data_path="/path/to/data.jsonl",
+    ckpt_output_dir="/path/to/outputs",
+    unfreeze_rank_ratio=0.25,
+    effective_batch_size=16,
+    max_tokens_per_gpu=2048,
+    max_seq_len=1024,
+    learning_rate=5e-6,
+)
+```
+
 ## Installation
 
 ### Basic Installation
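A note on the new example: `unfreeze_rank_ratio` is the knob behind the trade-off the prose describes, i.e. how much of the model is opened up for updates versus kept as-is. As a minimal, hypothetical sketch (the loop and the ratio values below are illustrative and not part of this commit; only the `osft()` call and its parameters come from the diff), one way to explore that trade-off is to sweep the ratio into separate checkpoint directories:

```python
# Hypothetical sweep, not from this commit: reuse the documented osft()
# call with a few unfreeze_rank_ratio values. Lower ratios should keep
# more of the base model's existing behavior intact.
from training_hub import osft

for ratio in (0.1, 0.25, 0.5):
    osft(
        model_path="/path/to/model",
        data_path="/path/to/data.jsonl",
        ckpt_output_dir=f"/path/to/outputs/ratio_{ratio}",  # one output dir per run
        unfreeze_rank_ratio=ratio,
        effective_batch_size=16,
        max_tokens_per_gpu=2048,
        max_seq_len=1024,
        learning_rate=5e-6,
    )
```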

src/training_hub/algorithms/osft.py

Lines changed: 4 additions & 1 deletion
@@ -339,7 +339,7 @@ def execute_training(self, algorithm_params: dict[str, any]) -> any:
         # parameter for performance gains.
         data_output_dir = algorithm_params.get('data_output_dir', None)
         if data_output_dir is None:
-            data_output_dir = os.path.join(algorithm_params['ckpt_output_dir'], '_internal_data_processing')
+            data_output_dir = os.path.join(algorithm_params['output_dir'], '_internal_data_processing')
 
         # since mini trainer itself does not process data, we delegate this to
         # a separate backend, and expect to receive the correct data path
@@ -373,6 +373,9 @@ def execute_training(self, algorithm_params: dict[str, any]) -> any:
         training_args_pre['osft'] = training_args_pre.get('osft', True)
 
         torchrun_args_pre = {k: v for k, v in algorithm_params.items() if k in torchrun_args_fields and v is not None}
+        # TODO: update this default in mini-trainer
+        torchrun_args_pre['rdzv_endpoint'] = torchrun_args_pre.get('rdzv_endpoint', 'localhost:1738')
+
 
         # now we run training
         return run_training(
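Two defaults in this diff are easy to miss: the internal data-processing directory is now derived from the output directory when the caller passes no `data_output_dir`, and `rdzv_endpoint` is pinned to `localhost:1738` unless the caller supplies one (the name suggests it mirrors torchrun's `--rdzv-endpoint` option, as the TODO about mini-trainer implies). A standalone sketch of both fallback patterns, with hypothetical input values:

```python
# Standalone sketch of the two dict-based fallbacks above; the
# algorithm_params contents here are hypothetical.
import os

algorithm_params = {"output_dir": "/path/to/outputs"}  # caller gave no data_output_dir

# Fallback 1: derive the internal data-processing directory.
data_output_dir = algorithm_params.get("data_output_dir")
if data_output_dir is None:
    data_output_dir = os.path.join(algorithm_params["output_dir"], "_internal_data_processing")
print(data_output_dir)  # -> /path/to/outputs/_internal_data_processing

# Fallback 2: pin a rendezvous endpoint only when none was provided.
torchrun_args = {}
torchrun_args["rdzv_endpoint"] = torchrun_args.get("rdzv_endpoint", "localhost:1738")
print(torchrun_args["rdzv_endpoint"])  # -> localhost:1738
```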
