Conversation

@Maxusmusti (Contributor) commented Sep 17, 2025

Adds two new example scripts, updates documentation, and reorganizes dependencies for GPT-OSS support.

Summary by CodeRabbit

  • New Features

    • Added memory-efficient initialization option for OSFT.
    • Introduced example scripts showcasing OSFT and SFT training for GPT-OSS 20B.
  • Documentation

    • Updated Support Matrix to mark Continual Learning (OSFT) as Implemented.
    • Expanded OSFT usage guide with the new memory-efficient option, examples, and best practices.
    • Updated examples overview to include new GPT-OSS 20B scripts.
  • Chores

    • Upgraded core dependencies and added Transformers; widened NumPy compatibility.
    • Reorganized optional dependencies, introducing a new GPU-focused group.
    • Added pytest to dependencies.

@Maxusmusti Maxusmusti self-assigned this Sep 17, 2025
@coderabbitai bot commented Sep 17, 2025

Walkthrough

Updates documentation to mark OSFT as implemented, adds two GPT-OSS 20B example scripts (SFT and OSFT) with CLI and checkpoint handling, documents a new OSFT parameter (osft_memory_efficient_init), and revises dependencies in pyproject.toml including new/updated packages and optional group changes.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| **Support Matrix**<br>`README.md` | Marked Continual Learning (OSFT) as Implemented in the support matrix; no other matrix changes. |
| **Examples Index**<br>`examples/README.md` | Added entries linking to the new GPT-OSS 20B SFT and OSFT scripts; minor formatting tweak around the existing SFT line. |
| **OSFT Docs & Params**<br>`examples/docs/osft_usage.md` | Documented new boolean parameter `osft_memory_efficient_init`; updated usage example, parameter reference, and best practices. |
| **OSFT GPT-OSS 20B Script**<br>`examples/scripts/osft_gpt_oss_example.py` | New script demonstrating OSFT for GPT-OSS 20B: CLI, training parameter assembly, distributed settings, runtime timing, error handling, and a `find_most_recent_checkpoint` utility. |
| **SFT GPT-OSS 20B Script**<br>`examples/scripts/sft_gpt_oss_example.py` | New script demonstrating SFT for GPT-OSS 20B: CLI, predefined hyperparameters, distributed config, timing, and reporting. |
| **Dependencies & Optionals**<br>`pyproject.toml` | Upgraded core deps: `instructlab-training>=0.12.0`, `rhai-innovation-mini-trainer>=0.2.0`, `transformers>=4.55.0`, `datasets>=4.0.0`, `numpy<2.3`; added `pytest>=8.0`. Removed the optional `cuda` group; reworked the `gpt-oss` optional group with CUDA-enabled packages and related libs. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor U as User
  participant CLI as CLI Args
  participant Script as SFT/OSFT Script
  participant Hub as training_hub.sft/osft
  participant Trainer as Trainer/Backend
  participant Store as Checkpoint Storage

  U->>CLI: Provide --data-path, --ckpt-output-dir, ...
  CLI->>Script: Parse args
  Script->>Script: Build params (model, batch, dist, OSFT/SFT)
  Script->>Hub: Invoke sft()/osft() with params
  Hub->>Trainer: Initialize & run training
  Trainer-->>Store: Write checkpoints
  Script->>Script: find_most_recent_checkpoint(output_dir)
  Store-->>Script: Latest checkpoint path
  Script-->>U: Print duration and checkpoint

  rect rgba(220,255,220,0.3)
    Note right of Script: New: OSFT example script and checkpoint finder
    Note right of Hub: New doc param osft_memory_efficient_init (OSFT)
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I tuned my ears to OSS winds tonight,
With SFT carrots stacked just right. 🥕
OSFT trails through checkpoints gleam,
I hop through epochs, swift as a beam.
New deps burrow snug and tight—
Ship it, nibble, train in delight!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title concisely and accurately summarizes the primary changes in the changeset (dependency updates in `pyproject.toml`, new/updated example scripts, and documentation edits targeting GPT-OSS), conveying the main intent to a reviewer scanning history. It is specific to GPT-OSS and not overly vague or noisy. |

Signed-off-by: Mustafa Eyceoz <[email protected]>
@RobotSail (Collaborator) left a comment

A few comments but otherwise LGTM

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (15)
examples/docs/osft_usage.md (2)

145-173: Deduplicate/relocate parameter docs for clarity.

unfreeze_rank_ratio appears under Required Parameters and again under OSFT‑Specific Parameters. Keep it only under Required and move the range guidance there to avoid conflicting edits later.


269-273: Tighten the memory guidance wording.

Clarify that memory‑efficient init helps at load time, not step time. Suggested tweak:

-2. **Memory Management**: OSFT doesn't reduce memory requirements compared to SFT, so adjust `max_tokens_per_gpu` accordingly. For memory-constrained environments or OOMs during model loading, set `osft_memory_efficient_init=True`.
+2. **Memory Management**: OSFT does not reduce step‑time memory vs SFT. Tune `max_tokens_per_gpu` to avoid OOMs during training. For OOMs while loading the base model, enable `osft_memory_efficient_init=True` (affects load‑time memory only).
pyproject.toml (3)

39-46: Duplicate dependency: filelock listed twice.

Drop the duplicate to avoid resolver noise.

     "filelock>=3.0",
     "psutil>=6.0",
     "urllib3>=2.4",
     "frozenlist>=1.7",
     "xxhash>=3.0",
     "requests>=2.32.5",
     "attr>=0.3.2",
-    "filelock>=3.19.1",
     "mpmath>=1.3.0",
     "pytest>=8.0"

47-49: Move pytest to dev extras (avoid shipping test runner to end users).

-    "mpmath>=1.3.0",
-    "pytest>=8.0"
+    "mpmath>=1.3.0"
 ]
@@
 dev = [
     "ipykernel",
-    "ipython"
+    "ipython",
+    "pytest>=8.0"
 ]

14-17: Build tools in runtime deps.

wheel (and usually setuptools) shouldn’t be a runtime requirement given they’re already in [build-system]. Keep packaging if used at runtime.

 dependencies = [
-    "setuptools>=80.0",
     "packaging>=24.2",
-    "wheel>=0.43",
+    # wheel/setuptools are provided via [build-system]
examples/scripts/sft_gpt_oss_example.py (5)

1-1: Shebang mismatch with file perms.

Either make the file executable or drop the shebang to appease linters.

-#!/usr/bin/env python3

17-22: Remove unused imports.

os and datetime aren’t used.

-import os
 import sys
 import time
-from datetime import datetime
 import argparse

61-61: Drop unused result assignment.

-        result = sft(
+        sft(

74-79: Avoid hardcoding /dev/shm; make it configurable with a safe default.

@@
-    parser.add_argument('--nproc-per-node', type=int, default=8,
+    parser.add_argument('--nproc-per-node', type=int, default=8,
                        help='Number of GPUs (default: 8)')
+    parser.add_argument('--data-output-dir', default=None,
+                       help='Optional processed data dir (default: system temp). Use /dev/shm on Linux for speed.')
@@
-            data_output_dir="/dev/shm",        # Use RAM disk for speed
+            data_output_dir=(args.data_output_dir or os.getenv("TMPDIR") or "/tmp"),

Add import os back if you apply this change.
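The precedence suggested in the diff above (CLI flag, then `$TMPDIR`, then the system temp dir) can be written as a small helper. This is a sketch; the name `resolve_data_output_dir` is illustrative and not part of the scripts:

```python
import os
import tempfile


def resolve_data_output_dir(cli_value=None):
    """Resolve the processed-data directory: an explicit CLI value wins,
    then $TMPDIR, then the platform temp dir. /dev/shm (a Linux-only RAM
    disk) should be an explicit opt-in rather than a hardcoded default."""
    return cli_value or os.getenv("TMPDIR") or tempfile.gettempdir()
```

Users on Linux who want the RAM-disk speedup can still pass `--data-output-dir /dev/shm` explicitly.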


103-114: Catching broad Exception; prefer specific exceptions first.

-    except Exception as e:
+    except (ValueError, RuntimeError) as e:
         end_time = time.time()
         duration = end_time - start_time
@@
-        sys.exit(1)
+        sys.exit(1)
+    except Exception as e:
+        # Fallback catch-all to surface unexpected errors cleanly
+        print("=" * 50)
+        print(f"Unexpected error: {e}")
+        sys.exit(1)
examples/scripts/osft_gpt_oss_example.py (5)

1-1: Shebang mismatch with file perms.

Either make the file executable or drop the shebang.

-#!/usr/bin/env python3

22-22: Remove unused import.

datetime is unused.

-from datetime import datetime

28-52: Make checkpoint discovery more robust (mtime and dir check).

-def find_most_recent_checkpoint(output_dir):
+def find_most_recent_checkpoint(output_dir):
@@
-    checkpoint_dirs = glob.glob(checkpoint_pattern)
+    checkpoint_dirs = [p for p in glob.glob(checkpoint_pattern) if os.path.isdir(p)]
@@
-    # Find the most recently created checkpoint
-    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getctime)
+    # Pick most recently modified checkpoint (portable across filesystems)
+    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getmtime)
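Putting the suggestions above together, the checkpoint finder can be sketched as a standalone helper. The `samples_*` naming comes from the review; the flat layout under `output_dir` is assumed for illustration:

```python
import glob
import os


def find_most_recent_checkpoint(output_dir, pattern="samples_*"):
    """Return the most recently modified checkpoint directory under
    output_dir, or None if no checkpoints exist yet."""
    candidates = [p for p in glob.glob(os.path.join(output_dir, pattern))
                  if os.path.isdir(p)]  # skip stray files matching the glob
    if not candidates:
        return None
    # mtime is portable across filesystems; ctime semantics vary by OS.
    # Alternatively, parse the numeric suffix: int(p.rsplit("_", 1)[1]).
    return max(candidates, key=os.path.getmtime)
```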

118-122: Avoid hardcoding /dev/shm; expose CLI override with safe default.

@@ def main():
-    parser.add_argument('--unmask-messages', action='store_true', default=False,
+    parser.add_argument('--unmask-messages', action='store_true', default=False,
                        help='Unmask messages during training (default: False)')
+    parser.add_argument('--data-output-dir', default=None,
+                        help='Optional processed data dir (default: system temp). Use /dev/shm on Linux for speed.')
@@
-            'data_output_dir': "/dev/shm",      # Use RAM disk for speed
+            'data_output_dir': (args.data_output_dir or os.getenv("TMPDIR") or "/tmp"),

Add import os at top if removed.


159-171: Catching broad Exception; prefer specific exceptions first.

-    except Exception as e:
+    except (ValueError, RuntimeError) as e:
         end_time = time.time()
         duration = end_time - start_time
@@
-        sys.exit(1)
+        sys.exit(1)
+    except Exception as e:
+        print("=" * 50)
+        print(f"Unexpected error: {e}")
+        sys.exit(1)
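The pattern in the diffs above (specific exceptions first, a catch-all last, with duration reporting) can be factored into a small wrapper. The names here are illustrative and not taken from the scripts:

```python
import sys
import time


def run_with_timing(train_fn):
    """Run train_fn, returning its duration in seconds; exit 1 on failure."""
    start = time.time()
    try:
        train_fn()
    except (ValueError, RuntimeError) as e:
        # Expected failure modes: bad config, training-runtime errors.
        print(f"Training failed after {time.time() - start:.1f}s: {e}")
        sys.exit(1)
    except Exception as e:
        # Fallback catch-all so unexpected errors still exit cleanly.
        print(f"Unexpected error: {e}")
        sys.exit(1)
    return time.time() - start
```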
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7065187 and 40a85f3.

📒 Files selected for processing (5)
  • examples/README.md (2 hunks)
  • examples/docs/osft_usage.md (3 hunks)
  • examples/scripts/osft_gpt_oss_example.py (1 hunks)
  • examples/scripts/sft_gpt_oss_example.py (1 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
examples/scripts/sft_gpt_oss_example.py (1)
src/training_hub/algorithms/sft.py (1)
  • sft (177-249)
examples/scripts/osft_gpt_oss_example.py (2)
src/training_hub/algorithms/osft.py (1)
  • osft (439-500)
examples/scripts/sft_gpt_oss_example.py (1)
  • main (26-113)
🪛 Ruff (0.12.2)
examples/scripts/sft_gpt_oss_example.py

1-1: Shebang is present but file is not executable

(EXE001)


61-61: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


75-75: Probable insecure usage of temporary file or directory: "/dev/shm"

(S108)


103-103: Do not catch blind exception: Exception

(BLE001)

examples/scripts/osft_gpt_oss_example.py

1-1: Shebang is present but file is not executable

(EXE001)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


119-119: Probable insecure usage of temporary file or directory: "/dev/shm"

(S108)


159-159: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (8)
examples/README.md (3)

27-27: Formatting-only change.

No action needed.


29-30: New SFT GPT‑OSS entry looks good.

Path and description align with the added script.


62-63: New OSFT GPT‑OSS entry looks good.

Consistent with the OSFT example script name.

pyproject.toml (4)

32-32: No action required — PyPI releases exist for these min versions. Verified fsspec (2025.9.0) and regex (2025.9.1) exist on PyPI, so keeping fsspec>=2025.0 and regex>=2025.0 in pyproject.toml is valid.


45-45: Confirm package name: attr vs attrs.

pyproject.toml lists attr>=0.3.2 (pyproject.toml: line 45). Repo search returned no imports of attr/attrs; verify the intended package and update pyproject.toml or the code to match.


21-21: Confirm instructlab-training (>=0.12.0) doesn't require a narrower transformers range.

Wheel metadata shows instructlab-training depends on transformers>=4.41.2, and upstream issues report breakage with transformers >4.51.0 — verify 0.12.0's Requires‑Dist/pyproject so your top-level transformers pin won't cause resolver conflicts. (wheelodex.org)


55-61: Confirm flash‑attn / PyTorch / CUDA compatibility

pyproject.toml extras ('cuda') include torch>=2.6.0 and flash-attn>=2.8 — confirm prebuilt flash-attn wheels exist for the exact PyTorch+CUDA you target (many flash-attn releases on PyPI are source-only and require ninja/CUDA to build; prebuilt wheels are provided separately, e.g. flash-attn-wheels). (pypi.org)

Location: pyproject.toml lines 54–61.

examples/docs/osft_usage.md (1)

90-96: Confirm backend support for osft_memory_efficient_init

Docs/examples add osft_memory_efficient_init, but a repo search shows it only appears in examples/docs and examples/scripts; MiniTrainerOSFTBackend.execute_training’s renames dict (src/training_hub/algorithms/osft.py) does not include this key and there’s no other handling — verify the backend accepts this kwarg (or add a rename/TrainingArgs field and/or validation entry) to avoid runtime "unexpected keyword" or validation errors.

"rhai-innovation-mini-trainer>=0.2.0",
"torch>=2.6.0",
"transformers>=4.55.0",
"numba>=0.50",

⚠️ Potential issue

🧩 Analysis chain

Numba/llvmlite/numpy compatibility risk.

The trio numba>=0.50, llvmlite>=0.42, and numpy<2.3 is likely to produce resolver or ABI mismatches. Prefer matching Numba’s documented pairings (Numba pins llvmlite tightly and constrains NumPy).

-    "numba>=0.50",
-    "llvmlite>=0.42",
-    "numpy>=1.26.4,<2.3",
+    # Align versions to avoid ABI conflicts (NumPy 2+ support)
+    "numba>=0.60,<0.61",
+    "llvmlite>=0.43,<0.44",
+    "numpy>=1.26.4,<2.3",

Please validate against your CI matrix and the GPUs you target.

Also applies to: 38-39


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== locate pyproject.toml ==="
fd -HI --hidden pyproject.toml || true
for f in $(fd -HI --hidden pyproject.toml || true); do
  echo "---- $f ----"
  nl -ba "$f" | sed -n '1,240p'
done
echo
echo "=== Search for numba/llvmlite/numpy occurrences across repo ==="
rg -n --hidden -S -C2 '\b(numba|llvmlite|numpy)\b' || true

Length of output: 110


🌐 Web query:

Numba llvmlite compatibility matrix
Numba supported NumPy versions site:numba.pydata.org
Numba 0.60 release notes llvmlite
llvmlite 0.43 release notes

💡 Result:

Here are the sources you requested and the key compatibility facts:

  • llvmlite v0.43.0 release notes (includes date and highlights). [1] (llvmlite.readthedocs.io)
  • llvmlite compatibility / historical matrix (README) — shows which LLVM versions each llvmlite series supports. [2] (github.com)
  • Numba 0.60.0 release notes (highlights) — Numba 0.60.0 released 2024-06-13 and documents NumPy/llvmlite compatibility. See the release notes for details. [3] (numba.readthedocs.io)
  • Numba 0.60 installation / version-support table (canonical compatibility matrix): Numba 0.60.0 — Python 3.9–<3.13, NumPy 1.22 ≤ version < 1.27 (and == 2.0), llvmlite 0.43.x, LLVM 14.x. [4] (numba.readthedocs.io)
  • For archived Numba docs showing supported NumPy ranges for earlier Numba releases (examples: 0.48, 0.43, etc.). Use these pages if you need per-release NumPy support details. [5][6] (numba.pydata.org)

If you want, I can extract the full compatibility table (Numba release → supported Python, NumPy, llvmlite, LLVM) from the 0.60 docs into a CSV or formatted table.


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== locate pyproject.toml ==="
fd -HI --hidden pyproject.toml || true
for f in $(fd -HI --hidden pyproject.toml || true); do
  echo "---- $f ----"
  sed -n '1,240p' "$f" || true
done

echo
echo "=== Search for numba/llvmlite/numpy occurrences across repo ==="
rg -n --hidden -S -C2 '\b(numba|llvmlite|numpy)\b' || true

echo
echo "=== Search for CI/workflows that reference numba/llvmlite/numpy or GPUs ==="
rg -n --hidden -S --glob '.github/**' 'numba|llvmlite|numpy|cuda|gpu|nvidia|rocm' || true

echo
echo "=== List .github/workflows files (first 240 lines) ==="
fd -HI --hidden .github/workflows || true
for f in $(fd -HI --hidden .github/workflows || true); do
  echo "---- $f ----"
  sed -n '1,240p' "$f" || true
done

Length of output: 3531


Align Numba/llvmlite with supported NumPy to avoid ABI/resolver conflicts.

pyproject.toml currently has "numba>=0.50" (line 21) and "llvmlite>=0.42" (line 38) while NumPy is ">=1.26.4,<2.3". Numba releases tightly pair with specific llvmlite and NumPy ranges (e.g., Numba 0.60 ↔ llvmlite 0.43); loose/mismatched ranges risk resolver or ABI failures. Pin numba/llvmlite to a tested pairing or adjust NumPy bounds and validate on CI + target GPUs.

-    "numba>=0.50",
-    "llvmlite>=0.42",
-    "numpy>=1.26.4,<2.3",
+    # Align versions to avoid ABI conflicts (NumPy 2+ support)
+    "numba>=0.60,<0.61",
+    "llvmlite>=0.43,<0.44",
+    "numpy>=1.26.4,<2.3",

pyproject.toml: lines 21, 38.


@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
examples/scripts/osft_gpt_oss_example.py (1)

41-51: Checkpoint discovery: prefer mtime or numeric sorting; keep glob fix

Good that the pattern is now samples_* (matches prior review). Using ctime can be unreliable across filesystems; use mtime or parse the numeric suffix.

-    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getctime)
+    # Prefer modification time; alternatively parse the numeric suffix after 'samples_'
+    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getmtime)
pyproject.toml (1)

48-48: Do not ship pytest as a runtime dependency

Move pytest>=8.0 to a dev/test extra. Shipping it increases install size and build time for end users.

-    "pytest>=8.0"
+    # moved to optional dev/test dependencies

Then add under [project.optional-dependencies]:

 dev = [
     "ipykernel",
-    "ipython"
+    "ipython",
+    "pytest>=8.0"
 ]
🧹 Nitpick comments (14)
examples/README.md (2)

29-29: Add a short note about CUDA extras next to GPT-OSS 20B script

Readers may miss they need pip install training-hub[cuda] for these scripts.

Apply this minimal addition after the line:

- [SFT with GPT-OSS 20B](scripts/sft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B
+ [SFT with GPT-OSS 20B](scripts/sft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B (requires `pip install training-hub[cuda]`)

62-62: Mirror the CUDA extras hint for OSFT script

Same rationale as SFT entry.

- [OSFT with GPT-OSS 20B](scripts/osft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B
+ [OSFT with GPT-OSS 20B](scripts/osft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B (requires `pip install training-hub[cuda]`)
examples/docs/osft_usage.md (3)

172-173: Parameter reference: clarify default and compatibility

Add “default: False” and, if applicable, note any backend/version requirements (e.g., requires mini-trainer ≥X.Y).

- - `osft_memory_efficient_init` (bool): Enable memory-efficient initialization to reduce memory usage during model loading (recommended for OOMs)
+ - `osft_memory_efficient_init` (bool, default: False): Enable memory‑efficient initialization to reduce memory usage during model loading (recommended if you hit OOMs). Requires backend support.

271-271: Best Practices: small clarity tweak

Call out explicitly that it affects load-time memory only; training-time memory still governed by max_tokens_per_gpu.

-2. **Memory Management**: OSFT doesn't reduce memory requirements compared to SFT, so adjust `max_tokens_per_gpu` accordingly. For memory-constrained environments or OOMs during model loading, set `osft_memory_efficient_init=True`.
+2. **Memory Management**: OSFT doesn't reduce training‑time memory vs SFT, so adjust `max_tokens_per_gpu` accordingly. For OOMs during model loading specifically, set `osft_memory_efficient_init=True`.

93-95: Document default and scope for osft_memory_efficient_init

osft_memory_efficient_init is referenced only in docs/examples and is not declared on the public osft() API — document that it defaults to False and that it’s forwarded to backends via **kwargs unless you make it an explicit osft(...) parameter.
Occurrences: examples/docs/osft_usage.md (lines 93, 172–173, 271–272); examples/scripts/osft_gpt_oss_example.py (line 125). Public API: src/training_hub/algorithms/osft.py (def osft around lines 434–444) — parameter not present.
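A minimal sketch of the forwarding behavior described above: options not declared on the public entry point travel to the backend via `**kwargs`. The signature is simplified and not the actual `training_hub` API:

```python
def osft(model_path, data_path, ckpt_output_dir, **kwargs):
    """Simplified stand-in for the public entry point: undeclared options
    such as osft_memory_efficient_init ride along in **kwargs, so the
    backend must accept (or rename) them to avoid 'unexpected keyword'
    errors at runtime."""
    params = {
        "model_path": model_path,
        "data_path": data_path,
        "ckpt_output_dir": ckpt_output_dir,
    }
    params.update(kwargs)  # backend sees e.g. osft_memory_efficient_init=True
    return params  # a real implementation would dispatch to the backend here


cfg = osft("openai/gpt-oss-20b", "data.jsonl", "ckpts",
           osft_memory_efficient_init=True)
```

This illustrates why documenting the default (False) matters: a caller who omits the kwarg gets whatever the backend's own default is.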

examples/scripts/sft_gpt_oss_example.py (4)

1-1: Shebang present but file not executable

Either make the file executable in the repo (chmod +x) or drop the shebang.

-#!/usr/bin/env python3
+# (shebang removed; invoke with `python sft_gpt_oss_example.py ...`)

17-22: Remove unused imports

os and datetime are unused.

-import os
 import sys
 import time
-from datetime import datetime
 import argparse

61-90: Avoid assigning unused result; narrow exception or re-raise with traceback

  • Drop the unused assignment.
  • Consider catching expected exception types or re-raising after logging.
-        result = sft(
+        sft(
@@
-    except Exception as e:
+    except Exception as e:  # consider narrowing to expected exceptions
         end_time = time.time()
         duration = end_time - start_time

75-79: Hard-coded /dev/shm can be brittle

Expose as a CLI flag with a safe default (e.g., omit or use tempfile.gettempdir()).

-            data_output_dir="/dev/shm",        # Use RAM disk for speed
+            # Consider exposing via --data-output-dir; default None lets backend decide
+            # data_output_dir="/dev/shm",
examples/scripts/osft_gpt_oss_example.py (4)

1-1: Shebang present but file not executable

Same as SFT script: make executable or remove shebang.

-#!/usr/bin/env python3
+# (shebang removed; invoke with `python osft_gpt_oss_example.py ...`)

19-25: Remove unused import

datetime is unused.

-from datetime import datetime
 import argparse
 import glob

119-121: Hard-coded /dev/shm can be brittle

Same suggestion as SFT script: make it configurable or omit.

-            'data_output_dir': "/dev/shm",      # Use RAM disk for speed
+            # 'data_output_dir': "/dev/shm",      # Consider exposing via CLI; default None is safer

159-171: Catching broad Exception

Consider narrowing to expected exceptions and/or printing a traceback before exiting.

-    except Exception as e:
+    except Exception as e:  # consider narrowing to expected exceptions
         end_time = time.time()
         duration = end_time - start_time
pyproject.toml (1)

40-47: Duplicate dependency: filelock listed twice

Keep the stricter one (>=3.19.1) and drop the other to avoid confusion.

-    "filelock>=3.0",
@@
-    "filelock>=3.19.1",
+    "filelock>=3.19.1",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7065187 and 1f158f8.

📒 Files selected for processing (6)
  • README.md (1 hunks)
  • examples/README.md (2 hunks)
  • examples/docs/osft_usage.md (3 hunks)
  • examples/scripts/osft_gpt_oss_example.py (1 hunks)
  • examples/scripts/sft_gpt_oss_example.py (1 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
examples/scripts/osft_gpt_oss_example.py (2)
src/training_hub/algorithms/osft.py (1)
  • osft (439-500)
examples/scripts/sft_gpt_oss_example.py (1)
  • main (26-113)
examples/scripts/sft_gpt_oss_example.py (2)
src/training_hub/algorithms/sft.py (1)
  • sft (177-249)
examples/scripts/osft_gpt_oss_example.py (1)
  • main (54-171)
🪛 Ruff (0.12.2)
examples/scripts/osft_gpt_oss_example.py

1-1: Shebang is present but file is not executable

(EXE001)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


119-119: Probable insecure usage of temporary file or directory: "/dev/shm"

(S108)


159-159: Do not catch blind exception: Exception

(BLE001)

examples/scripts/sft_gpt_oss_example.py

1-1: Shebang is present but file is not executable

(EXE001)


61-61: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


75-75: Probable insecure usage of temporary file or directory: "/dev/shm"

(S108)


103-103: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (7)
README.md (1)

9-9: Support Matrix: OSFT marked Implemented — looks good

Status change aligns with the new examples and docs. No further action from this file.

examples/README.md (1)

27-27: Formatting-only tweak — OK

No behavioral impact.

examples/scripts/sft_gpt_oss_example.py (1)

80-82: Checkpointing choice: confirm behavior matches backend

If accelerate_full_state_at_epoch=False, resume behavior changes. Confirm that’s intended for users copying this example.

examples/scripts/osft_gpt_oss_example.py (1)

63-78: Confirm HF model id

Ensure --model-path default openai/gpt-oss-20b matches the actual HF repo name.

pyproject.toml (3)

54-61: Extras naming: confirm docs align with cuda extra

The README installs training-hub[cuda], which matches this extra. If any docs mention a gpt-oss extra, update them for consistency.


45-46: Verify attr vs attrs in pyproject.toml

pyproject.toml lists "attr>=0.3.2" (pyproject.toml: lines 45–46). The canonical package is usually "attrs" — confirm which package you intend to require and update pyproject.toml if necessary. Repository search returned no import matches for either; manual verification required.


17-23: Version alignment sanity check for core training stack

Large version bumps in pyproject.toml (instructlab-training, rhai-innovation-mini-trainer, transformers, datasets, numpy upper bound) can conflict with torch/flash-attn/bitsandbytes — ensure CI uses a pinned lockfile or constraints file and verify resolver stability across Linux/macOS/GPU images.

Automated search returned no output; run locally/CI to confirm:

  • rg -n -C2 -S 'instructlab-training|rhai-innovation-mini-trainer|transformers|torch|flash-attn|bitsandbytes|numpy|accelerate|peft' pyproject.toml
  • ls | rg 'poetry.lock|requirements.txt|constraints.txt|Pipfile.lock' || true
  • python -m pip install -U pip && python -m pip install 'training-hub[cuda] @ .'

@Maxusmusti Maxusmusti merged commit 8164824 into main Sep 17, 2025
4 checks passed