Conversation

@Maxusmusti (Contributor) commented Sep 17, 2025

Adds two new example scripts, updates documentation, and reorganizes dependencies for GPT-OSS support.

Summary by CodeRabbit

  • New Features

    • Added memory-efficient initialization option for OSFT.
    • Introduced example scripts showcasing OSFT and SFT training for GPT-OSS 20B.
  • Documentation

    • Updated Support Matrix to mark Continual Learning (OSFT) as Implemented.
    • Expanded OSFT usage guide with the new memory-efficient option, examples, and best practices.
    • Updated examples overview to include new GPT-OSS 20B scripts.
  • Chores

    • Upgraded core dependencies and added Transformers; widened NumPy compatibility.
    • Reorganized optional dependencies, introducing a new GPU-focused group.
    • Added pytest to dependencies.

@Maxusmusti Maxusmusti self-assigned this Sep 17, 2025
@coderabbitai bot commented Sep 17, 2025

Walkthrough

Updates documentation to mark OSFT as implemented, adds two GPT-OSS 20B example scripts (SFT and OSFT) with CLI and checkpoint handling, documents a new OSFT parameter (osft_memory_efficient_init), and revises dependencies in pyproject.toml including new/updated packages and optional group changes.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| **Support Matrix**<br>`README.md` | Marked Continual Learning (OSFT) as Implemented in the support matrix; no other matrix changes. |
| **Examples Index**<br>`examples/README.md` | Added entries linking to the new GPT-OSS 20B SFT and OSFT scripts; minor formatting tweak around the existing SFT line. |
| **OSFT Docs & Params**<br>`examples/docs/osft_usage.md` | Documented new boolean parameter `osft_memory_efficient_init`; updated usage example, parameter reference, and best practices. |
| **OSFT GPT-OSS 20B Script**<br>`examples/scripts/osft_gpt_oss_example.py` | New script demonstrating OSFT for GPT-OSS 20B: CLI, training parameter assembly, distributed settings, runtime timing, error handling, and a `find_most_recent_checkpoint` utility. |
| **SFT GPT-OSS 20B Script**<br>`examples/scripts/sft_gpt_oss_example.py` | New script demonstrating SFT for GPT-OSS 20B: CLI, predefined hyperparameters, distributed config, timing, and reporting. |
| **Dependencies & Optionals**<br>`pyproject.toml` | Upgraded core deps: `instructlab-training>=0.12.0`, `rhai-innovation-mini-trainer>=0.2.0`, `transformers>=4.55.0`, `datasets>=4.0.0`, `numpy<2.3`; added `pytest>=8.0`. Removed the optional `cuda` group; reworked the `gpt-oss` optional group with CUDA-enabled packages and related libs. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor U as User
  participant CLI as CLI Args
  participant Script as SFT/OSFT Script
  participant Hub as training_hub.sft/osft
  participant Trainer as Trainer/Backend
  participant Store as Checkpoint Storage

  U->>CLI: Provide --data-path, --ckpt-output-dir, ...
  CLI->>Script: Parse args
  Script->>Script: Build params (model, batch, dist, OSFT/SFT)
  Script->>Hub: Invoke sft()/osft() with params
  Hub->>Trainer: Initialize & run training
  Trainer-->>Store: Write checkpoints
  Script->>Script: find_most_recent_checkpoint(output_dir)
  Store-->>Script: Latest checkpoint path
  Script-->>U: Print duration and checkpoint

  rect rgba(220,255,220,0.3)
    Note right of Script: New: OSFT example script and checkpoint finder
    Note right of Hub: New doc param osft_memory_efficient_init (OSFT)
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I tuned my ears to OSS winds tonight,
With SFT carrots stacked just right. 🥕
OSFT trails through checkpoints gleam,
I hop through epochs, swift as a beam.
New deps burrow snug and tight—
Ship it, nibble, train in delight!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title concisely and accurately summarizes the primary changes in the changeset (dependency updates in `pyproject.toml`, new/updated example scripts, and documentation edits targeting GPT-OSS), conveying the main intent to a reviewer scanning history. It is specific to GPT-OSS and not overly vague or noisy. |

Signed-off-by: Mustafa Eyceoz <[email protected]>
@RobotSail (Collaborator) left a comment

A few comments but otherwise LGTM

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (15)
examples/docs/osft_usage.md (2)

145-173: Deduplicate/relocate parameter docs for clarity.

unfreeze_rank_ratio appears under Required Parameters and again under OSFT‑Specific Parameters. Keep it only under Required and move the range guidance there to avoid conflicting edits later.


269-273: Tighten the memory guidance wording.

Clarify that memory‑efficient init helps at load time, not step time. Suggested tweak:

-2. **Memory Management**: OSFT doesn't reduce memory requirements compared to SFT, so adjust `max_tokens_per_gpu` accordingly. For memory-constrained environments or OOMs during model loading, set `osft_memory_efficient_init=True`.
+2. **Memory Management**: OSFT does not reduce step‑time memory vs SFT. Tune `max_tokens_per_gpu` to avoid OOMs during training. For OOMs while loading the base model, enable `osft_memory_efficient_init=True` (affects load‑time memory only).
pyproject.toml (3)

39-46: Duplicate dependency: filelock listed twice.

Drop the duplicate to avoid resolver noise.

     "filelock>=3.0",
     "psutil>=6.0",
     "urllib3>=2.4",
     "frozenlist>=1.7",
     "xxhash>=3.0",
     "requests>=2.32.5",
     "attr>=0.3.2",
-    "filelock>=3.19.1",
     "mpmath>=1.3.0",
     "pytest>=8.0"

47-49: Move pytest to dev extras (avoid shipping test runner to end users).

-    "mpmath>=1.3.0",
-    "pytest>=8.0"
+    "mpmath>=1.3.0"
 ]
@@
 dev = [
     "ipykernel",
-    "ipython"
+    "ipython",
+    "pytest>=8.0"
 ]

14-17: Build tools in runtime deps.

wheel (and usually setuptools) shouldn’t be a runtime requirement given they’re already in [build-system]. Keep packaging if used at runtime.

 dependencies = [
-    "setuptools>=80.0",
     "packaging>=24.2",
-    "wheel>=0.43",
+    # wheel/setuptools are provided via [build-system]
examples/scripts/sft_gpt_oss_example.py (5)

1-1: Shebang mismatch with file perms.

Either make the file executable or drop the shebang to appease linters.

-#!/usr/bin/env python3

17-22: Remove unused imports.

os and datetime aren’t used.

-import os
 import sys
 import time
-from datetime import datetime
 import argparse

61-61: Drop unused result assignment.

-        result = sft(
+        sft(

74-79: Avoid hardcoding /dev/shm; make it configurable with a safe default.

@@
-    parser.add_argument('--nproc-per-node', type=int, default=8,
+    parser.add_argument('--nproc-per-node', type=int, default=8,
                        help='Number of GPUs (default: 8)')
+    parser.add_argument('--data-output-dir', default=None,
+                       help='Optional processed data dir (default: system temp). Use /dev/shm on Linux for speed.')
@@
-            data_output_dir="/dev/shm",        # Use RAM disk for speed
+            data_output_dir=(args.data_output_dir or os.getenv("TMPDIR") or "/tmp"),

Add import os back if you apply this change.
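The precedence suggested in the diff above (CLI flag, then `$TMPDIR`, then the system temp dir) can be written as a small helper. This is a sketch; the name `resolve_data_output_dir` is illustrative and not part of the scripts:

```python
import os
import tempfile


def resolve_data_output_dir(cli_value=None):
    """Resolve the processed-data directory: an explicit CLI value wins,
    then $TMPDIR, then the platform temp dir. /dev/shm (a Linux-only RAM
    disk) should be an explicit opt-in rather than a hardcoded default."""
    return cli_value or os.getenv("TMPDIR") or tempfile.gettempdir()
```

Users on Linux who want the RAM-disk speedup can still pass `--data-output-dir /dev/shm` explicitly.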


103-114: Catching broad Exception; prefer specific exceptions first.

-    except Exception as e:
+    except (ValueError, RuntimeError) as e:
         end_time = time.time()
         duration = end_time - start_time
@@
-        sys.exit(1)
+        sys.exit(1)
+    except Exception as e:
+        # Fallback catch-all to surface unexpected errors cleanly
+        print("=" * 50)
+        print(f"Unexpected error: {e}")
+        sys.exit(1)
examples/scripts/osft_gpt_oss_example.py (5)

1-1: Shebang mismatch with file perms.

Either make the file executable or drop the shebang.

-#!/usr/bin/env python3

22-22: Remove unused import.

datetime is unused.

-from datetime import datetime

28-52: Make checkpoint discovery more robust (mtime and dir check).

-def find_most_recent_checkpoint(output_dir):
+def find_most_recent_checkpoint(output_dir):
@@
-    checkpoint_dirs = glob.glob(checkpoint_pattern)
+    checkpoint_dirs = [p for p in glob.glob(checkpoint_pattern) if os.path.isdir(p)]
@@
-    # Find the most recently created checkpoint
-    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getctime)
+    # Pick most recently modified checkpoint (portable across filesystems)
+    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getmtime)
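Putting the suggestions above together, the checkpoint finder can be sketched as a standalone helper. The `samples_*` naming comes from the review; the flat layout under `output_dir` is assumed for illustration:

```python
import glob
import os


def find_most_recent_checkpoint(output_dir, pattern="samples_*"):
    """Return the most recently modified checkpoint directory under
    output_dir, or None if no checkpoints exist yet."""
    candidates = [p for p in glob.glob(os.path.join(output_dir, pattern))
                  if os.path.isdir(p)]  # skip stray files matching the glob
    if not candidates:
        return None
    # mtime is portable across filesystems; ctime semantics vary by OS.
    # Alternatively, parse the numeric suffix: int(p.rsplit("_", 1)[1]).
    return max(candidates, key=os.path.getmtime)
```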

118-122: Avoid hardcoding /dev/shm; expose CLI override with safe default.

@@ def main():
-    parser.add_argument('--unmask-messages', action='store_true', default=False,
+    parser.add_argument('--unmask-messages', action='store_true', default=False,
                        help='Unmask messages during training (default: False)')
+    parser.add_argument('--data-output-dir', default=None,
+                        help='Optional processed data dir (default: system temp). Use /dev/shm on Linux for speed.')
@@
-            'data_output_dir': "/dev/shm",      # Use RAM disk for speed
+            'data_output_dir': (args.data_output_dir or os.getenv("TMPDIR") or "/tmp"),

Add import os at top if removed.


159-171: Catching broad Exception; prefer specific exceptions first.

-    except Exception as e:
+    except (ValueError, RuntimeError) as e:
         end_time = time.time()
         duration = end_time - start_time
@@
-        sys.exit(1)
+        sys.exit(1)
+    except Exception as e:
+        print("=" * 50)
+        print(f"Unexpected error: {e}")
+        sys.exit(1)
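The pattern in the diffs above (specific exceptions first, a catch-all last, with duration reporting) can be factored into a small wrapper. The names here are illustrative and not taken from the scripts:

```python
import sys
import time


def run_with_timing(train_fn):
    """Run train_fn, returning its duration in seconds; exit 1 on failure."""
    start = time.time()
    try:
        train_fn()
    except (ValueError, RuntimeError) as e:
        # Expected failure modes: bad config, training-runtime errors.
        print(f"Training failed after {time.time() - start:.1f}s: {e}")
        sys.exit(1)
    except Exception as e:
        # Fallback catch-all so unexpected errors still exit cleanly.
        print(f"Unexpected error: {e}")
        sys.exit(1)
    return time.time() - start
```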
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7065187 and 40a85f3.

📒 Files selected for processing (5)
  • examples/README.md (2 hunks)
  • examples/docs/osft_usage.md (3 hunks)
  • examples/scripts/osft_gpt_oss_example.py (1 hunks)
  • examples/scripts/sft_gpt_oss_example.py (1 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
examples/scripts/sft_gpt_oss_example.py (1)
src/training_hub/algorithms/sft.py (1)
  • sft (177-249)
examples/scripts/osft_gpt_oss_example.py (2)
src/training_hub/algorithms/osft.py (1)
  • osft (439-500)
examples/scripts/sft_gpt_oss_example.py (1)
  • main (26-113)
🪛 Ruff (0.12.2)
examples/scripts/sft_gpt_oss_example.py

1-1: Shebang is present but file is not executable

(EXE001)


61-61: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


75-75: Probable insecure usage of temporary file or directory: "/dev/shm"

(S108)


103-103: Do not catch blind exception: Exception

(BLE001)

examples/scripts/osft_gpt_oss_example.py

1-1: Shebang is present but file is not executable

(EXE001)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


119-119: Probable insecure usage of temporary file or directory: "/dev/shm"

(S108)


159-159: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (8)
examples/README.md (3)

27-27: Formatting-only change.

No action needed.


29-30: New SFT GPT‑OSS entry looks good.

Path and description align with the added script.


62-63: New OSFT GPT‑OSS entry looks good.

Consistent with the OSFT example script name.

pyproject.toml (4)

32-32: No action required — PyPI releases exist for these min versions. Verified fsspec (2025.9.0) and regex (2025.9.1) exist on PyPI, so keeping fsspec>=2025.0 and regex>=2025.0 in pyproject.toml is valid.


45-45: Confirm package name: attr vs attrs.

pyproject.toml lists attr>=0.3.2 (pyproject.toml: line 45). Repo search returned no imports of attr/attrs; verify the intended package and update pyproject.toml or the code to match.


21-21: Confirm instructlab-training (>=0.12.0) doesn't require a narrower transformers range.

Wheel metadata shows instructlab-training depends on transformers>=4.41.2, and upstream issues report breakage with transformers >4.51.0 — verify 0.12.0's Requires‑Dist/pyproject so your top-level transformers pin won't cause resolver conflicts. (wheelodex.org)


55-61: Confirm flash‑attn / PyTorch / CUDA compatibility

pyproject.toml extras ('cuda') include torch>=2.6.0 and flash-attn>=2.8 — confirm prebuilt flash-attn wheels exist for the exact PyTorch+CUDA you target (many flash-attn releases on PyPI are source-only and require ninja/CUDA to build; prebuilt wheels are provided separately, e.g. flash-attn-wheels). (pypi.org)

Location: pyproject.toml lines 54–61.

examples/docs/osft_usage.md (1)

90-96: Confirm backend support for osft_memory_efficient_init

Docs/examples add osft_memory_efficient_init, but a repo search shows it only appears in examples/docs and examples/scripts; MiniTrainerOSFTBackend.execute_training’s renames dict (src/training_hub/algorithms/osft.py) does not include this key and there’s no other handling — verify the backend accepts this kwarg (or add a rename/TrainingArgs field and/or validation entry) to avoid runtime "unexpected keyword" or validation errors.

"rhai-innovation-mini-trainer>=0.2.0",
"torch>=2.6.0",
"transformers>=4.55.0",
"numba>=0.50",

⚠️ Potential issue

🧩 Analysis chain

Numba/llvmlite/numpy compatibility risk.

The trio numba>=0.50, llvmlite>=0.42, and numpy<2.3 is likely to produce resolver or ABI mismatches. Prefer matching Numba’s documented pairings (Numba pins llvmlite tightly and constrains NumPy).

-    "numba>=0.50",
-    "llvmlite>=0.42",
-    "numpy>=1.26.4,<2.3",
+    # Align versions to avoid ABI conflicts (NumPy 2+ support)
+    "numba>=0.60,<0.61",
+    "llvmlite>=0.43,<0.44",
+    "numpy>=1.26.4,<2.3",

Please validate against your CI matrix and the GPUs you target.

Also applies to: 38-39


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== locate pyproject.toml ==="
fd -HI --hidden pyproject.toml || true
for f in $(fd -HI --hidden pyproject.toml || true); do
  echo "---- $f ----"
  nl -ba "$f" | sed -n '1,240p'
done
echo
echo "=== Search for numba/llvmlite/numpy occurrences across repo ==="
rg -n --hidden -S -C2 '\b(numba|llvmlite|numpy)\b' || true

Length of output: 110


🌐 Web query:

Numba llvmlite compatibility matrix
Numba supported NumPy versions site:numba.pydata.org
Numba 0.60 release notes llvmlite
llvmlite 0.43 release notes

💡 Result:

Here are the sources you requested and the key compatibility facts:

  • llvmlite v0.43.0 release notes (includes date and highlights). [1] (llvmlite.readthedocs.io)
  • llvmlite compatibility / historical matrix (README) — shows which LLVM versions each llvmlite series supports. [2] (github.com)
  • Numba 0.60.0 release notes (highlights) — Numba 0.60.0 released 2024-06-13 and documents NumPy/llvmlite compatibility. See the release notes for details. [3] (numba.readthedocs.io)
  • Numba 0.60 installation / version-support table (canonical compatibility matrix): Numba 0.60.0 — Python 3.9–<3.13, NumPy 1.22 ≤ version < 1.27 (and == 2.0), llvmlite 0.43.x, LLVM 14.x. [4] (numba.readthedocs.io)
  • For archived Numba docs showing supported NumPy ranges for earlier Numba releases (examples: 0.48, 0.43, etc.). Use these pages if you need per-release NumPy support details. [5][6] (numba.pydata.org)

If you want, I can extract the full compatibility table (Numba release → supported Python, NumPy, llvmlite, LLVM) from the 0.60 docs into a CSV or formatted table.


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== locate pyproject.toml ==="
fd -HI --hidden pyproject.toml || true
for f in $(fd -HI --hidden pyproject.toml || true); do
  echo "---- $f ----"
  sed -n '1,240p' "$f" || true
done

echo
echo "=== Search for numba/llvmlite/numpy occurrences across repo ==="
rg -n --hidden -S -C2 '\b(numba|llvmlite|numpy)\b' || true

echo
echo "=== Search for CI/workflows that reference numba/llvmlite/numpy or GPUs ==="
rg -n --hidden -S --glob '.github/**' 'numba|llvmlite|numpy|cuda|gpu|nvidia|rocm' || true

echo
echo "=== List .github/workflows files (first 240 lines) ==="
fd -HI --hidden .github/workflows || true
for f in $(fd -HI --hidden .github/workflows || true); do
  echo "---- $f ----"
  sed -n '1,240p' "$f" || true
done

Length of output: 3531


Align Numba/llvmlite with supported NumPy to avoid ABI/resolver conflicts.

pyproject.toml currently has "numba>=0.50" (line 21) and "llvmlite>=0.42" (line 38) while NumPy is ">=1.26.4,<2.3". Numba releases tightly pair with specific llvmlite and NumPy ranges (e.g., Numba 0.60 ↔ llvmlite 0.43); loose/mismatched ranges risk resolver or ABI failures. Pin numba/llvmlite to a tested pairing or adjust NumPy bounds and validate on CI + target GPUs.

-    "numba>=0.50",
-    "llvmlite>=0.42",
-    "numpy>=1.26.4,<2.3",
+    # Align versions to avoid ABI conflicts (NumPy 2+ support)
+    "numba>=0.60,<0.61",
+    "llvmlite>=0.43,<0.44",
+    "numpy>=1.26.4,<2.3",

pyproject.toml: lines 21, 38.


@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
examples/scripts/osft_gpt_oss_example.py (1)

41-51: Checkpoint discovery: prefer mtime or numeric sorting; keep glob fix

Good that the pattern is now samples_* (matches prior review). Using ctime can be unreliable across filesystems; use mtime or parse the numeric suffix.

-    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getctime)
+    # Prefer modification time; alternatively parse the numeric suffix after 'samples_'
+    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getmtime)
pyproject.toml (1)

48-48: Do not ship pytest as a runtime dependency

Move pytest>=8.0 to a dev/test extra. Shipping it increases install size and build time for end users.

-    "pytest>=8.0"
+    # moved to optional dev/test dependencies

Then add under [project.optional-dependencies]:

 dev = [
     "ipykernel",
-    "ipython"
+    "ipython",
+    "pytest>=8.0"
 ]
🧹 Nitpick comments (14)
examples/README.md (2)

29-29: Add a short note about CUDA extras next to GPT-OSS 20B script

Readers may miss they need pip install training-hub[cuda] for these scripts.

Apply this minimal addition after the line:

- [SFT with GPT-OSS 20B](scripts/sft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B
+ [SFT with GPT-OSS 20B](scripts/sft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B (requires `pip install training-hub[cuda]`)

62-62: Mirror the CUDA extras hint for OSFT script

Same rationale as SFT entry.

- [OSFT with GPT-OSS 20B](scripts/osft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B
+ [OSFT with GPT-OSS 20B](scripts/osft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B (requires `pip install training-hub[cuda]`)
examples/docs/osft_usage.md (3)

172-173: Parameter reference: clarify default and compatibility

Add “default: False” and, if applicable, note any backend/version requirements (e.g., requires mini-trainer ≥X.Y).

- - `osft_memory_efficient_init` (bool): Enable memory-efficient initialization to reduce memory usage during model loading (recommended for OOMs)
+ - `osft_memory_efficient_init` (bool, default: False): Enable memory‑efficient initialization to reduce memory usage during model loading (recommended if you hit OOMs). Requires backend support.

271-271: Best Practices: small clarity tweak

Call out explicitly that it affects load-time memory only; training-time memory still governed by max_tokens_per_gpu.

-2. **Memory Management**: OSFT doesn't reduce memory requirements compared to SFT, so adjust `max_tokens_per_gpu` accordingly. For memory-constrained environments or OOMs during model loading, set `osft_memory_efficient_init=True`.
+2. **Memory Management**: OSFT doesn't reduce training‑time memory vs SFT, so adjust `max_tokens_per_gpu` accordingly. For OOMs during model loading specifically, set `osft_memory_efficient_init=True`.

93-95: Document default and scope for osft_memory_efficient_init

osft_memory_efficient_init is referenced only in docs/examples and is not declared on the public osft() API — document that it defaults to False and that it’s forwarded to backends via **kwargs unless you make it an explicit osft(...) parameter.
Occurrences: examples/docs/osft_usage.md (lines 93, 172–173, 271–272); examples/scripts/osft_gpt_oss_example.py (line 125). Public API: src/training_hub/algorithms/osft.py (def osft around lines 434–444) — parameter not present.
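A minimal sketch of the forwarding behavior described above: options not declared on the public entry point travel to the backend via `**kwargs`. The signature is simplified and not the actual `training_hub` API:

```python
def osft(model_path, data_path, ckpt_output_dir, **kwargs):
    """Simplified stand-in for the public entry point: undeclared options
    such as osft_memory_efficient_init ride along in **kwargs, so the
    backend must accept (or rename) them to avoid 'unexpected keyword'
    errors at runtime."""
    params = {
        "model_path": model_path,
        "data_path": data_path,
        "ckpt_output_dir": ckpt_output_dir,
    }
    params.update(kwargs)  # backend sees e.g. osft_memory_efficient_init=True
    return params  # a real implementation would dispatch to the backend here


cfg = osft("openai/gpt-oss-20b", "data.jsonl", "ckpts",
           osft_memory_efficient_init=True)
```

This illustrates why documenting the default (False) matters: a caller who omits the kwarg gets whatever the backend's own default is.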

examples/scripts/sft_gpt_oss_example.py (4)

1-1: Shebang present but file not executable

Either make the file executable in the repo (chmod +x) or drop the shebang.

-#!/usr/bin/env python3
+# (shebang removed; invoke with `python sft_gpt_oss_example.py ...`)

17-22: Remove unused imports

os and datetime are unused.

-import os
 import sys
 import time
-from datetime import datetime
 import argparse

61-90: Avoid assigning unused result; narrow exception or re-raise with traceback

  • Drop the unused assignment.
  • Consider catching expected exception types or re-raising after logging.
-        result = sft(
+        sft(
@@
-    except Exception as e:
+    except Exception as e:  # consider narrowing to expected exceptions
         end_time = time.time()
         duration = end_time - start_time

75-79: Hard-coded /dev/shm can be brittle

Expose as a CLI flag with a safe default (e.g., omit or use tempfile.gettempdir()).

-            data_output_dir="/dev/shm",        # Use RAM disk for speed
+            # Consider exposing via --data-output-dir; default None lets backend decide
+            # data_output_dir="/dev/shm",
examples/scripts/osft_gpt_oss_example.py (4)

1-1: Shebang present but file not executable

Same as SFT script: make executable or remove shebang.

-#!/usr/bin/env python3
+# (shebang removed; invoke with `python osft_gpt_oss_example.py ...`)

19-25: Remove unused import

datetime is unused.

-from datetime import datetime
 import argparse
 import glob

119-121: Hard-coded /dev/shm can be brittle

Same suggestion as SFT script: make it configurable or omit.

-            'data_output_dir': "/dev/shm",      # Use RAM disk for speed
+            # 'data_output_dir': "/dev/shm",      # Consider exposing via CLI; default None is safer

159-171: Catching broad Exception

Consider narrowing to expected exceptions and/or printing a traceback before exiting.

-    except Exception as e:
+    except Exception as e:  # consider narrowing to expected exceptions
         end_time = time.time()
         duration = end_time - start_time
pyproject.toml (1)

40-47: Duplicate dependency: filelock listed twice

Keep the stricter one (>=3.19.1) and drop the other to avoid confusion.

-    "filelock>=3.0",
@@
-    "filelock>=3.19.1",
+    "filelock>=3.19.1",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7065187 and 1f158f8.

📒 Files selected for processing (6)
  • README.md (1 hunks)
  • examples/README.md (2 hunks)
  • examples/docs/osft_usage.md (3 hunks)
  • examples/scripts/osft_gpt_oss_example.py (1 hunks)
  • examples/scripts/sft_gpt_oss_example.py (1 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
examples/scripts/osft_gpt_oss_example.py (2)
src/training_hub/algorithms/osft.py (1)
  • osft (439-500)
examples/scripts/sft_gpt_oss_example.py (1)
  • main (26-113)
examples/scripts/sft_gpt_oss_example.py (2)
src/training_hub/algorithms/sft.py (1)
  • sft (177-249)
examples/scripts/osft_gpt_oss_example.py (1)
  • main (54-171)
🪛 Ruff (0.12.2)
examples/scripts/osft_gpt_oss_example.py

1-1: Shebang is present but file is not executable

(EXE001)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


119-119: Probable insecure usage of temporary file or directory: "/dev/shm"

(S108)


159-159: Do not catch blind exception: Exception

(BLE001)

examples/scripts/sft_gpt_oss_example.py

1-1: Shebang is present but file is not executable

(EXE001)


61-61: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


75-75: Probable insecure usage of temporary file or directory: "/dev/shm"

(S108)


103-103: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (7)
README.md (1)

9-9: Support Matrix: OSFT marked Implemented — looks good

Status change aligns with the new examples and docs. No further action from this file.

examples/README.md (1)

27-27: Formatting-only tweak — OK

No behavioral impact.

examples/scripts/sft_gpt_oss_example.py (1)

80-82: Checkpointing choice: confirm behavior matches backend

If accelerate_full_state_at_epoch=False, resume behavior changes. Confirm that’s intended for users copying this example.

examples/scripts/osft_gpt_oss_example.py (1)

63-78: Confirm HF model id

Ensure --model-path default openai/gpt-oss-20b matches the actual HF repo name.

pyproject.toml (3)

54-61: Extras naming: confirm docs align with cuda extra

The README installs training-hub[cuda], which matches this extra. If any docs mention a gpt-oss extra, update them for consistency.


45-46: Verify attr vs attrs in pyproject.toml

pyproject.toml lists "attr>=0.3.2" (pyproject.toml: lines 45–46). The canonical package is usually "attrs" — confirm which package you intend to require and update pyproject.toml if necessary. Repository search returned no import matches for either; manual verification required.


17-23: Version alignment sanity check for core training stack

Large version bumps in pyproject.toml (instructlab-training, rhai-innovation-mini-trainer, transformers, datasets, numpy upper bound) can conflict with torch/flash-attn/bitsandbytes — ensure CI uses a pinned lockfile or constraints file and verify resolver stability across Linux/macOS/GPU images.

Automated search returned no output; run locally/CI to confirm:

  • rg -n -C2 -S 'instructlab-training|rhai-innovation-mini-trainer|transformers|torch|flash-attn|bitsandbytes|numpy|accelerate|peft' pyproject.toml
  • ls | rg 'poetry.lock|requirements.txt|constraints.txt|Pipfile.lock' || true
  • python -m pip install -U pip && python -m pip install 'training-hub[cuda] @ .'

@Maxusmusti Maxusmusti merged commit 8164824 into main Sep 17, 2025
4 checks passed