Update dependencies, examples, and docs for GPT-OSS #6
Conversation
Signed-off-by: Mustafa Eyceoz <[email protected]>
Walkthrough

Updates documentation to mark OSFT as implemented, adds two GPT-OSS 20B example scripts (SFT and OSFT) with CLI and checkpoint handling, documents a new OSFT parameter (`osft_memory_efficient_init`), and revises dependencies in pyproject.toml, including new/updated packages and optional group changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant CLI as CLI Args
    participant Script as SFT/OSFT Script
    participant Hub as training_hub.sft/osft
    participant Trainer as Trainer/Backend
    participant Store as Checkpoint Storage
    U->>CLI: Provide --data-path, --ckpt-output-dir, ...
    CLI->>Script: Parse args
    Script->>Script: Build params (model, batch, dist, OSFT/SFT)
    Script->>Hub: Invoke sft()/osft()(params)
    Hub->>Trainer: Initialize & run training
    Trainer-->>Store: Write checkpoints
    Script->>Script: find_most_recent_checkpoint(output_dir)
    Store-->>Script: Latest checkpoint path
    Script-->>U: Print duration and checkpoint
    rect rgba(220,255,220,0.3)
        Note right of Script: New: OSFT example script and checkpoint finder
        Note right of Hub: New doc param osft_memory_efficient_init (OSFT)
    end
```
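The flow in the diagram can be sketched in plain Python. The helper and parameter names below mirror those used in the example scripts, but the stub is illustrative only, not the actual `training_hub` API.

```python
import time


def build_params(args: dict) -> dict:
    """Assemble keyword arguments for the training call, mirroring the
    diagram's 'Build params' step. Field names follow the example scripts;
    treat them as illustrative rather than an exact training_hub signature."""
    return {
        "model_path": args["model_path"],
        "data_path": args["data_path"],
        "ckpt_output_dir": args["ckpt_output_dir"],
        "nproc_per_node": args.get("nproc_per_node", 8),
    }


def run_training(args: dict, train_fn) -> float:
    """Invoke the training entry point and return the wall-clock duration,
    as the scripts do before locating the latest checkpoint."""
    params = build_params(args)
    start = time.time()
    train_fn(**params)  # stands in for training_hub.sft() or osft()
    return time.time() - start


# Usage with a no-op stub in place of a real trainer:
calls = []
duration = run_training(
    {"model_path": "m", "data_path": "d", "ckpt_output_dir": "out"},
    lambda **kw: calls.append(kw),
)
```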
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ❌ Failed checks (1 warning) · ✅ Passed checks (2 passed)
RobotSail left a comment:
A few comments but otherwise LGTM
Co-authored-by: Oleg Silkin <[email protected]>
Actionable comments posted: 1
🧹 Nitpick comments (15)
examples/docs/osft_usage.md (2)

145-173: Deduplicate/relocate parameter docs for clarity.
`unfreeze_rank_ratio` appears under Required Parameters and again under OSFT-Specific Parameters. Keep it only under Required and move the range guidance there to avoid conflicting edits later.

269-273: Tighten the memory guidance wording.
Clarify that memory-efficient init helps at load time, not step time. Suggested tweak:

```diff
-2. **Memory Management**: OSFT doesn't reduce memory requirements compared to SFT, so adjust `max_tokens_per_gpu` accordingly. For memory-constrained environments or OOMs during model loading, set `osft_memory_efficient_init=True`.
+2. **Memory Management**: OSFT does not reduce step-time memory vs SFT. Tune `max_tokens_per_gpu` to avoid OOMs during training. For OOMs while loading the base model, enable `osft_memory_efficient_init=True` (affects load-time memory only).
```

pyproject.toml (3)
39-46: Duplicate dependency: filelock listed twice.
Drop the duplicate to avoid resolver noise.

```diff
 "filelock>=3.0",
 "psutil>=6.0",
 "urllib3>=2.4",
 "frozenlist>=1.7",
 "xxhash>=3.0",
 "requests>=2.32.5",
 "attr>=0.3.2",
-"filelock>=3.19.1",
 "mpmath>=1.3.0",
 "pytest>=8.0"
```

47-49: Move pytest to dev extras (avoid shipping a test runner to end users).

```diff
-  "mpmath>=1.3.0",
-  "pytest>=8.0"
+  "mpmath>=1.3.0"
 ]
@@
 dev = [
     "ipykernel",
-    "ipython"
+    "ipython",
+    "pytest>=8.0"
 ]
```

14-17: Build tools in runtime deps.
`wheel` (and usually `setuptools`) shouldn't be a runtime requirement given they're already in [build-system]. Keep `packaging` if used at runtime.

```diff
 dependencies = [
-    "setuptools>=80.0",
     "packaging>=24.2",
-    "wheel>=0.43",
+    # wheel/setuptools are provided via [build-system]
```

examples/scripts/sft_gpt_oss_example.py (5)
1-1: Shebang mismatch with file perms.
Either make the file executable or drop the shebang to appease linters.

```diff
-#!/usr/bin/env python3
```

17-22: Remove unused imports.
`os` and `datetime` aren't used.

```diff
-import os
 import sys
 import time
-from datetime import datetime
 import argparse
```

61-61: Drop unused `result` assignment.

```diff
-    result = sft(
+    sft(
```

74-79: Avoid hardcoding /dev/shm; make it configurable with a safe default.

```diff
@@
-    parser.add_argument('--nproc-per-node', type=int, default=8,
+    parser.add_argument('--nproc-per-node', type=int, default=8,
                         help='Number of GPUs (default: 8)')
+    parser.add_argument('--data-output-dir', default=None,
+                        help='Optional processed data dir (default: system temp). Use /dev/shm on Linux for speed.')
@@
-        data_output_dir="/dev/shm",  # Use RAM disk for speed
+        data_output_dir=(args.data_output_dir or os.getenv("TMPDIR") or "/tmp"),
```

Add `import os` back if you apply this change.

103-114: Catching broad Exception; prefer specific exceptions first.

```diff
-    except Exception as e:
+    except (ValueError, RuntimeError) as e:
         end_time = time.time()
         duration = end_time - start_time
@@
-        sys.exit(1)
+        sys.exit(1)
+    except Exception as e:
+        # Fallback catch-all to surface unexpected errors cleanly
+        print("=" * 50)
+        print(f"Unexpected error: {e}")
+        sys.exit(1)
```

examples/scripts/osft_gpt_oss_example.py (5)
1-1: Shebang mismatch with file perms.
Either make the file executable or drop the shebang.

```diff
-#!/usr/bin/env python3
```

22-22: Remove unused import.
`datetime` is unused.

```diff
-from datetime import datetime
```

28-52: Make checkpoint discovery more robust (mtime and dir check).

```diff
 def find_most_recent_checkpoint(output_dir):
@@
-    checkpoint_dirs = glob.glob(checkpoint_pattern)
+    checkpoint_dirs = [p for p in glob.glob(checkpoint_pattern) if os.path.isdir(p)]
@@
-    # Find the most recently created checkpoint
-    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getctime)
+    # Pick most recently modified checkpoint (portable across filesystems)
+    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getmtime)
```

118-122: Avoid hardcoding /dev/shm; expose a CLI override with a safe default.

```diff
@@ def main():
-    parser.add_argument('--unmask-messages', action='store_true', default=False,
+    parser.add_argument('--unmask-messages', action='store_true', default=False,
                         help='Unmask messages during training (default: False)')
+    parser.add_argument('--data-output-dir', default=None,
+                        help='Optional processed data dir (default: system temp). Use /dev/shm on Linux for speed.')
@@
-        'data_output_dir': "/dev/shm",  # Use RAM disk for speed
+        'data_output_dir': (args.data_output_dir or os.getenv("TMPDIR") or "/tmp"),
```

Add `import os` at the top if it was removed.

159-171: Catching broad Exception; prefer specific exceptions first.

```diff
-    except Exception as e:
+    except (ValueError, RuntimeError) as e:
         end_time = time.time()
         duration = end_time - start_time
@@
-        sys.exit(1)
+        sys.exit(1)
+    except Exception as e:
+        print("=" * 50)
+        print(f"Unexpected error: {e}")
+        sys.exit(1)
```
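The checkpoint-discovery fixes suggested above (mtime instead of ctime, directory check, empty-result handling) can be combined into one standalone sketch. The function name and the `samples_*` pattern come from the example script; the flat layout under `output_dir` is an assumption about the checkpoint directory structure.

```python
import glob
import os


def find_most_recent_checkpoint(output_dir):
    """Return the most recently modified samples_* checkpoint directory
    under output_dir, or None if no checkpoints exist.

    Uses mtime rather than ctime (more portable across filesystems) and
    skips stray files that happen to match the glob pattern."""
    checkpoint_pattern = os.path.join(output_dir, "samples_*")
    checkpoint_dirs = [p for p in glob.glob(checkpoint_pattern) if os.path.isdir(p)]
    if not checkpoint_dirs:
        return None
    return max(checkpoint_dirs, key=os.path.getmtime)
```

A further hardening step would be to parse the numeric suffix after `samples_` and sort on that instead, which is immune to checkpoints being copied or touched out of order.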
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)

- examples/README.md (2 hunks)
- examples/docs/osft_usage.md (3 hunks)
- examples/scripts/osft_gpt_oss_example.py (1 hunks)
- examples/scripts/sft_gpt_oss_example.py (1 hunks)
- pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
examples/scripts/sft_gpt_oss_example.py (1)
- src/training_hub/algorithms/sft.py (1): sft (177-249)

examples/scripts/osft_gpt_oss_example.py (2)
- src/training_hub/algorithms/osft.py (1): osft (439-500)
- examples/scripts/sft_gpt_oss_example.py (1): main (26-113)
🪛 Ruff (0.12.2)
examples/scripts/sft_gpt_oss_example.py
1-1: Shebang is present but file is not executable
(EXE001)
61-61: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
75-75: Probable insecure usage of temporary file or directory: "/dev/shm"
(S108)
103-103: Do not catch blind exception: Exception
(BLE001)
examples/scripts/osft_gpt_oss_example.py
1-1: Shebang is present but file is not executable
(EXE001)
46-46: Avoid specifying long messages outside the exception class
(TRY003)
119-119: Probable insecure usage of temporary file or directory: "/dev/shm"
(S108)
159-159: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (8)
examples/README.md (3)
27-27: Formatting-only change. No action needed.

29-30: New SFT GPT-OSS entry looks good. Path and description align with the added script.

62-63: New OSFT GPT-OSS entry looks good. Consistent with the OSFT example script name.
pyproject.toml (4)
32-32: No action required — PyPI releases exist for these min versions. Verified fsspec (2025.9.0) and regex (2025.9.1) exist on PyPI, so keeping `fsspec>=2025.0` and `regex>=2025.0` in pyproject.toml is valid.
45-45: Confirm package name: `attr` vs `attrs`.
pyproject.toml lists `attr>=0.3.2` (pyproject.toml: line 45). Repo search returned no imports of `attr`/`attrs`; verify the intended package and update pyproject.toml or the code to match.
21-21: Confirm instructlab-training (>=0.12.0) doesn't require a narrower transformers range.Wheel metadata shows instructlab-training depends on transformers>=4.41.2, and upstream issues report breakage with transformers >4.51.0 — verify 0.12.0's Requires‑Dist/pyproject so your top-level transformers pin won't cause resolver conflicts. (wheelodex.org)
55-61: Confirm flash-attn / PyTorch / CUDA compatibility.
pyproject.toml extras ('cuda') include torch>=2.6.0 and flash-attn>=2.8 — confirm prebuilt flash-attn wheels exist for the exact PyTorch+CUDA you target (many flash-attn releases on PyPI are source-only and require ninja/CUDA to build; prebuilt wheels are provided separately, e.g. flash-attn-wheels). (pypi.org)
Location: pyproject.toml lines 54–61.
examples/docs/osft_usage.md (1)

90-96: Confirm backend support for osft_memory_efficient_init.
Docs/examples add `osft_memory_efficient_init`, but a repo search shows it only appears in examples/docs and examples/scripts; MiniTrainerOSFTBackend.execute_training's renames dict (src/training_hub/algorithms/osft.py) does not include this key and there's no other handling — verify the backend accepts this kwarg (or add a rename/TrainingArgs field and/or validation entry) to avoid runtime "unexpected keyword" or validation errors.
Context from pyproject.toml:

```toml
"rhai-innovation-mini-trainer>=0.2.0",
"torch>=2.6.0",
"transformers>=4.55.0",
"numba>=0.50",
```
🧩 Analysis chain

Numba/llvmlite/numpy compatibility risk.
The trio numba>=0.50, llvmlite>=0.42, and numpy<2.3 is likely to produce resolver or ABI mismatches. Prefer matching Numba's documented pairings (Numba pins llvmlite tightly and constrains NumPy).

```diff
-    "numba>=0.50",
-    "llvmlite>=0.42",
-    "numpy>=1.26.4,<2.3",
+    # Align versions to avoid ABI conflicts (NumPy 2+ support)
+    "numba>=0.60,<0.61",
+    "llvmlite>=0.43,<0.44",
+    "numpy>=1.26.4,<2.3",
```

Please validate against your CI matrix and the GPUs you target.
Also applies to: 38-39
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
echo "=== locate pyproject.toml ==="
fd -HI --hidden pyproject.toml || true
for f in $(fd -HI --hidden pyproject.toml || true); do
  echo "---- $f ----"
  nl -ba "$f" | sed -n '1,240p'
done
echo
echo "=== Search for numba/llvmlite/numpy occurrences across repo ==="
rg -n --hidden -S -C2 '\b(numba|llvmlite|numpy)\b' || true
```

Length of output: 110
🌐 Web query:
Numba llvmlite compatibility matrix
Numba supported NumPy versions site:numba.pydata.org
Numba 0.60 release notes llvmlite
llvmlite 0.43 release notes
💡 Result:
Here are the sources you requested and the key compatibility facts:
- llvmlite v0.43.0 release notes (includes date and highlights). [1] (llvmlite.readthedocs.io)
- llvmlite compatibility / historical matrix (README) — shows which LLVM versions each llvmlite series supports. [2] (github.com)
- Numba 0.60.0 release notes (highlights) — Numba 0.60.0 released 2024-06-13 and documents NumPy/llvmlite compatibility. See the release notes for details. [3] (numba.readthedocs.io)
- Numba 0.60 installation / version-support table (canonical compatibility matrix): Numba 0.60.0 — Python 3.9–<3.13, NumPy 1.22 ≤ version < 1.27 (and == 2.0), llvmlite 0.43.x, LLVM 14.x. [4] (numba.readthedocs.io)
- For archived Numba docs showing supported NumPy ranges for earlier Numba releases (examples: 0.48, 0.43, etc.). Use these pages if you need per-release NumPy support details. [5][6] (numba.pydata.org)
If you want, I can extract the full compatibility table (Numba release → supported Python, NumPy, llvmlite, LLVM) from the 0.60 docs into a CSV or formatted table.
Citations:
- 1: https://llvmlite.readthedocs.io/en/v0.43.0/release-notes.html?utm_source=openai
- 2: https://github.com/numba/llvmlite/blob/main/README.rst?utm_source=openai
- 3: https://numba.readthedocs.io/en/latest/release/0.60.0-notes.html?utm_source=openai
- 4: https://numba.readthedocs.io/en/0.60.0/user/installing.html?utm_source=openai
- 5: https://numba.pydata.org/numba-doc/0.48.0/user/installing.html?utm_source=openai
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
echo "=== locate pyproject.toml ==="
fd -HI --hidden pyproject.toml || true
for f in $(fd -HI --hidden pyproject.toml || true); do
  echo "---- $f ----"
  sed -n '1,240p' "$f" || true
done
echo
echo "=== Search for numba/llvmlite/numpy occurrences across repo ==="
rg -n --hidden -S -C2 '\b(numba|llvmlite|numpy)\b' || true
echo
echo "=== Search for CI/workflows that reference numba/llvmlite/numpy or GPUs ==="
rg -n --hidden -S --glob '.github/**' 'numba|llvmlite|numpy|cuda|gpu|nvidia|rocm' || true
echo
echo "=== List .github/workflows files (first 240 lines) ==="
fd -HI --hidden .github/workflows || true
for f in $(fd -HI --hidden .github/workflows || true); do
  echo "---- $f ----"
  sed -n '1,240p' "$f" || true
done
```

Length of output: 3531
Align Numba/llvmlite with supported NumPy to avoid ABI/resolver conflicts.

pyproject.toml currently has "numba>=0.50" (line 21) and "llvmlite>=0.42" (line 38) while NumPy is ">=1.26.4,<2.3". Numba releases tightly pair with specific llvmlite and NumPy ranges (e.g., Numba 0.60 ↔ llvmlite 0.43); loose/mismatched ranges risk resolver or ABI failures. Pin numba/llvmlite to a tested pairing or adjust NumPy bounds and validate on CI + target GPUs.

```diff
-    "numba>=0.50",
-    "llvmlite>=0.42",
-    "numpy>=1.26.4,<2.3",
+    # Align versions to avoid ABI conflicts (NumPy 2+ support)
+    "numba>=0.60,<0.61",
+    "llvmlite>=0.43,<0.44",
+    "numpy>=1.26.4,<2.3",
```

pyproject.toml: lines 21, 38.
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
-    "numba>=0.50",
+    # Align versions to avoid ABI conflicts (NumPy 2+ support)
+    "numba>=0.60,<0.61",
+    "llvmlite>=0.43,<0.44",
+    "numpy>=1.26.4,<2.3",
```
🤖 Prompt for AI Agents
pyproject.toml lines 21 and 38: the numba and llvmlite ranges are too loose
compared to the pinned NumPy range and can cause ABI/resolver conflicts; update
these pins to a tested pairing (for example pin numba to the release that
matches your llvmlite and NumPy matrix, e.g. numba 0.60.x with llvmlite 0.43.x)
or alternatively relax/expand the NumPy bounds to match the chosen
numba/llvmlite combo; change the version specifiers to explicit, compatible
ranges (or exact patch pins) and run CI matrix tests on target Python/NumPy/GPU
combos to validate.
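The pairing rule behind this suggestion (each Numba minor series pins exactly one llvmlite minor series, e.g. Numba 0.60.x ↔ llvmlite 0.43.x per the docs cited above) can be encoded as a small check. The table below only includes the pairing documented in this thread; any additional entries would be assumptions that need verifying against Numba's installation docs.

```python
# Known Numba ↔ llvmlite minor-series pairings. Only the 0.60 ↔ 0.43 pairing
# is documented in the review above; extend this table from Numba's official
# compatibility matrix before relying on it for other releases.
NUMBA_LLVMLITE_PAIRS = {
    (0, 60): (0, 43),
}


def minor(version):
    """Extract (major, minor) from a version string like '0.60.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def compatible(numba_version, llvmlite_version):
    """True if the llvmlite minor series matches the one the given Numba
    series is known to pin; False for known mismatches or unknown pairings."""
    expected = NUMBA_LLVMLITE_PAIRS.get(minor(numba_version))
    return expected == minor(llvmlite_version)
```

For example, `compatible("0.60.0", "0.43.1")` passes while `compatible("0.60.0", "0.42.0")` fails, which is the kind of guard a CI job could run after resolving the environment.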
Actionable comments posted: 0
♻️ Duplicate comments (2)

examples/scripts/osft_gpt_oss_example.py (1)

41-51: Checkpoint discovery: prefer mtime or numeric sorting; keep glob fix.
Good that the pattern is now `samples_*` (matches prior review). Using `ctime` can be unreliable across filesystems; use `mtime` or parse the numeric suffix.

```diff
-    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getctime)
+    # Prefer modification time; alternatively parse the numeric suffix after 'samples_'
+    most_recent_checkpoint = max(checkpoint_dirs, key=os.path.getmtime)
```

pyproject.toml (1)

48-48: Do not ship pytest as a runtime dependency.
Move `pytest>=8.0` to a dev/test extra. Shipping it increases install size and build time for end users.

```diff
-    "pytest>=8.0"
+    # moved to optional dev/test dependencies
```

Then add under [project.optional-dependencies]:

```diff
 dev = [
     "ipykernel",
-    "ipython"
+    "ipython",
+    "pytest>=8.0"
 ]
```
🧹 Nitpick comments (14)
examples/README.md (2)

29-29: Add a short note about CUDA extras next to the GPT-OSS 20B script.
Readers may miss that they need `pip install training-hub[cuda]` for these scripts. Apply this minimal addition after the line:

```diff
-[SFT with GPT-OSS 20B](scripts/sft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B
+[SFT with GPT-OSS 20B](scripts/sft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B (requires `pip install training-hub[cuda]`)
```

62-62: Mirror the CUDA extras hint for the OSFT script.
Same rationale as the SFT entry.

```diff
-[OSFT with GPT-OSS 20B](scripts/osft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B
+[OSFT with GPT-OSS 20B](scripts/osft_gpt_oss_example.py) - Single-node multi-GPU training example with GPT-OSS 20B (requires `pip install training-hub[cuda]`)
```

examples/docs/osft_usage.md (3)
172-173: Parameter reference: clarify default and compatibility.
Add "default: False" and, if applicable, note any backend/version requirements (e.g., requires mini-trainer ≥X.Y).

```diff
-- `osft_memory_efficient_init` (bool): Enable memory-efficient initialization to reduce memory usage during model loading (recommended for OOMs)
+- `osft_memory_efficient_init` (bool, default: False): Enable memory-efficient initialization to reduce memory usage during model loading (recommended if you hit OOMs). Requires backend support.
```

271-271: Best Practices: small clarity tweak.
Call out explicitly that it affects load-time memory only; training-time memory is still governed by `max_tokens_per_gpu`.

```diff
-2. **Memory Management**: OSFT doesn't reduce memory requirements compared to SFT, so adjust `max_tokens_per_gpu` accordingly. For memory-constrained environments or OOMs during model loading, set `osft_memory_efficient_init=True`.
+2. **Memory Management**: OSFT doesn't reduce training-time memory vs SFT, so adjust `max_tokens_per_gpu` accordingly. For OOMs during model loading specifically, set `osft_memory_efficient_init=True`.
```

93-95: Document default and scope for osft_memory_efficient_init.
osft_memory_efficient_init is referenced only in docs/examples and is not declared on the public osft() API — document that it defaults to False and that it's forwarded to backends via **kwargs unless you make it an explicit osft(...) parameter.
Occurrences: examples/docs/osft_usage.md (lines 93, 172-173, 271-272); examples/scripts/osft_gpt_oss_example.py (line 125). Public API: src/training_hub/algorithms/osft.py (def osft around lines 434-444) — parameter not present.

examples/scripts/sft_gpt_oss_example.py (4)
1-1: Shebang present but file not executable.
Either make the file executable in the repo (`chmod +x`) or drop the shebang.

```diff
-#!/usr/bin/env python3
+# (shebang removed; invoke with `python sft_gpt_oss_example.py ...`)
```

17-22: Remove unused imports.
`os` and `datetime` are unused.

```diff
-import os
 import sys
 import time
-from datetime import datetime
 import argparse
```

61-90: Avoid assigning unused `result`; narrow exception or re-raise with traceback.
- Drop the unused assignment.
- Consider catching expected exception types or re-raising after logging.

```diff
-    result = sft(
+    sft(
@@
-    except Exception as e:
+    except Exception as e:  # consider narrowing to expected exceptions
         end_time = time.time()
         duration = end_time - start_time
```

75-79: Hard-coded /dev/shm can be brittle.
Expose as a CLI flag with a safe default (e.g., omit or use `tempfile.gettempdir()`).

```diff
-        data_output_dir="/dev/shm",  # Use RAM disk for speed
+        # Consider exposing via --data-output-dir; default None lets backend decide
+        # data_output_dir="/dev/shm",
```

examples/scripts/osft_gpt_oss_example.py (4)
1-1: Shebang present but file not executable.
Same as the SFT script: make executable or remove the shebang.

```diff
-#!/usr/bin/env python3
+# (shebang removed; invoke with `python osft_gpt_oss_example.py ...`)
```

19-25: Remove unused import.
`datetime` is unused.

```diff
-from datetime import datetime
 import argparse
 import glob
```

119-121: Hard-coded /dev/shm can be brittle.
Same suggestion as the SFT script: make it configurable or omit.

```diff
-        'data_output_dir': "/dev/shm",  # Use RAM disk for speed
+        # 'data_output_dir': "/dev/shm",  # Consider exposing via CLI; default None is safer
```

159-171: Catching broad Exception.
Consider narrowing to expected exceptions and/or printing a traceback before exiting.

```diff
-    except Exception as e:
+    except Exception as e:  # consider narrowing to expected exceptions
         end_time = time.time()
         duration = end_time - start_time
```

pyproject.toml (1)
40-47: Duplicate dependency: filelock listed twice.
Keep the stricter one (`>=3.19.1`) and drop the other to avoid confusion.

```diff
-    "filelock>=3.0",
@@
-    "filelock>=3.19.1",
+    "filelock>=3.19.1",
```
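Duplicates like the `filelock` pair above can be caught mechanically before review. This sketch normalizes requirement names the way PyPI does (PEP 503) and flags repeats; it parses bare specifier strings from a dependency list, not a full pyproject.toml.

```python
import re
from collections import Counter


def canonical_name(requirement):
    """Extract the distribution name from a requirement string like
    'filelock>=3.19.1' and normalize it per PEP 503: case-fold and
    collapse runs of '-', '_', '.' into a single hyphen."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError(f"unparseable requirement: {requirement!r}")
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def duplicate_dependencies(requirements):
    """Return the set of canonical names appearing more than once."""
    counts = Counter(canonical_name(r) for r in requirements)
    return {name for name, n in counts.items() if n > 1}
```

Running it over the dependency list from this PR, e.g. `duplicate_dependencies(["filelock>=3.0", "psutil>=6.0", "filelock>=3.19.1"])`, returns `{"filelock"}`, which is exactly the duplicate the review flags.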
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)

- README.md (1 hunks)
- examples/README.md (2 hunks)
- examples/docs/osft_usage.md (3 hunks)
- examples/scripts/osft_gpt_oss_example.py (1 hunks)
- examples/scripts/sft_gpt_oss_example.py (1 hunks)
- pyproject.toml (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
examples/scripts/osft_gpt_oss_example.py (2)
- src/training_hub/algorithms/osft.py (1): osft (439-500)
- examples/scripts/sft_gpt_oss_example.py (1): main (54-171)

examples/scripts/sft_gpt_oss_example.py (2)
- src/training_hub/algorithms/sft.py (1): sft (177-249)
- examples/scripts/osft_gpt_oss_example.py (1): main (26-113)
🪛 Ruff (0.12.2)
examples/scripts/osft_gpt_oss_example.py
1-1: Shebang is present but file is not executable
(EXE001)
46-46: Avoid specifying long messages outside the exception class
(TRY003)
119-119: Probable insecure usage of temporary file or directory: "/dev/shm"
(S108)
159-159: Do not catch blind exception: Exception
(BLE001)
examples/scripts/sft_gpt_oss_example.py
1-1: Shebang is present but file is not executable
(EXE001)
61-61: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
75-75: Probable insecure usage of temporary file or directory: "/dev/shm"
(S108)
103-103: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (7)
README.md (1)
9-9: Support Matrix: OSFT marked Implemented — looks good. Status change aligns with the new examples and docs. No further action from this file.
examples/README.md (1)
27-27: Formatting-only tweak — OK. No behavioral impact.
examples/scripts/sft_gpt_oss_example.py (1)
80-82: Checkpointing choice: confirm behavior matches backend.
If `accelerate_full_state_at_epoch=False`, resume behavior changes. Confirm that's intended for users copying this example.
63-78: Confirm HF model id.
Ensure the `--model-path` default `openai/gpt-oss-20b` matches the actual HF repo name.
54-61: Extras naming: confirm docs align with the `cuda` extra.
The README installs `training-hub[cuda]`, which matches this extra. If any docs mention a `gpt-oss` extra, update them for consistency.
45-46: Verify `attr` vs `attrs` in pyproject.toml.
pyproject.toml lists "attr>=0.3.2" (pyproject.toml: lines 45-46). The canonical package is usually "attrs" — confirm which package you intend to require and update pyproject.toml if necessary. Repository search returned no import matches for either; manual verification required.
17-23: Version alignment sanity check for core training stack.
Large version bumps in pyproject.toml (instructlab-training, rhai-innovation-mini-trainer, transformers, datasets, numpy upper bound) can conflict with torch/flash-attn/bitsandbytes — ensure CI uses a pinned lockfile or constraints file and verify resolver stability across Linux/macOS/GPU images.
Automated search returned no output; run locally/CI to confirm:

```shell
rg -n -C2 -S 'instructlab-training|rhai-innovation-mini-trainer|transformers|torch|flash-attn|bitsandbytes|numpy|accelerate|peft' pyproject.toml
ls | rg 'poetry.lock|requirements.txt|constraints.txt|Pipfile.lock' || true
python -m pip install -U pip && python -m pip install 'training-hub[cuda] @ .'
```
Adds two new example scripts, updated documentation, and reorganized dependencies for GPT-OSS support
Summary by CodeRabbit
New Features
Documentation
Chores