add OSFT notebook for different batch sizes #5
Conversation
Walkthrough
Adds a new Jupyter notebook example that demonstrates how to scale OSFT training hyperparameters with dataset size (Small/Medium/Large). The notebook includes a common configuration, per-dataset example configs with computed steps, training invocation skeletons, strategy notes, and a summary table. No code or API changes elsewhere.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
🔇 Additional comments (2)
Actionable comments posted: 2
🧹 Nitpick comments (3)
examples/notebooks/osft_dataset_scaling_guide.ipynb (3)
233-246: Remove extraneous f-strings (no placeholders). These trigger Ruff F541 and add noise. Keep f-strings only where {} interpolation occurs.
-print(f"result = osft(")
+print("result = osft(")
-print(f" # Model and data")
+print(" # Model and data")
-print(f" ckpt_output_dir='/path/to/checkpoints/osft_1k_dataset',")
+print(" ckpt_output_dir='/path/to/checkpoints/osft_1k_dataset',")
-print(f" ")
+print(" ")
-print(f" # OSFT parameters")
+print(" # OSFT parameters")
-print(f" ")
+print(" ")
-print(f" # Batch size scaled for small dataset")
+print(" # Batch size scaled for small dataset")
-print(f" ")
+print(" ")
-print(f" # Other training parameters")
+print(" # Other training parameters")
-print(f" ")
+print(" ")
-print(f" # Distributed training")
+print(" # Distributed training")
-print(f")")
+print(")")
@@
-print(f"result = osft(")
+print("result = osft(")
-print(f" # Model and data")
+print(" # Model and data")
-print(f" ckpt_output_dir='/path/to/checkpoints/osft_10k_dataset',")
+print(" ckpt_output_dir='/path/to/checkpoints/osft_10k_dataset',")
-print(f" ")
+print(" ")
-print(f" # OSFT parameters")
+print(" # OSFT parameters")
-print(f" ")
+print(" ")
-print(f" # Batch size scaled for medium dataset")
+print(" # Batch size scaled for medium dataset")
-print(f" ")
+print(" ")
-print(f" # Other training parameters")
+print(" # Other training parameters")
-print(f" ")
+print(" ")
-print(f" # Distributed training")
+print(" # Distributed training")
-print(f")")
+print(")")
@@
-print(f"result = osft(")
+print("result = osft(")
-print(f" # Model and data")
+print(" # Model and data")
-print(f" ckpt_output_dir='/path/to/checkpoints/osft_100k_dataset',")
+print(" ckpt_output_dir='/path/to/checkpoints/osft_100k_dataset',")
-print(f" ")
+print(" ")
-print(f" # OSFT parameters")
+print(" # OSFT parameters")
-print(f" ")
+print(" ")
-print(f" # Batch size scaled for large dataset")
+print(" # Batch size scaled for large dataset")
-print(f" ")
+print(" ")
-print(f" # Other training parameters")
+print(" # Other training parameters")
-print(f" ")
+print(" ")
-print(f" # Distributed training")
+print(" # Distributed training")
-print(f")")
+print(")")
Also applies to: 252-259, 361-374, 380-389, 491-504, 510-517
739-742: Use a portable kernelspec display_name. “.venv” is machine-specific and can break opening the notebook on other machines. Prefer “Python 3” or similar.
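For instance, a small script along these lines could rewrite the metadata (a sketch only; the notebook path is taken from this PR, and the replacement kernelspec values are the common Jupyter defaults rather than anything defined in the notebook):

# Sketch: normalize the notebook's kernelspec so it opens on any machine.
import json

path = "examples/notebooks/osft_dataset_scaling_guide.ipynb"
with open(path, "r", encoding="utf-8") as f:
    nb = json.load(f)

# Replace the machine-specific ".venv" kernelspec with the standard defaults.
nb.setdefault("metadata", {})["kernelspec"] = {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3",
}

with open(path, "w", encoding="utf-8") as f:
    json.dump(nb, f, indent=1, ensure_ascii=False)
    f.write("\n")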
109-116: Clarify effective_batch_size definition (global vs per-device). Readers may confuse the per-GPU micro-batch with the global “effective_batch_size”. Add a short note and the formula: effective = per_device_batch_size × grad_accum × num_gpus. Optionally include helper fields for per_device and grad_accum.
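A minimal sketch of that relationship (the specific values below are illustrative, not taken from the notebook):

# Illustrative only: how the global effective batch size relates to per-device settings.
per_device_batch_size = 8        # micro-batch processed by each GPU per forward/backward pass
gradient_accumulation_steps = 2  # micro-batches accumulated before each optimizer step
num_gpus = 8                     # total GPUs, i.e. NPROC_PER_NODE * NNODES

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 128 for these example values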
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/notebooks/osft_dataset_scaling_guide.ipynb (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
examples/notebooks/osft_dataset_scaling_guide.ipynb
68-68: f-string without any placeholders
Remove extraneous f prefix
(F541)
69-69: f-string without any placeholders
Remove extraneous f prefix
(F541)
72-72: f-string without any placeholders
Remove extraneous f prefix
(F541)
73-73: f-string without any placeholders
Remove extraneous f prefix
(F541)
74-74: f-string without any placeholders
Remove extraneous f prefix
(F541)
76-76: f-string without any placeholders
Remove extraneous f prefix
(F541)
77-77: f-string without any placeholders
Remove extraneous f prefix
(F541)
79-79: f-string without any placeholders
Remove extraneous f prefix
(F541)
80-80: f-string without any placeholders
Remove extraneous f prefix
(F541)
86-86: f-string without any placeholders
Remove extraneous f prefix
(F541)
87-87: f-string without any placeholders
Remove extraneous f prefix
(F541)
93-93: f-string without any placeholders
Remove extraneous f prefix
(F541)
126-126: f-string without any placeholders
Remove extraneous f prefix
(F541)
127-127: f-string without any placeholders
Remove extraneous f prefix
(F541)
130-130: f-string without any placeholders
Remove extraneous f prefix
(F541)
131-131: f-string without any placeholders
Remove extraneous f prefix
(F541)
132-132: f-string without any placeholders
Remove extraneous f prefix
(F541)
134-134: f-string without any placeholders
Remove extraneous f prefix
(F541)
135-135: f-string without any placeholders
Remove extraneous f prefix
(F541)
137-137: f-string without any placeholders
Remove extraneous f prefix
(F541)
138-138: f-string without any placeholders
Remove extraneous f prefix
(F541)
144-144: f-string without any placeholders
Remove extraneous f prefix
(F541)
145-145: f-string without any placeholders
Remove extraneous f prefix
(F541)
151-151: f-string without any placeholders
Remove extraneous f prefix
(F541)
184-184: f-string without any placeholders
Remove extraneous f prefix
(F541)
185-185: f-string without any placeholders
Remove extraneous f prefix
(F541)
188-188: f-string without any placeholders
Remove extraneous f prefix
(F541)
189-189: f-string without any placeholders
Remove extraneous f prefix
(F541)
190-190: f-string without any placeholders
Remove extraneous f prefix
(F541)
192-192: f-string without any placeholders
Remove extraneous f prefix
(F541)
193-193: f-string without any placeholders
Remove extraneous f prefix
(F541)
195-195: f-string without any placeholders
Remove extraneous f prefix
(F541)
196-196: f-string without any placeholders
Remove extraneous f prefix
(F541)
202-202: f-string without any placeholders
Remove extraneous f prefix
(F541)
203-203: f-string without any placeholders
Remove extraneous f prefix
(F541)
209-209: f-string without any placeholders
Remove extraneous f prefix
(F541)
| "execution_count": 1, | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stderr", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "/mnt/7TB-a/osilkin/non-critical-changes/training_hub/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", | ||
| " from .autonotebook import tqdm as notebook_tqdm\n" | ||
| ] | ||
| } | ||
| ], |
💡 Verification agent
🧩 Analysis chain
Clear notebook outputs and warnings before committing.
Committed outputs include environment-specific warnings (e.g., tqdm/ipywidgets) and make diffs noisy.
Run this read-only check:
To clear locally (not run here): jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace examples/notebooks/osft_dataset_scaling_guide.ipynb
Add a pre-commit hook like nbstripout for future commits.
Also applies to: 147-160, 191-226, 275-289, 321-355, 405-418, 449-483, 533-551, 651-676
🏁 Script executed:
#!/bin/bash
# Show cells with non-empty outputs and non-null execution_count
jq -r '
.cells[]
| {ec: .execution_count, out: (.outputs|length)}
| select(.out>0 or (.ec != null))
' examples/notebooks/osft_dataset_scaling_guide.ipynb | head -n 20
Length of output: 189
🏁 Script executed:
#!/bin/bash
set -euo pipefail
nb="examples/notebooks/osft_dataset_scaling_guide.ipynb"
if [ ! -f "$nb" ]; then
echo "MISSING: $nb"
exit 0
fi
python3 - <<'PY'
import json,sys
with open("examples/notebooks/osft_dataset_scaling_guide.ipynb","r",encoding="utf-8") as f:
nb=json.load(f)
cells=nb.get("cells",[])
results=[]
for idx,cell in enumerate(cells):
ec=cell.get("execution_count", None)
outs=cell.get("outputs", [])
if (outs and len(outs)>0) or ec is not None:
sample=None
if outs:
o=outs[0]
if isinstance(o.get("text"), list):
sample="".join(o.get("text"))[:200]
else:
sample=str(o.get("text") or o.get("ename") or "")[:200]
results.append({"cell_index": idx, "execution_count": ec, "outputs": len(outs), "sample_excerpt": sample})
print(json.dumps(results, indent=2))
PY
Length of output: 3105
Clear notebook outputs and remove environment-specific warnings before committing.
examples/notebooks/osft_dataset_scaling_guide.ipynb has executed outputs in cells 3, 5, 7, 8, 10, 11, 13, 14, 16, 20 — cell 3 contains a TqdmWarning about ipywidgets. Run locally: jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace examples/notebooks/osft_dataset_scaling_guide.ipynb and add nbstripout (or similar) as a pre-commit hook.
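If a scripted alternative to the nbconvert command is preferred, a minimal nbformat-based sketch (same notebook path; assumes nbformat is installed) could clear outputs programmatically:

# Sketch: strip outputs and execution counts so only source cells are committed.
import nbformat

path = "examples/notebooks/osft_dataset_scaling_guide.ipynb"
nb = nbformat.read(path, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, path)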
🤖 Prompt for AI Agents
In examples/notebooks/osft_dataset_scaling_guide.ipynb around lines 55 to 66 the
notebook contains executed outputs (including a TqdmWarning about missing
ipywidgets) and environment-specific stderr output; clear all cell outputs in
the notebook (e.g., run locally: jupyter nbconvert
--ClearOutputPreprocessor.enabled=True --inplace
examples/notebooks/osft_dataset_scaling_guide.ipynb) and save the cleaned file,
then add/enable a notebook output cleaner such as nbstripout (or a pre-commit
hook that runs nbconvert or nbstripout) to prevent committing execution outputs
and environment-specific warnings in the future.
| "# Import training_hub for OSFT training\n", | ||
| "from training_hub import osft\n", | ||
| "\n", | ||
| "# Standard library imports\n", | ||
| "import os\n", | ||
| "from datetime import datetime\n" | ||
| ] |
💡 Verification agent
🧩 Analysis chain
Warmup steps can exceed total steps; compute dynamically and fix step math (ceil).
- Large dataset example sets warmup_steps=500 while total_steps=291; that’s invalid in most trainers. Small/medium also use unusually high warmups vs your 5–10% guidance.
- Also, steps/epoch currently use floor division and undercount when dataset_size % batch_size != 0.
Apply this diff to:
- add math,
- compute steps via ceil,
- derive warmup as 10% of total (bounded),
- print total GPUs (nodes×gpus),
- drop hardcoded warmups in configs.
@@
-# Import training_hub for OSFT training
-from training_hub import osft
-
-# Standard library imports
-import os
-from datetime import datetime
+# Import training_hub for OSFT training (optional)
+try:
+ from training_hub import osft # noqa: F401
+except Exception:
+ osft = None
+ print("Note: training_hub is not installed; showing printed configs only.")
+
+# Standard library imports
+import math
@@
-print(f" GPUs: {NPROC_PER_NODE}")
+print(f" GPUs: {NPROC_PER_NODE * NNODES}")
@@
small_dataset_config = {
"dataset_size": "1K samples",
"data_path": "/path/to/your/small_dataset_1k_samples.jsonl", # Replace with your path
"effective_batch_size": 16, # Small batch size for more gradient updates
- "warmup_steps": 50, # Quick warmup for small dataset
"use_case": "Domain-specific terminology or specialized knowledge"
}
@@
-steps_per_epoch_1k = 1000 // small_dataset_config["effective_batch_size"]
-total_steps_1k = steps_per_epoch_1k * NUM_EPOCHS
+steps_per_epoch_1k = math.ceil(1000 / small_dataset_config["effective_batch_size"])
+total_steps_1k = steps_per_epoch_1k * NUM_EPOCHS
+warmup_steps_1k = max(1, int(0.10 * total_steps_1k))
@@
-print(f" warmup_steps={small_dataset_config['warmup_steps']},")
+print(f" warmup_steps={warmup_steps_1k},")
@@
medium_dataset_config = {
"dataset_size": "10K samples",
"data_path": "/path/to/your/medium_dataset_10k_samples.jsonl", # Replace with your path
"effective_batch_size": 128, # Moderate batch size for efficiency
- "warmup_steps": 100, # Standard warmup
"use_case": "Domain adaptation or moderate-scale instruction tuning"
}
@@
-steps_per_epoch_10k = 10000 // medium_dataset_config["effective_batch_size"]
-total_steps_10k = steps_per_epoch_10k * NUM_EPOCHS
+steps_per_epoch_10k = math.ceil(10000 / medium_dataset_config["effective_batch_size"])
+total_steps_10k = steps_per_epoch_10k * NUM_EPOCHS
+warmup_steps_10k = max(1, int(0.10 * total_steps_10k))
@@
-print(f" warmup_steps={medium_dataset_config['warmup_steps']},")
+print(f" warmup_steps={warmup_steps_10k},")
@@
large_dataset_config = {
"dataset_size": "100K samples",
"data_path": "/path/to/your/large_dataset_100k_samples.jsonl", # Replace with your path
"effective_batch_size": 1024, # Large batch size for efficiency
- "warmup_steps": 500, # Extended warmup for large batch
"use_case": "Large-scale instruction tuning or comprehensive domain coverage"
}
@@
-steps_per_epoch_100k = 100000 // large_dataset_config["effective_batch_size"]
-total_steps_100k = steps_per_epoch_100k * NUM_EPOCHS
+steps_per_epoch_100k = math.ceil(100000 / large_dataset_config["effective_batch_size"])
+total_steps_100k = steps_per_epoch_100k * NUM_EPOCHS
+warmup_steps_100k = max(1, int(0.10 * total_steps_100k))
@@
-print(f" warmup_steps={large_dataset_config['warmup_steps']},")
+print(f" warmup_steps={warmup_steps_100k},")Also applies to: 124-131, 163-171, 172-175, 229-260, 292-300, 301-304, 358-389, 421-428, 430-433, 487-517, 554-566
🏁 Script executed:
#!/bin/bash
set -euo pipefail
nb="examples/notebooks/osft_dataset_scaling_guide.ipynb"
echo "CHECK: repository root files (top-level)"
ls -la | sed -n '1,120p' || true
echo
if [ ! -f "$nb" ]; then
echo "FILE_MISSING: $nb not found"
rg -n --hidden --no-ignore -S "osft_dataset_scaling_guide|osft_dataset_scaling" || true
exit 0
fi
echo "FOUND: $nb"
echo
echo "Searching for relevant keywords in examples/ ..."
rg -n -S -C2 "training_hub|warmup_steps|steps_per_epoch|effective_batch_size|NPROC_PER_NODE|NNODES|math\.ceil|math|ceil|warmup_ratio|warmup_steps" examples || true
echo
echo "Extracting notebook code cells containing matches:"
python - <<'PY'
import json,sys,os
f="examples/notebooks/osft_dataset_scaling_guide.ipynb"
nb=json.load(open(f,'r',encoding='utf-8'))
cells=nb.get('cells',[])
for i,c in enumerate(cells):
if c.get('cell_type')!='code':
continue
src=''.join(c.get('source',[]))
if any(k in src for k in ['training_hub','warmup_steps','steps_per_epoch','effective_batch_size','NPROC_PER_NODE','NNODES','math.ceil','math','ceil','warmup_ratio']):
print("----- CELL",i,"-----")
for ln,line in enumerate(src.splitlines(),1):
print(f"{ln:03d}: {line}")
PY
Length of output: 82911
Warmup steps can exceed total steps; compute dynamically and fix step math (ceil).
- Confirmed: examples/notebooks/osft_dataset_scaling_guide.ipynb uses hardcoded warmup_steps (50 / 100 / 500), computes steps_per_epoch with floor division (//) and prints GPUs as NPROC_PER_NODE — this undercounts steps and allows warmup > total_steps.
- Action: apply the suggested changes — guard the training_hub import, import math, compute steps_per_epoch with math.ceil(dataset_size / effective_batch_size), total_steps = steps_per_epoch * NUM_EPOCHS, derive warmup_steps = max(1, int(0.10 * total_steps)) (optionally cap), print total GPUs as NPROC_PER_NODE * NNODES, and remove hardcoded warmup values.
- Locations to fix: examples/notebooks/osft_dataset_scaling_guide.ipynb (68-74, 118-131, 163-171, 172-175, 229-260, 292-300, 301-304, 358-389, 421-428, 430-433, 487-517, 554-566). Search the repo for other occurrences of literal warmup_steps and '//' step math and apply the same change.
🧰 Tools
🪛 Ruff (0.12.2)
68-68: f-string without any placeholders
Remove extraneous f prefix
(F541)
69-69: f-string without any placeholders
Remove extraneous f prefix
(F541)
72-72: f-string without any placeholders
Remove extraneous f prefix
(F541)
73-73: f-string without any placeholders
Remove extraneous f prefix
(F541)
74-74: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In examples/notebooks/osft_dataset_scaling_guide.ipynb (affecting lines ~68-74,
118-131, 163-171, 172-175, 229-260, 292-300, 301-304, 358-389, 421-428, 430-433,
487-517, 554-566), guard the training_hub import with a try/except or
conditional import, add "import math", replace any floor division used to
compute steps_per_epoch with steps_per_epoch = math.ceil(dataset_size /
effective_batch_size), compute total_steps = steps_per_epoch * NUM_EPOCHS,
derive warmup_steps = max(1, int(0.10 * total_steps)) (optionally cap if
desired) and remove hardcoded warmup values, change printed GPU count to
NPROC_PER_NODE * NNODES, and globally search the repo for literal warmup_steps
and '//' step math to apply the same ceil-based calculation and removal of
hardcoded warmup entries.
Force-pushed from f3b8a26 to 836dadb (Compare)
This PR adds a notebook that showcases how the batch size should scale as a function of the dataset size.
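Roughly, the scaling idea mirrors the notebook's Small/Medium/Large tiers (the helper below is illustrative only and not part of training_hub; the 16/128/1024 values come from the example configs in this PR):

# Hypothetical helper: pick an effective batch size tier from the dataset size.
def suggested_effective_batch_size(dataset_size: int) -> int:
    if dataset_size <= 1_000:    # Small: favor more gradient updates
        return 16
    if dataset_size <= 10_000:   # Medium: balance updates and throughput
        return 128
    return 1024                  # Large: favor throughput

for n in (1_000, 10_000, 100_000):
    print(n, suggested_effective_batch_size(n))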