Adding documentation showcasing the estimated runtime for various models and training setups #22
base: main
Conversation
…ixing __init__ issues
…om coderabbit, some other documentation cleaning
Update python build action for 3.14 compatibility (Red-Hat-AI-Innovation-Team#21)
Walkthrough: Added a new documentation file providing wall-clock runtime measurements for model fine-tuning, including experiment notes, per-epoch timings, comparative analyses across hardware and model versions, dataset preprocessing details, and references to graphs and sanity checks.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
All missing information has been added, this is ready for a full PR
Actionable comments posted: 2
🧹 Nitpick comments (5)
examples/docs/runtime_estimates.md (5)
9-10: Fix unordered list indentation. Markdown linting expects 2-space indentation for nested list items, but these lines use 4-space indentation.
Apply this diff to use consistent 2-space indentation:
```diff
 - The experiments were conducted by using the default settings provided by `sft_granite_example.py` and `osft_granite_example.py`
-    - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
-    - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
+  - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
+  - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
 - **Models**: Two models were tested, **Granite 3.3 8B**, and **Granite 4 Tiny Preview** (a Mixture-of-Experts model that also has 8B Parameters)
 - **Hardware**: Two different hardware configurations were tested, a server with **8x A100s**, and an Openshift cluster with **8x H100s**.
 - **Datasets**: Two datasets were tested, a simple dataset in Table-GPT and a much larger and longer dataset in Bespoke-Stratos-17k.
-    - Please note that both datasets were obtained by downloading the dataset from HuggingFace and then extracting the .jsonl file.
+  - Please note that both datasets were obtained by downloading the dataset from HuggingFace and then extracting the .jsonl file.
 - All experiments were run for the first full epoch two times, with the displayed time being the average of the two times.
-    - **Please be aware that time for later epochs may vary**
-    - On the A100 machine, the variation between the two runs was negligible, never more than 6 seconds.
-    - The variation is a bit larger on the H100 machine, especially during the first run of a pod (the first result was discarded and reran if it varied significantly)
+  - **Please be aware that time for later epochs may vary**
+  - On the A100 machine, the variation between the two runs was negligible, never more than 6 seconds.
+  - The variation is a bit larger on the H100 machine, especially during the first run of a pod (the first result was discarded and reran if it varied significantly)
 - The time measurement is calculated by using the timestamps logged during the training process in the above scripts
 - By default, OSFT makes use of Liger Kernels to improve memory usage and runtime. However, as of Nov 7th 2025, Liger Kernels currently don't have built-in support for Granite 4
-    - As a result, the script was modified for allow Liger Kernels to be disabled for certain experiments
-    - The tables will be updated once support for Liger Kernels is added.
+  - As a result, the script was modified for allow Liger Kernels to be disabled for certain experiments
+  - The tables will be updated once support for Liger Kernels is added.
 - Many of these tests had the checkpointing hardcoded to be disabled in the script (set `checkpoint_at_epoch=False` and `accelerate_full_state_at_epoch=False`)
-    - This does not appear to impact runtime of the actual training loop
-    - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+  - This does not appear to impact runtime of the actual training loop
+  - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
```

Also applies to: 14-18, 21-25
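For orientation, the settings quoted in this diff are keyword arguments in the example scripts. Below is a minimal sketch of how they might be passed, assuming `sft` and `osft` entry points that mirror `sft_granite_example.py` and `osft_granite_example.py`; the import, paths, and all arguments not quoted in the diff are illustrative assumptions, not confirmed API.

```python
# Minimal sketch, not the actual example scripts: entry-point names,
# paths, and non-quoted arguments are assumptions for illustration.
from training_hub import sft, osft  # assumed API surface

# SFT defaults referenced in the diff above
sft(
    model_path="ibm-granite/granite-3.3-8b-instruct",  # model used in this PR's experiments
    data_path="table_gpt_train.jsonl",                 # hypothetical extracted .jsonl
    ckpt_output_dir="ckpts/sft",                       # assumed output location
    max_tokens_per_gpu=25000,
    max_seq_len=20000,
)

# OSFT defaults referenced in the diff above
osft(
    model_path="ibm-granite/granite-3.3-8b-instruct",
    data_path="table_gpt_train.jsonl",
    ckpt_output_dir="ckpts/osft",
    max_tokens_per_gpu=10000,
    max_seq_len=4096,
    unfreeze_rank_ratio=0.3,
)
```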
113-119: Clarify table column headers for dataset/model breakdowns. The header "Granite 3.3 Bespoke" and "Granite 4 Bespoke" for columns 3–4 is misleading given that column 2 contains Table-GPT statistics. Since line 109 notes that Granite 3.3 and 4 differ negligibly for Table-GPT, the table structure should more clearly indicate which column represents which dataset.
Revise the table header to be clearer:
```diff
-| Stat | Table-GPT | Granite 3.3 Bespoke | Granite 4 Bespoke |
+| Stat | Table-GPT | Bespoke (Granite 3.3) | Bespoke (Granite 4) |
```

Or, if the differences matter, split into separate tables labeled "Table-GPT Stats" and "Bespoke Stats" to remove ambiguity.
150-151: Use hyphenated compound adjectives before nouns. Grammar: "open source" should be "open-source" when used as a compound adjective modifying a noun.
Apply this diff:
```diff
-Granite 3.3 is an open source **8B Parameter Large Language** Instruct model
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model
+Granite 3.3 is an open-source **8B Parameter Large Language** Instruct model
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open-source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model
```

Also, "8B Parameter" and "7B Parameter" should be hyphenated: `8B-Parameter` and `7B-Parameter`.

```diff
-Granite 3.3 is an open-source **8B Parameter Large Language** Instruct model
+Granite 3.3 is an open-source **8B-Parameter Large-Language** Instruct model
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open-source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open-source **7B-Parameter Hybrid Mixture-of-Experts** Instruct Model
```
181-181: Use a regular heading instead of bold text; improve text conciseness. Line 181 uses bold emphasis to introduce a note, but this should be a proper heading (`###`). Additionally, "All of the" should be shortened to "All" for clarity.
Apply this diff:
```diff
-**All of the measured times are for a single trial only! They are NOT the average of multiple trials**
+### Note on Trial Counts
+
+All measured times are for a single trial only. They are NOT the average of multiple trials.
```
25-25: Replace vague intensifier with concrete descriptor. "very large" is imprecise. Use a specific measurement or qualifier.
```diff
-    - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+    - This was mostly done to conserve disk space due to checkpoints being extremely large (tens of GB per epoch), which can cause DiskPressure on OpenShift
```

Alternatively, if space is the key constraint, rephrase to:

```diff
+    - This was mostly done to conserve disk space; checkpoints are tens of GB per epoch and can cause DiskPressure on OpenShift
```
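As a companion to this nitpick: the hardcoded disabling the doc describes presumably reduces to passing the two flags it names. A sketch under that assumption follows; the surrounding call, paths, and entry point are illustrative, not taken from the actual scripts.

```python
# Sketch only: disable per-epoch checkpointing to avoid the tens-of-GB
# writes that can trigger DiskPressure on OpenShift. The two flag names
# come from the document; everything else is an illustrative assumption.
from training_hub import sft  # assumed API surface

sft(
    model_path="ibm-granite/granite-3.3-8b-instruct",
    data_path="table_gpt_train.jsonl",       # hypothetical extracted .jsonl
    ckpt_output_dir="ckpts/sft",
    max_tokens_per_gpu=25000,
    max_seq_len=20000,
    checkpoint_at_epoch=False,                # no per-epoch model checkpoint
    accelerate_full_state_at_epoch=False,     # no full Accelerate state snapshot
)
```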
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (6)
- `examples/docs/osft_datasets.png` is excluded by `!**/*.png`
- `examples/docs/osft_hardware.png` is excluded by `!**/*.png`
- `examples/docs/osft_models.png` is excluded by `!**/*.png`
- `examples/docs/sft_datasets.png` is excluded by `!**/*.png`
- `examples/docs/sft_hardware.png` is excluded by `!**/*.png`
- `examples/docs/sft_models.png` is excluded by `!**/*.png`
📒 Files selected for processing (1)
`examples/docs/runtime_estimates.md` (1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/docs/runtime_estimates.md
[style] ~25-~25: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...rve disk space due to checkpoints being very large (tens of GB per epoch), which can cause...
(EN_WEAK_ADJECTIVE)
[grammar] ~150-~150: Use a hyphen to join words.
Context: ...3.3 vs Granite 4 Granite 3.3 is an open source 8B Parameter Large Language I...
(QB_NEW_EN_HYPHEN)
[grammar] ~151-~151: Use a hyphen to join words.
Context: ...ns we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-...
(QB_NEW_EN_HYPHEN)
[style] ~181-~181: Consider removing “of” to be more concise
Context: ...to the full training cycle measures. **All of the measured times are for a single trial o...
(ALL_OF_THE)
🪛 markdownlint-cli2 (0.18.1)
examples/docs/runtime_estimates.md
9-9: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
16-16: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
57-57: Bare URL used
(MD034, no-bare-urls)
69-69: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
91-91: Bare URL used
(MD034, no-bare-urls)
91-91: Bare URL used
(MD034, no-bare-urls)
125-125: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
129-129: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
150-150: Bare URL used
(MD034, no-bare-urls)
151-151: Bare URL used
(MD034, no-bare-urls)
157-157: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
161-161: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
181-181: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🔇 Additional comments (1)
examples/docs/runtime_estimates.md (1)
63-70: Verify that referenced image files exist in the repository. The documentation references PNG files on GitHub (e.g., `sft_hardware.png`, `osft_datasets.png`). If these images are placeholders or not yet committed, the links will be broken for readers. Please confirm that the following image files exist on the main branch at the paths specified:

- `examples/docs/sft_hardware.png`
- `examples/docs/osft_hardware.png`
- `examples/docs/sft_datasets.png`
- `examples/docs/osft_datasets.png`
- `examples/docs/sft_models.png`
- `examples/docs/osft_models.png`

If these files are not yet available, consider adding a note (e.g., "Graphs coming soon") or temporarily removing the image sections to avoid broken links in documentation.
Also applies to: 123-130, 155-162
Actionable comments posted: 1
♻️ Duplicate comments (1)
examples/docs/runtime_estimates.md (1)
57-57: Wrap bare URLs in markdown link syntax. Bare URLs should be wrapped in markdown links for proper rendering and accessibility. This was flagged in a previous review and remains unaddressed.
Apply these diffs:
```diff
-The A100 and H100 both contain 80 GB of VRAM per GPUs, and both setups contain 8 GPUs. However, the H100 higher FLOPs and more cores than the A100. See Wikipedia: https://en.wikipedia.org/wiki/Hopper_(microarchitecture)#H100_accelerator_and_DGX_H100
+The A100 and H100 both contain 80 GB of VRAM per GPUs, and both setups contain 8 GPUs. However, the H100 higher FLOPs and more cores than the A100. See [Wikipedia on H100](https://en.wikipedia.org/wiki/Hopper_(microarchitecture)#H100_accelerator_and_DGX_H100)
```

```diff
-The two datasets used were Table-GPT (https://huggingface.co/datasets/LipengCS/Table-GPT) and Bespoke-Stratos-17k (https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k).
+The two datasets used were [Table-GPT](https://huggingface.co/datasets/LipengCS/Table-GPT) and [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k).
```

```diff
-Granite 3.3 is an open source **8B Parameter Large Language** Instruct model https://huggingface.co/ibm-granite/granite-3.3-8b-instruct
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model https://huggingface.co/ibm-granite/granite-4.0-tiny-preview
+Granite 3.3 is an open source **8B Parameter Large Language** Instruct model [on HuggingFace](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model [on HuggingFace](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview)
```

Also applies to: 91-91, 150-151
🧹 Nitpick comments (3)
examples/docs/runtime_estimates.md (3)
150-151: Use hyphens in compound modifiers. Compound adjectives before nouns should be hyphenated for grammatical correctness.
Apply this diff:
```diff
-Granite 3.3 is an open source **8B Parameter Large Language** Instruct model [on HuggingFace](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model [on HuggingFace](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview)
+Granite 3.3 is an open source **8B-Parameter Large-Language** Instruct model [on HuggingFace](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B-Parameter Hybrid Mixture-of-Experts** Instruct Model [on HuggingFace](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview)
```
181-181: Improve emphasis line wording and consider structured highlighting. Line 181 uses bold emphasis for an important caveat but could be clearer. Additionally, "All of the" is unnecessarily verbose. Consider a blockquote or code block for better visual hierarchy, or simplify to normal emphasis:
```diff
-**All of the measured times are for a single trial only! They are NOT the average of multiple trials**
+> **Note:** All measured times are for a single trial only, not the average of multiple trials.
```

Alternatively, if keeping emphasis:

```diff
-**All of the measured times are for a single trial only! They are NOT the average of multiple trials**
+**Note:** All measured times are for a single trial only—not the average of multiple trials.
```
25-25: Replace weak intensifier "very large" with specific descriptor. Given that you specify "tens of GB per epoch," the size is already clear. Either remove the intensifier or use a more specific adjective:
```diff
-This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+This was mostly done to conserve disk space due to checkpoints being substantial (tens of GB per epoch), which can cause DiskPressure on OpenShift
```

Or simply:

```diff
-This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+This was mostly done to conserve disk space since checkpoints are quite large—tens of GB per epoch—which can cause DiskPressure on OpenShift.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`examples/docs/runtime_estimates.md` (1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/docs/runtime_estimates.md
[style] ~25-~25: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...rve disk space due to checkpoints being very large (tens of GB per epoch), which can cause...
(EN_WEAK_ADJECTIVE)
[grammar] ~150-~150: Use a hyphen to join words.
Context: ...3.3 vs Granite 4 Granite 3.3 is an open source 8B Parameter Large Language I...
(QB_NEW_EN_HYPHEN)
[grammar] ~151-~151: Use a hyphen to join words.
Context: ...ns we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-...
(QB_NEW_EN_HYPHEN)
[style] ~181-~181: Consider removing “of” to be more concise
Context: ...to the full training cycle measures. **All of the measured times are for a single trial o...
(ALL_OF_THE)
🪛 markdownlint-cli2 (0.18.1)
examples/docs/runtime_estimates.md
9-9: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
16-16: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
57-57: Bare URL used
(MD034, no-bare-urls)
91-91: Bare URL used
(MD034, no-bare-urls)
91-91: Bare URL used
(MD034, no-bare-urls)
150-150: Bare URL used
(MD034, no-bare-urls)
151-151: Bare URL used
(MD034, no-bare-urls)
181-181: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
NikhilNayak-debug
left a comment
Added a couple of comments, otherwise the PR looks good to me!
Actionable comments posted: 1
♻️ Duplicate comments (3)
examples/docs/runtime_estimates.md (3)
59-60: Wrap bare URLs in markdown link syntax. These lines contain bare URLs that should be wrapped in markdown links for proper rendering and accessibility.
Apply this diff:
```diff
-Granite 3.3 is an open source **8B Parameter Large Language** Instruct model https://huggingface.co/ibm-granite/granite-3.3-8b-instruct
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model https://huggingface.co/ibm-granite/granite-4.0-tiny-preview
+Granite 3.3 is an open source **8B Parameter Large Language** Instruct model [on HuggingFace](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model [on HuggingFace](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview)
```
113-113: Wrap bare URLs in markdown link syntax. This line contains two bare URLs that should be wrapped in markdown links for proper rendering and accessibility.
Apply this diff:
```diff
-The two datasets used were Table-GPT (https://huggingface.co/datasets/LipengCS/Table-GPT) and Bespoke-Stratos-17k (https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k). Both datasets are using the training split. Each dataset was downloaded via Huggingface's datasets package, with the .jsonl file extracted for use in Training-Hub.
+The two datasets used were [Table-GPT](https://huggingface.co/datasets/LipengCS/Table-GPT) and [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k). Both datasets are using the training split. Each dataset was downloaded via Huggingface's datasets package, with the .jsonl file extracted for use in Training-Hub.
```
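Beyond the link formatting, the download-and-extract step this line describes can be reproduced with the `datasets` package roughly as follows. This is a sketch: the output filenames are assumptions, and the original runs may have used a specific dataset config or reformatted fields before training.

```python
# Sketch: pull each dataset's train split from HuggingFace and dump it
# to a .jsonl file, matching the extraction step the document describes.
from datasets import load_dataset

for repo_id, out_path in [
    ("LipengCS/Table-GPT", "table_gpt_train.jsonl"),                   # hypothetical filenames
    ("bespokelabs/Bespoke-Stratos-17k", "bespoke_stratos_17k.jsonl"),
]:
    ds = load_dataset(repo_id, split="train")
    ds.to_json(out_path)  # datasets writes JSON Lines by default
```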
88-88: Wrap bare URL in markdown link syntax. This bare URL should be wrapped in a markdown link for proper rendering and accessibility.
Apply this diff:
```diff
-The A100 and H100 both contain 80 GB of VRAM per GPUs, and both setups contain 8 GPUs. However, the H100 higher FLOPs and more cores than the A100. See Wikipedia: https://en.wikipedia.org/wiki/Hopper_(microarchitecture)#H100_accelerator_and_DGX_H100
+The A100 and H100 both contain 80 GB of VRAM per GPUs, and both setups contain 8 GPUs. However, the H100 higher FLOPs and more cores than the A100. See [Wikipedia on H100 accelerator](https://en.wikipedia.org/wiki/Hopper_(microarchitecture)#H100_accelerator_and_DGX_H100)
```
🧹 Nitpick comments (1)
examples/docs/runtime_estimates.md (1)
263-263: Convert emphasis to proper heading syntax. Line 263 uses bold emphasis (`**...**`) where a proper markdown heading (`##`) would be more appropriate for section structure.

Apply this diff:
```diff
-**All of the measured times are for a single trial only! They are NOT the average of multiple trials**
+## Note on Trial Count
+
+All of the measured times are for a single trial only! They are NOT the average of multiple trials.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`examples/docs/runtime_estimates.md` (1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/docs/runtime_estimates.md
[style] ~20-~20: For conciseness, consider replacing this expression with an adverb.
Context: ...details about this warm-up aren't known at the moment, but will be added when more informatio...
(AT_THE_MOMENT)
[style] ~27-~27: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...rve disk space due to checkpoints being very large (tens of GB per epoch), which can cause...
(EN_WEAK_ADJECTIVE)
[grammar] ~59-~59: Use a hyphen to join words.
Context: ...3.3 vs Granite 4 Granite 3.3 is an open source 8B Parameter Large Language I...
(QB_NEW_EN_HYPHEN)
[grammar] ~60-~60: Use a hyphen to join words.
Context: ...ns we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-...
(QB_NEW_EN_HYPHEN)
[style] ~160-~160: This wording can make your sentence hard to follow. Try rephrasing for improved clarity.
Context: ...ens_per_gpuhave an impact on the time due to it effectively serving as a batch size, butmax_seq_len` prev...
(DUE_TO_BECAUSE)
[style] ~263-~263: Consider removing “of” to be more concise
Context: ...to the full training cycle measures. **All of the measured times are for a single trial o...
(ALL_OF_THE)
🪛 markdownlint-cli2 (0.18.1)
examples/docs/runtime_estimates.md
9-9: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
16-16: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
19-19: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
20-20: Unordered list indentation
Expected: 4; Actual: 8
(MD007, ul-indent)
23-23: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
26-26: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
59-59: Bare URL used
(MD034, no-bare-urls)
60-60: Bare URL used
(MD034, no-bare-urls)
88-88: Bare URL used
(MD034, no-bare-urls)
113-113: Bare URL used
(MD034, no-bare-urls)
113-113: Bare URL used
(MD034, no-bare-urls)
263-263: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
  - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
  - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
🛠️ Refactor suggestion | 🟠 Major
Fix markdown list indentation to use 2 spaces instead of 4.
Multiple nested list items use 4-space indentation but should use 2 spaces per markdown conventions for proper rendering and consistency.
Apply this diff to fix the indentation across all nested list items:
```diff
 - The experiments were conducted by using the default settings provided by `sft_granite_example.py` and `osft_granite_example.py`
-    - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
-    - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
+  - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
+  - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
 - **Models**: Two models were tested, **Granite 3.3 8B**, and **Granite 4 Tiny Preview** (a Mixture-of-Experts model that also has 8B Parameters)
 - **Hardware**: Two different hardware configurations were tested, a server with **8x A100s**, and an Openshift cluster with **8x H100s**.
 - **Datasets**: Two datasets were tested, a simple dataset in Table-GPT and a much larger and longer dataset in Bespoke-Stratos-17k.
-    - Please note that both datasets were obtained by downloading the dataset from HuggingFace and then extracting the .jsonl file.
+  - Please note that both datasets were obtained by downloading the dataset from HuggingFace and then extracting the .jsonl file.
 - All experiments were run for the first full epoch two times, with the displayed time being the average of the two times.
-    - **Please be aware that time for later epochs may vary**
-    - On the A100 machine, the variation between the two runs was negligible, never more than 6 seconds.
-    - The variation is a bit larger on the H100 machine, especially during the first run of a pod (the first result was discarded and reran if it varied significantly)
-    - **Granite 4 tends to require some amount of warm-up before its first usage.** The shown times for Granite 4 are for runs where the warm-up does not occur. Typically, the warm-up adds about an additional 1 minute to the runtime.
-        - The reasons for and details about this warm-up aren't known at the moment, but will be added when more information is gathered. Please keep in mind Granite 4 is still in preview.
+  - **Please be aware that time for later epochs may vary**
+  - On the A100 machine, the variation between the two runs was negligible, never more than 6 seconds.
+  - The variation is a bit larger on the H100 machine, especially during the first run of a pod (the first result was discarded and reran if it varied significantly)
+  - **Granite 4 tends to require some amount of warm-up before its first usage.** The shown times for Granite 4 are for runs where the warm-up does not occur. Typically, the warm-up adds about an additional 1 minute to the runtime.
+    - The reasons for and details about this warm-up aren't known at the moment, but will be added when more information is gathered. Please keep in mind Granite 4 is still in preview.
 - The time measurement is calculated by using the timestamps logged during the training process in the above scripts
 - By default, OSFT makes use of Liger Kernels to improve memory usage and runtime. However, as of Nov 7th 2025, Liger Kernels currently don't have built-in support for Granite 4
-    - As a result, the script was modified for allow Liger Kernels to be disabled for certain experiments
-    - The tables will be updated once support for Liger Kernels is added.
+  - As a result, the script was modified for allow Liger Kernels to be disabled for certain experiments
+  - The tables will be updated once support for Liger Kernels is added.
 - Many of these tests had the checkpointing hardcoded to be disabled in the script (set `checkpoint_at_epoch=False` and `accelerate_full_state_at_epoch=False`)
-    - This does not appear to impact runtime of the actual training loop
-    - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+  - This does not appear to impact runtime of the actual training loop
+  - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
```

Also applies to: 14-14, 16-18, 21-22, 24-25
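Incidentally, since the quoted doc derives its numbers from timestamps logged during training, averaged over two runs, the measurement reduces to something like the sketch below. The callable is a placeholder; the documented figures come from the scripts' own log output, not from a wrapper like this.

```python
# Sketch of the measurement methodology quoted above: time the first
# full epoch twice and report the mean.
import time

def timed_run(train_one_epoch) -> float:
    start = time.monotonic()
    train_one_epoch()
    return time.monotonic() - start

def train_one_epoch():  # stand-in for a real single-epoch training call
    pass

durations = [timed_run(train_one_epoch) for _ in range(2)]
print(f"average first-epoch wall-clock: {sum(durations) / len(durations):.1f}s")
```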
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
9-9: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🤖 Prompt for AI Agents
In examples/docs/runtime_estimates.md around lines 9-10 (also apply to 14,
16-18, 21-22, 24-25), several nested markdown list items are indented with 4
spaces; change those nested list item indentations to 2 spaces each so the lists
render correctly and consistently (update every listed line range to use 2-space
indentation for nested items).
Still missing a few values and some additional plots.
Would like some feedback on the general direction of how the plots and tables are going to look as well as information on how to properly
(note that all the changes up to October 14th should already be reflected in this repo via the squashed commit that occurred in my last PR)