Adding documentation showcasing the estimated runtime for various models and training setups #22
base: main
Conversation
…ixing __init__ issues
…om coderabbit, some other documentation cleaning
Update python build action for 3.14 compatibility (Red-Hat-AI-Innovation-Team#21)
Walkthrough: Added a new documentation file providing wall-clock runtime measurements for model fine-tuning, including experiment notes, per-epoch timings, comparative analyses across hardware and model versions, dataset preprocessing details, and references to graphs and sanity checks.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
All missing information has been added, this is ready for a full PR
Actionable comments posted: 2
🧹 Nitpick comments (5)
examples/docs/runtime_estimates.md (5)
9-10: Fix unordered list indentation. Markdown linting expects 2-space indentation for nested list items, but these lines use 4-space indentation.
Apply this diff to use consistent 2-space indentation:
```diff
 - The experiments were conducted by using the default settings provided by `sft_granite_example.py` and `osft_granite_example.py`
-    - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
-    - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
+  - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
+  - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
 - **Models**: Two models were tested, **Granite 3.3 8B**, and **Granite 4 Tiny Preview** (a Mixture-of-Experts model that also has 8B Parameters)
 - **Hardware**: Two different hardware configurations were tested, a server with **8x A100s**, and an Openshift cluster with **8x H100s**.
 - **Datasets**: Two datasets were tested, a simple dataset in Table-GPT and a much larger and longer dataset in Bespoke-Stratos-17k.
-    - Please note that both datasets were obtained by downloading the dataset from HuggingFace and then extracting the .jsonl file.
+  - Please note that both datasets were obtained by downloading the dataset from HuggingFace and then extracting the .jsonl file.
 - All experiments were run for the first full epoch two times, with the displayed time being the average of the two times.
-    - **Please be aware that time for later epochs may vary**
-    - On the A100 machine, the variation between the two runs was negligible, never more than 6 seconds.
-    - The variation is a bit larger on the H100 machine, especially during the first run of a pod (the first result was discarded and reran if it varied significantly)
+  - **Please be aware that time for later epochs may vary**
+  - On the A100 machine, the variation between the two runs was negligible, never more than 6 seconds.
+  - The variation is a bit larger on the H100 machine, especially during the first run of a pod (the first result was discarded and reran if it varied significantly)
 - The time measurement is calculated by using the timestamps logged during the training process in the above scripts
 - By default, OSFT makes use of Liger Kernels to improve memory usage and runtime. However, as of Nov 7th 2025, Liger Kernels currently don't have built-in support for Granite 4
-    - As a result, the script was modified for allow Liger Kernels to be disabled for certain experiments
-    - The tables will be updated once support for Liger Kernels is added.
+  - As a result, the script was modified for allow Liger Kernels to be disabled for certain experiments
+  - The tables will be updated once support for Liger Kernels is added.
 - Many of these tests had the checkpointing hardcoded to be disabled in the script (set `checkpoint_at_epoch=False` and `accelerate_full_state_at_epoch=False`)
-    - This does not appear to impact runtime of the actual training loop
-    - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+  - This does not appear to impact runtime of the actual training loop
+  - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
```

Also applies to: 14-18, 21-25
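For orientation, the settings quoted in this diff are keyword arguments in the example scripts. Below is a minimal sketch of how they might be passed, assuming `sft` and `osft` entry points that mirror `sft_granite_example.py` and `osft_granite_example.py`; the import, paths, and all arguments not quoted in the diff are illustrative assumptions, not confirmed API.

```python
# Minimal sketch, not the actual example scripts: entry-point names,
# paths, and non-quoted arguments are assumptions for illustration.
from training_hub import sft, osft  # assumed API surface

# SFT defaults referenced in the diff above
sft(
    model_path="ibm-granite/granite-3.3-8b-instruct",  # model used in this PR's experiments
    data_path="table_gpt_train.jsonl",                 # hypothetical extracted .jsonl
    ckpt_output_dir="ckpts/sft",                       # assumed output location
    max_tokens_per_gpu=25000,
    max_seq_len=20000,
)

# OSFT defaults referenced in the diff above
osft(
    model_path="ibm-granite/granite-3.3-8b-instruct",
    data_path="table_gpt_train.jsonl",
    ckpt_output_dir="ckpts/osft",
    max_tokens_per_gpu=10000,
    max_seq_len=4096,
    unfreeze_rank_ratio=0.3,
)
```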
113-119: Clarify table column headers for dataset/model breakdowns. The header "Granite 3.3 Bespoke" and "Granite 4 Bespoke" for columns 3–4 is misleading given that column 2 contains Table-GPT statistics. Since line 109 notes that Granite 3.3 and 4 differ negligibly for Table-GPT, the table structure should more clearly indicate which column represents which dataset.
Revise the table header to be clearer:
```diff
-| Stat | Table-GPT | Granite 3.3 Bespoke | Granite 4 Bespoke |
+| Stat | Table-GPT | Bespoke (Granite 3.3) | Bespoke (Granite 4) |
```

Or, if the differences matter, split into separate tables labeled "Table-GPT Stats" and "Bespoke Stats" to remove ambiguity.
150-151: Use hyphenated compound adjectives before nouns. Grammar: "open source" should be "open-source" when used as a compound adjective modifying a noun.
Apply this diff:
```diff
-Granite 3.3 is an open source **8B Parameter Large Language** Instruct model
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model
+Granite 3.3 is an open-source **8B Parameter Large Language** Instruct model
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open-source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model
```

Also, "8B Parameter" and "7B Parameter" should be hyphenated: `8B-Parameter` and `7B-Parameter`.

```diff
-Granite 3.3 is an open-source **8B Parameter Large Language** Instruct model
+Granite 3.3 is an open-source **8B-Parameter Large-Language** Instruct model
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open-source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open-source **7B-Parameter Hybrid Mixture-of-Experts** Instruct Model
```
181-181: Use a regular heading instead of bold text; improve text conciseness. Line 181 uses bold emphasis to introduce a note, but this should be a proper heading (`###`). Additionally, "All of the" should be shortened to "All" for clarity.
Apply this diff:
```diff
-**All of the measured times are for a single trial only! They are NOT the average of multiple trials**
+### Note on Trial Counts
+
+All measured times are for a single trial only. They are NOT the average of multiple trials.
```
25-25: Replace vague intensifier with concrete descriptor. "very large" is imprecise. Use a specific measurement or qualifier.
```diff
-    - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+    - This was mostly done to conserve disk space due to checkpoints being extremely large (tens of GB per epoch), which can cause DiskPressure on OpenShift
```

Alternatively, if space is the key constraint, rephrase to:

```diff
+    - This was mostly done to conserve disk space; checkpoints are tens of GB per epoch and can cause DiskPressure on OpenShift
```
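As a companion to this nitpick: the hardcoded disabling the doc describes presumably reduces to passing the two flags it names. A sketch under that assumption follows; the surrounding call, paths, and entry point are illustrative, not taken from the actual scripts.

```python
# Sketch only: disable per-epoch checkpointing to avoid the tens-of-GB
# writes that can trigger DiskPressure on OpenShift. The two flag names
# come from the document; everything else is an illustrative assumption.
from training_hub import sft  # assumed API surface

sft(
    model_path="ibm-granite/granite-3.3-8b-instruct",
    data_path="table_gpt_train.jsonl",       # hypothetical extracted .jsonl
    ckpt_output_dir="ckpts/sft",
    max_tokens_per_gpu=25000,
    max_seq_len=20000,
    checkpoint_at_epoch=False,                # no per-epoch model checkpoint
    accelerate_full_state_at_epoch=False,     # no full Accelerate state snapshot
)
```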
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (6)
- `examples/docs/osft_datasets.png` is excluded by `!**/*.png`
- `examples/docs/osft_hardware.png` is excluded by `!**/*.png`
- `examples/docs/osft_models.png` is excluded by `!**/*.png`
- `examples/docs/sft_datasets.png` is excluded by `!**/*.png`
- `examples/docs/sft_hardware.png` is excluded by `!**/*.png`
- `examples/docs/sft_models.png` is excluded by `!**/*.png`
📒 Files selected for processing (1)
`examples/docs/runtime_estimates.md` (1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/docs/runtime_estimates.md
[style] ~25-~25: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...rve disk space due to checkpoints being very large (tens of GB per epoch), which can cause...
(EN_WEAK_ADJECTIVE)
[grammar] ~150-~150: Use a hyphen to join words.
Context: ...3.3 vs Granite 4 Granite 3.3 is an open source 8B Parameter Large Language I...
(QB_NEW_EN_HYPHEN)
[grammar] ~151-~151: Use a hyphen to join words.
Context: ...ns we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-...
(QB_NEW_EN_HYPHEN)
[style] ~181-~181: Consider removing “of” to be more concise
Context: ...to the full training cycle measures. **All of the measured times are for a single trial o...
(ALL_OF_THE)
🪛 markdownlint-cli2 (0.18.1)
examples/docs/runtime_estimates.md
9-9: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
16-16: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
57-57: Bare URL used
(MD034, no-bare-urls)
69-69: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
91-91: Bare URL used
(MD034, no-bare-urls)
91-91: Bare URL used
(MD034, no-bare-urls)
125-125: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
129-129: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
150-150: Bare URL used
(MD034, no-bare-urls)
151-151: Bare URL used
(MD034, no-bare-urls)
157-157: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
161-161: Link and image reference definitions should be needed
Duplicate link or image reference definition: "logo"
(MD053, link-image-reference-definitions)
181-181: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🔇 Additional comments (1)
examples/docs/runtime_estimates.md (1)
63-70: Verify that referenced image files exist in the repository. The documentation references PNG files on GitHub (e.g., `sft_hardware.png`, `osft_datasets.png`). If these images are placeholders or not yet committed, the links will be broken for readers. Please confirm that the following image files exist on the main branch at the paths specified:

- `examples/docs/sft_hardware.png`
- `examples/docs/osft_hardware.png`
- `examples/docs/sft_datasets.png`
- `examples/docs/osft_datasets.png`
- `examples/docs/sft_models.png`
- `examples/docs/osft_models.png`

If these files are not yet available, consider adding a note (e.g., "Graphs coming soon") or temporarily removing the image sections to avoid broken links in documentation.
Also applies to: 123-130, 155-162
Actionable comments posted: 1
♻️ Duplicate comments (1)
examples/docs/runtime_estimates.md (1)
57-57: Wrap bare URLs in markdown link syntax. Bare URLs should be wrapped in markdown links for proper rendering and accessibility. This was flagged in a previous review and remains unaddressed.
Apply these diffs:
```diff
-The A100 and H100 both contain 80 GB of VRAM per GPUs, and both setups contain 8 GPUs. However, the H100 higher FLOPs and more cores than the A100. See Wikipedia: https://en.wikipedia.org/wiki/Hopper_(microarchitecture)#H100_accelerator_and_DGX_H100
+The A100 and H100 both contain 80 GB of VRAM per GPUs, and both setups contain 8 GPUs. However, the H100 higher FLOPs and more cores than the A100. See [Wikipedia on H100](https://en.wikipedia.org/wiki/Hopper_(microarchitecture)#H100_accelerator_and_DGX_H100)
```

```diff
-The two datasets used were Table-GPT (https://huggingface.co/datasets/LipengCS/Table-GPT) and Bespoke-Stratos-17k (https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k).
+The two datasets used were [Table-GPT](https://huggingface.co/datasets/LipengCS/Table-GPT) and [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k).
```

```diff
-Granite 3.3 is an open source **8B Parameter Large Language** Instruct model https://huggingface.co/ibm-granite/granite-3.3-8b-instruct
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model https://huggingface.co/ibm-granite/granite-4.0-tiny-preview
+Granite 3.3 is an open source **8B Parameter Large Language** Instruct model [on HuggingFace](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model [on HuggingFace](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview)
```

Also applies to: 91-91, 150-151
🧹 Nitpick comments (3)
examples/docs/runtime_estimates.md (3)
150-151: Use hyphens in compound modifiers. Compound adjectives before nouns should be hyphenated for grammatical correctness.
Apply this diff:
```diff
-Granite 3.3 is an open source **8B Parameter Large Language** Instruct model [on HuggingFace](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model [on HuggingFace](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview)
+Granite 3.3 is an open source **8B-Parameter Large-Language** Instruct model [on HuggingFace](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B-Parameter Hybrid Mixture-of-Experts** Instruct Model [on HuggingFace](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview)
```
181-181: Improve emphasis line wording and consider structured highlighting. Line 181 uses bold emphasis for an important caveat but could be clearer. Additionally, "All of the" is unnecessarily verbose. Consider a blockquote or code block for better visual hierarchy, or simplify to normal emphasis:
```diff
-**All of the measured times are for a single trial only! They are NOT the average of multiple trials**
+> **Note:** All measured times are for a single trial only, not the average of multiple trials.
```

Alternatively, if keeping emphasis:

```diff
-**All of the measured times are for a single trial only! They are NOT the average of multiple trials**
+**Note:** All measured times are for a single trial only—not the average of multiple trials.
```
25-25: Replace weak intensifier "very large" with specific descriptor. Given that you specify "tens of GB per epoch," the size is already clear. Either remove the intensifier or use a more specific adjective:
```diff
-This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+This was mostly done to conserve disk space due to checkpoints being substantial (tens of GB per epoch), which can cause DiskPressure on OpenShift
```

Or simply:

```diff
-This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+This was mostly done to conserve disk space since checkpoints are quite large—tens of GB per epoch—which can cause DiskPressure on OpenShift.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`examples/docs/runtime_estimates.md` (1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/docs/runtime_estimates.md
[style] ~25-~25: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...rve disk space due to checkpoints being very large (tens of GB per epoch), which can cause...
(EN_WEAK_ADJECTIVE)
[grammar] ~150-~150: Use a hyphen to join words.
Context: ...3.3 vs Granite 4 Granite 3.3 is an open source 8B Parameter Large Language I...
(QB_NEW_EN_HYPHEN)
[grammar] ~151-~151: Use a hyphen to join words.
Context: ...ns we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-...
(QB_NEW_EN_HYPHEN)
[style] ~181-~181: Consider removing “of” to be more concise
Context: ...to the full training cycle measures. **All of the measured times are for a single trial o...
(ALL_OF_THE)
🪛 markdownlint-cli2 (0.18.1)
examples/docs/runtime_estimates.md
9-9: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
16-16: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
57-57: Bare URL used
(MD034, no-bare-urls)
91-91: Bare URL used
(MD034, no-bare-urls)
91-91: Bare URL used
(MD034, no-bare-urls)
150-150: Bare URL used
(MD034, no-bare-urls)
151-151: Bare URL used
(MD034, no-bare-urls)
181-181: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
NikhilNayak-debug
left a comment
Added a couple of comments, otherwise the PR looks good to me!
Actionable comments posted: 1
♻️ Duplicate comments (3)
examples/docs/runtime_estimates.md (3)
59-60: Wrap bare URLs in markdown link syntax. These lines contain bare URLs that should be wrapped in markdown links for proper rendering and accessibility.
Apply this diff:
```diff
-Granite 3.3 is an open source **8B Parameter Large Language** Instruct model https://huggingface.co/ibm-granite/granite-3.3-8b-instruct
-Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model https://huggingface.co/ibm-granite/granite-4.0-tiny-preview
+Granite 3.3 is an open source **8B Parameter Large Language** Instruct model [on HuggingFace](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
+Granite 4 is still in preview stages, for these runs we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-Experts** Instruct Model [on HuggingFace](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview)
```
113-113: Wrap bare URLs in markdown link syntax. This line contains two bare URLs that should be wrapped in markdown links for proper rendering and accessibility.
Apply this diff:
```diff
-The two datasets used were Table-GPT (https://huggingface.co/datasets/LipengCS/Table-GPT) and Bespoke-Stratos-17k (https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k). Both datasets are using the training split. Each dataset was downloaded via Huggingface's datasets package, with the .jsonl file extracted for use in Training-Hub.
+The two datasets used were [Table-GPT](https://huggingface.co/datasets/LipengCS/Table-GPT) and [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k). Both datasets are using the training split. Each dataset was downloaded via Huggingface's datasets package, with the .jsonl file extracted for use in Training-Hub.
```
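Beyond the link formatting, the download-and-extract step this line describes can be reproduced with the `datasets` package roughly as follows. This is a sketch: the output filenames are assumptions, and the original runs may have used a specific dataset config or reformatted fields before training.

```python
# Sketch: pull each dataset's train split from HuggingFace and dump it
# to a .jsonl file, matching the extraction step the document describes.
from datasets import load_dataset

for repo_id, out_path in [
    ("LipengCS/Table-GPT", "table_gpt_train.jsonl"),                   # hypothetical filenames
    ("bespokelabs/Bespoke-Stratos-17k", "bespoke_stratos_17k.jsonl"),
]:
    ds = load_dataset(repo_id, split="train")
    ds.to_json(out_path)  # datasets writes JSON Lines by default
```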
88-88: Wrap bare URL in markdown link syntax. This bare URL should be wrapped in a markdown link for proper rendering and accessibility.
Apply this diff:
```diff
-The A100 and H100 both contain 80 GB of VRAM per GPUs, and both setups contain 8 GPUs. However, the H100 higher FLOPs and more cores than the A100. See Wikipedia: https://en.wikipedia.org/wiki/Hopper_(microarchitecture)#H100_accelerator_and_DGX_H100
+The A100 and H100 both contain 80 GB of VRAM per GPUs, and both setups contain 8 GPUs. However, the H100 higher FLOPs and more cores than the A100. See [Wikipedia on H100 accelerator](https://en.wikipedia.org/wiki/Hopper_(microarchitecture)#H100_accelerator_and_DGX_H100)
```
🧹 Nitpick comments (1)
examples/docs/runtime_estimates.md (1)
263-263: Convert emphasis to proper heading syntax. Line 263 uses bold emphasis (`**...**`) where a proper markdown heading (`##`) would be more appropriate for section structure.

Apply this diff:
```diff
-**All of the measured times are for a single trial only! They are NOT the average of multiple trials**
+## Note on Trial Count
+
+All of the measured times are for a single trial only! They are NOT the average of multiple trials.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`examples/docs/runtime_estimates.md` (1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/docs/runtime_estimates.md
[style] ~20-~20: For conciseness, consider replacing this expression with an adverb.
Context: ...details about this warm-up aren't known at the moment, but will be added when more informatio...
(AT_THE_MOMENT)
[style] ~27-~27: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...rve disk space due to checkpoints being very large (tens of GB per epoch), which can cause...
(EN_WEAK_ADJECTIVE)
[grammar] ~59-~59: Use a hyphen to join words.
Context: ...3.3 vs Granite 4 Granite 3.3 is an open source 8B Parameter Large Language I...
(QB_NEW_EN_HYPHEN)
[grammar] ~60-~60: Use a hyphen to join words.
Context: ...ns we use Tiny Preview, which is an open source **7B Parameter Hybrid Mixture-of-...
(QB_NEW_EN_HYPHEN)
[style] ~160-~160: This wording can make your sentence hard to follow. Try rephrasing for improved clarity.
Context: ...ens_per_gpuhave an impact on the time due to it effectively serving as a batch size, butmax_seq_len` prev...
(DUE_TO_BECAUSE)
[style] ~263-~263: Consider removing “of” to be more concise
Context: ...to the full training cycle measures. **All of the measured times are for a single trial o...
(ALL_OF_THE)
🪛 markdownlint-cli2 (0.18.1)
examples/docs/runtime_estimates.md
9-9: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
16-16: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
19-19: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
20-20: Unordered list indentation
Expected: 4; Actual: 8
(MD007, ul-indent)
23-23: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
26-26: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
59-59: Bare URL used
(MD034, no-bare-urls)
60-60: Bare URL used
(MD034, no-bare-urls)
88-88: Bare URL used
(MD034, no-bare-urls)
113-113: Bare URL used
(MD034, no-bare-urls)
113-113: Bare URL used
(MD034, no-bare-urls)
263-263: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
  - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
  - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
🛠️ Refactor suggestion | 🟠 Major
Fix markdown list indentation to use 2 spaces instead of 4.
Multiple nested list items use 4-space indentation but should use 2 spaces per markdown conventions for proper rendering and consistency.
Apply this diff to fix the indentation across all nested list items:
```diff
 - The experiments were conducted by using the default settings provided by `sft_granite_example.py` and `osft_granite_example.py`
-    - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
-    - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
+  - For SFT, `max_tokens_per_gpu=25000` and `max_seq_len=20000`
+  - For OSFT, `max_tokens_per_gpu=10000`, `max_seq_len=4096`, and `unfreeze_rank_ratio=0.3`
 - **Models**: Two models were tested, **Granite 3.3 8B**, and **Granite 4 Tiny Preview** (a Mixture-of-Experts model that also has 8B Parameters)
 - **Hardware**: Two different hardware configurations were tested, a server with **8x A100s**, and an Openshift cluster with **8x H100s**.
 - **Datasets**: Two datasets were tested, a simple dataset in Table-GPT and a much larger and longer dataset in Bespoke-Stratos-17k.
-    - Please note that both datasets were obtained by downloading the dataset from HuggingFace and then extracting the .jsonl file.
+  - Please note that both datasets were obtained by downloading the dataset from HuggingFace and then extracting the .jsonl file.
 - All experiments were run for the first full epoch two times, with the displayed time being the average of the two times.
-    - **Please be aware that time for later epochs may vary**
-    - On the A100 machine, the variation between the two runs was negligible, never more than 6 seconds.
-    - The variation is a bit larger on the H100 machine, especially during the first run of a pod (the first result was discarded and reran if it varied significantly)
-    - **Granite 4 tends to require some amount of warm-up before its first usage.** The shown times for Granite 4 are for runs where the warm-up does not occur. Typically, the warm-up adds about an additional 1 minute to the runtime.
-        - The reasons for and details about this warm-up aren't known at the moment, but will be added when more information is gathered. Please keep in mind Granite 4 is still in preview.
+  - **Please be aware that time for later epochs may vary**
+  - On the A100 machine, the variation between the two runs was negligible, never more than 6 seconds.
+  - The variation is a bit larger on the H100 machine, especially during the first run of a pod (the first result was discarded and reran if it varied significantly)
+  - **Granite 4 tends to require some amount of warm-up before its first usage.** The shown times for Granite 4 are for runs where the warm-up does not occur. Typically, the warm-up adds about an additional 1 minute to the runtime.
+    - The reasons for and details about this warm-up aren't known at the moment, but will be added when more information is gathered. Please keep in mind Granite 4 is still in preview.
 - The time measurement is calculated by using the timestamps logged during the training process in the above scripts
 - By default, OSFT makes use of Liger Kernels to improve memory usage and runtime. However, as of Nov 7th 2025, Liger Kernels currently don't have built-in support for Granite 4
-    - As a result, the script was modified for allow Liger Kernels to be disabled for certain experiments
-    - The tables will be updated once support for Liger Kernels is added.
+  - As a result, the script was modified for allow Liger Kernels to be disabled for certain experiments
+  - The tables will be updated once support for Liger Kernels is added.
 - Many of these tests had the checkpointing hardcoded to be disabled in the script (set `checkpoint_at_epoch=False` and `accelerate_full_state_at_epoch=False`)
-    - This does not appear to impact runtime of the actual training loop
-    - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
+  - This does not appear to impact runtime of the actual training loop
+  - This was mostly done to conserve disk space due to checkpoints being very large (tens of GB per epoch), which can cause DiskPressure on OpenShift
```

Also applies to: 14-14, 16-18, 21-22, 24-25
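Incidentally, since the quoted doc derives its numbers from timestamps logged during training, averaged over two runs, the measurement reduces to something like the sketch below. The callable is a placeholder; the documented figures come from the scripts' own log output, not from a wrapper like this.

```python
# Sketch of the measurement methodology quoted above: time the first
# full epoch twice and report the mean.
import time

def timed_run(train_one_epoch) -> float:
    start = time.monotonic()
    train_one_epoch()
    return time.monotonic() - start

def train_one_epoch():  # stand-in for a real single-epoch training call
    pass

durations = [timed_run(train_one_epoch) for _ in range(2)]
print(f"average first-epoch wall-clock: {sum(durations) / len(durations):.1f}s")
```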
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
9-9: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🤖 Prompt for AI Agents
In examples/docs/runtime_estimates.md around lines 9-10 (also apply to 14,
16-18, 21-22, 24-25), several nested markdown list items are indented with 4
spaces; change those nested list item indentations to 2 spaces each so the lists
render correctly and consistently (update every listed line range to use 2-space
indentation for nested items).
Still missing a few values and some additional plots.
Would like some feedback on the general direction of how the plots and tables are going to look as well as information on how to properly
(note that all the changes up to October 14th should already be reflected in this repo via the squashed commit that occurred in my last PR)