-
Notifications
You must be signed in to change notification settings - Fork 18
Add Granite 4 SFT example #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Granite 4 SFT example #20
Conversation
WalkthroughAdds a new SFT example for Granite 4.0, updates the examples README, and refactors existing Granite example scripts to use module-level Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant Trainer
participant Checkpoint
participant Interpolator
User->>CLI: run `sft_granite4_example.py --args`
CLI->>CLI: parse args & apply `example_*` defaults
CLI->>Trainer: invoke `sft(...)` with config
Trainer->>Checkpoint: emit checkpoints (hf_format/samples_*)
Trainer-->>CLI: training finished
CLI->>Checkpoint: call `find_most_recent_checkpoint(output_dir)`
Checkpoint-->>CLI: return checkpoint_path
alt interpolation_weight > 0
CLI->>Interpolator: `interpolate_models(base, checkpoint, weight)`
Interpolator-->>CLI: return interpolated model path
end
CLI->>User: print duration and result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/scripts/sft_granite4_example.py (1)
178-209: Drop the unusedresultbinding
resultis never read, so the assignment only triggers lint noise. Callingsftdirectly keeps the control flow identical while eliminating the warning.- result = sft( + sft( # Model and data model_path=args.model_path, data_path=args.data_path, ckpt_output_dir=args.ckpt_output_dir, @@ # Additional parameters to the backend **kwargs )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
examples/README.md(1 hunks)examples/scripts/osft_granite_example.py(5 hunks)examples/scripts/sft_granite4_example.py(1 hunks)examples/scripts/sft_granite_example.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/scripts/sft_granite4_example.py (3)
src/training_hub/algorithms/sft.py (1)
sft(169-248)examples/scripts/sft_granite_example.py (2)
find_most_recent_checkpoint(70-93)main(96-207)examples/scripts/interpolator.py (1)
interpolate_models(19-78)
🪛 Ruff (0.14.3)
examples/scripts/sft_granite4_example.py
1-1: Shebang is present but file is not executable
(EXE001)
119-119: Avoid specifying long messages outside the exception class
(TRY003)
178-178: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
232-232: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (1)
examples/README.md (1)
31-31: Granite 4.0 entry is clearThanks for surfacing the new Granite 4.0 SFT example so users can find it quickly.
Maxusmusti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one small nit, but otherwise looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
examples/scripts/sft_granite4_example.py (2)
93-96: Consider moving timestamp generation intomain().The timestamp is captured at module import time rather than when
main()executes, which could cause confusion if the module is imported but training is started later.Move the timestamp and
data_output_dirgeneration intomain():-timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") -full_experiment_name = f"{experiment_name}_{timestamp}" - -data_output_dir=f"data/{full_experiment_name}" # Directory for processed data -# data_output_dir=f"/dev/shm/data/{full_experiment_name}" # Directory for processed data (RAM disk for speed)Then in
main()after parsing args:timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") full_experiment_name = f"{experiment_name}_{timestamp}" data_output_dir = f"data/{full_experiment_name}"
100-124: Consider extracting to a shared utility module.This function is duplicated from
osft_continual_learning_example.py. To follow DRY principles, consider extracting it to a shared utilities module that both scripts can import.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/scripts/sft_granite4_example.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/scripts/sft_granite4_example.py (2)
src/training_hub/algorithms/sft.py (1)
sft(169-248)examples/scripts/interpolator.py (1)
interpolate_models(19-78)
🪛 Ruff (0.14.3)
examples/scripts/sft_granite4_example.py
1-1: Shebang is present but file is not executable
(EXE001)
119-119: Avoid specifying long messages outside the exception class
(TRY003)
232-232: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (4)
examples/scripts/sft_granite4_example.py (4)
127-172: LGTM!The CLI argument parsing and configuration validation are well-structured. The soft validation approach (printing tips instead of hard enforcement) gives users flexibility while providing helpful guidance.
194-194: Good choice for checkpoint strategy.Setting
save_samples=0disables sample-based checkpointing and relies on epoch-based checkpoints only, which addresses the storage concern mentioned in previous reviews for large models like Granite 4 Small.
211-231: LGTM!The post-training reporting and conditional interpolation logic are well-implemented. The condition on line 223 correctly excludes edge cases where interpolation would be trivial (weights of 0.0 or 1.0).
232-241: Broad exception handling is acceptable here.While static analysis warns about the broad
Exceptioncatch, this is appropriate for a top-level CLI script where you want to provide user-friendly error messages for any failure. The error handling correctly reports duration and provides helpful troubleshooting tips.
This PR is for adding an SFT example for Granite 4.0 models. It also applies minor non-functional updates to the existing SFT/OSFT examples for Granite 3.3. The reason for not adding an OSFT example for Granite 4.0 models is that Granite-4.0-H-Small (32B total / 9B active) is too large for a single node training even if the cpu offloading is enabled.
Summary by CodeRabbit
New Features
Documentation
Refactor
Behavior