Single-controller LoRA RL fine-tuning with vLLM #735
base: main
Conversation
Summary of Changes
Hello @gursimar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request delivers a practical, verified example of fine-tuning models with LoRA using a single controller and the vLLM inference engine. It serves as a blueprint for users looking to implement GRPO workflows with these technologies, providing both the Python script and the YAML configuration needed to get started. The primary goal is to extend the existing LoRA + vLLM capabilities with a concrete, runnable demonstration.
Code Review
This pull request introduces a new example for single-controller LoRA fine-tuning with the vLLM backend. The changes include a Python script for the training workflow and a corresponding YAML configuration file. The code is well structured for an example script. My review includes two suggestions for the Python script: remove a magic number to improve maintainability, and add a placeholder for an evaluation step, which the configuration appears to intend but which is currently missing.
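For reference, a minimal sketch of the kind of evaluation placeholder the review suggests, mirroring the timing blocks already used for saving and checkpointing in this script; the `evaluator.evaluate` call and its arguments are assumptions, not the project's actual API:

```python
# Hypothetical evaluation placeholder, written in the style of the existing
# stats_tracker timing blocks. The evaluate() signature is an assumption.
with stats_tracker.record_timing("eval"):
    evaluator.evaluate(actor, epoch, step, global_step, tokenizer=tokenizer)
```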
Force-pushed from 12d9725 to fd4fd16 (Compare)
garrett4wade left a comment
While the implementation looks great, I'd still like to confirm the details about learning performance.
The previous SPMD LoRA code has an unresolved bug: if multiple inference engines submit rollout requests concurrently, learning performance drops significantly. As a workaround, we only submit requests on rank 0 (code). Only in this way does the learning curve roughly match full-parameter tuning.
I wonder whether the bug still exists in single-controller mode. Can you provide learning curves comparing this new script with the default SPMD, full-parameter tuning script? Hopefully there is no longer a performance drop.
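For context, a rough sketch of the rank-0-only workaround described above, assuming a torch.distributed setup; the exact submit and broadcast calls in the real SPMD script may differ:

```python
import torch.distributed as dist

# Workaround sketch: only rank 0 submits rollout requests, then the resulting
# batch is broadcast to the other ranks. The rollout_batch call and workflow
# arguments here are illustrative, not the actual SPMD script.
if dist.get_rank() == 0:
    payload = [actor.rollout_batch(next(data_generator), workflow=workflow)]
else:
    payload = [None]
dist.broadcast_object_list(payload, src=0)
batch = payload[0]
```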
```python
with stats_tracker.record_timing("save"):
    saver.save(actor, epoch, step, global_step, tokenizer=tokenizer)

with stats_tracker.record_timing("checkpoint_for_recover"):
    recover_handler.dump(
        actor,
        step_info,
        saver,
        evaluator,
        stats_logger,
        train_dataloader,
        tokenizer=tokenizer,
    )
```
The single-controller training script has been slightly changed: it now has an additional clear_batch call. Please refer to the latest script for details.
FYI, we are working on merging the scripts into trainers now.
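A rough sketch of where such a call might sit in the training loop; `actor.clear_batch(batch)` and its placement are assumptions based only on this comment, so the latest main-branch script remains the source of truth:

```python
# Illustrative placement only: release the rollout batch once the update,
# saving, and recovery checkpointing for this step are done.
actor.clear_batch(batch)
```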
```python
if config.rollout.max_head_offpolicyness > 0:
    batch = actor.prepare_batch(
        train_dataloader,
        workflow="areal.workflow.rlvr.RLVRWorkflow",
        workflow_kwargs=workflow_kwargs,
    )
else:
    batch = actor.rollout_batch(
        next(data_generator),
        workflow="areal.workflow.rlvr.RLVRWorkflow",
        workflow_kwargs=workflow_kwargs,
    )
```
We should only use prepare_batch. Please check the latest script in the main branch.
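Under that suggestion, the branch above would collapse to a single call; a sketch assuming the same workflow arguments shown in the diff:

```python
# Sketch of the suggested change: always prepare the batch through
# prepare_batch, regardless of max_head_offpolicyness.
batch = actor.prepare_batch(
    train_dataloader,
    workflow="areal.workflow.rlvr.RLVRWorkflow",
    workflow_kwargs=workflow_kwargs,
)
```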
Force-pushed from fd4fd16 to 90b7da1 (Compare)
Description
This PR adds working, tested examples for running single-controller LoRA training with the vLLM backend.
It builds on the existing LoRA + vLLM support (RFC #609) and demonstrates how to configure and launch a single-controller GRPO workflow.
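For orientation, here is a condensed, illustrative sketch of the loop shape the example follows, stitched together from snippets visible in this PR; names such as max_steps, steps_per_epoch, workflow_kwargs, and the update call are placeholders standing in for objects built earlier in the actual script:

```python
# Condensed single-controller GRPO loop (illustrative, not the full script).
for global_step in range(max_steps):
    epoch, step = divmod(global_step, steps_per_epoch)

    # Submit rollout requests through the single controller.
    batch = actor.prepare_batch(
        train_dataloader,
        workflow="areal.workflow.rlvr.RLVRWorkflow",
        workflow_kwargs=workflow_kwargs,
    )

    # GRPO update on the LoRA adapters (placeholder for the real update call).
    actor.ppo_update(batch)

    with stats_tracker.record_timing("save"):
        saver.save(actor, epoch, step, global_step, tokenizer=tokenizer)
```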
What’s included
Files changed
Kept the files in the examples/lora folder on purpose since, IMO, all LoRA examples should live under this folder only.
- examples/lora/gsm8k_grpo_vllm_single_controller.py — single-controller GRPO LoRA example
- examples/lora/gsm8k_grpo_vllm_single_controller.yaml — config for the vLLM backend
Running instructions
Testing
Type of Change
Checklist
Need help? Check the Contributing Guide or ask in GitHub Discussions!