[GKD] Use vllm for the student model #3475
base: main
Conversation
trl/trainer/gkd_config.py
Outdated
teacher_vllm_mode (`str`, *optional*, defaults to `"server"`):
    Mode for teacher vLLM integration. Either `"server"` (connect to a running TRL vLLM server) or
    `"colocate"` (run vLLM in the same process).
teacher_vllm_server_host (`str`, *optional*, defaults to `"0.0.0.0"`):
I wonder if it's worth having separate vLLM arg configs to reuse across the GRPO/GKD trainers?
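A rough sketch of what that shared config could look like, a base dataclass that both `GRPOConfig` and `GKDConfig` inherit so the vLLM knobs are defined once. Field names here are illustrative, not the PR's actual API:

```python
from dataclasses import dataclass, field

# Illustrative only: shared vLLM arguments factored into a base dataclass.
# Names and defaults are hypothetical.
@dataclass
class VLLMArgs:
    vllm_mode: str = field(
        default="server",
        metadata={"help": "Either 'server' (connect to a running TRL vLLM server) or 'colocate'."},
    )
    vllm_server_host: str = field(
        default="0.0.0.0",
        metadata={"help": "Host of the vLLM server to connect to (server mode only)."},
    )
    vllm_server_port: int = field(
        default=8000,
        metadata={"help": "Port of the vLLM server to connect to (server mode only)."},
    )
```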
else:
    raise ValueError(f"Unknown student_vllm_mode: {self.student_vllm_mode}")
self.student_vllm_guided_decoding_regex = args.student_vllm_guided_decoding_regex
self.student_vllm_sync_frequency = args.student_vllm_sync_frequency
Do you have a feel for the impact of generating with a stale student policy?
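For context, the staleness window is bounded by the sync frequency: a sync-every-k-steps scheme means generations between pushes use a student policy that is at most k - 1 optimizer steps old. A toy illustration of the gating, with a hypothetical helper name:

```python
def should_sync_weights(global_step: int, sync_frequency: int) -> bool:
    # With sync_frequency = k, vLLM generates with weights that are
    # at most k - 1 optimizer steps stale between pushes.
    return global_step % sync_frequency == 0
```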
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Pull Request Overview

Adds support for using vLLM for the student model’s on-policy generation in `GKDTrainer`.

- Introduces new `student_use_vllm` flags and parameters in `GKDConfig`, with validation.
- Extends `GKDTrainer` to initialize vLLM in server or colocate mode, generate completions via vLLM, and sync weights.
- Updates documentation with an “Accelerated Generation with vLLM” section and refines `generalized_jsd_loss`.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description
---|---
trl/trainer/gkd_config.py | Added student_use_vllm* fields, metadata, and max_new_tokens check |
trl/trainer/gkd_trainer.py | Integrated vLLM setup, generation, and parameter sync logic |
docs/source/gkd_trainer.md | Documented vLLM server vs. co-locate modes and usage guidance |
Comments suppressed due to low confidence (2)
trl/trainer/gkd_trainer.py:354
- The new vLLM-based generation path is complex and critical but has no accompanying tests. Consider adding unit or integration tests covering both 'server' and 'colocate' modes, sync timing, and error branches (e.g., unknown mode).
def _generate_on_policy_outputs_student_vllm(self, inputs, generation_config, pad_token_id=None):
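For the unknown-mode branch specifically, a test could stay at the config level once the `__post_init__` validation suggested in the next comment is in place. A hypothetical sketch, with field names following this PR:

```python
import pytest

from trl import GKDConfig


def test_unknown_student_vllm_mode_raises(tmp_path):
    # Assumes the early __post_init__ validation suggested below has
    # been added; otherwise the error surfaces only in the trainer.
    with pytest.raises(ValueError):
        GKDConfig(
            output_dir=str(tmp_path),
            student_use_vllm=True,
            student_vllm_mode="bogus",
        )
```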
trl/trainer/gkd_config.py:133
- It may be helpful to validate `student_vllm_mode` in `__post_init__` so only `'server'` or `'colocate'` are accepted, giving users early feedback on invalid values.
student_vllm_mode: str = field(
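A minimal sketch of that early check, assuming the field names in this PR and that `GKDConfig` already defines a `__post_init__` to chain to:

```python
def __post_init__(self):
    super().__post_init__()
    # Fail fast on invalid modes instead of erroring deep in trainer setup.
    if self.student_vllm_mode not in ("server", "colocate"):
        raise ValueError(
            f"student_vllm_mode must be 'server' or 'colocate', got {self.student_vllm_mode!r}"
        )
```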
)

# Create subgroups of ranks for TP
self.student_tp_group, _ = torch.distributed.new_subgroups_by_enumeration(
`new_subgroups_by_enumeration` returns a list of ProcessGroups and a default group. Assigning the whole list to `student_tp_group` means `all_gather_object` later gets a list instead of a single ProcessGroup. You should pick the subgroup for the current rank (e.g., by matching `torch.distributed.get_rank()` against the ranks in each subgroup) before using it.
- self.student_tp_group, _ = torch.distributed.new_subgroups_by_enumeration(
+ subgroups, _ = torch.distributed.new_subgroups_by_enumeration(
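One way to make the rank matching explicit, assuming `tp_rank_lists` holds the enumerated rank groups. Note that per the PyTorch documentation the first return value of `new_subgroups_by_enumeration` is the subgroup containing the current rank, so it may already be usable directly:

```python
import torch.distributed as dist

# tp_rank_lists is assumed, e.g. [[0, 1], [2, 3]] for TP size 2 on 4 ranks.
cur_subgroup, all_subgroups = dist.new_subgroups_by_enumeration(tp_rank_lists)

# Explicit selection, as the review suggests: match this rank against the
# ranks of each subgroup in the full list.
rank = dist.get_rank()
student_tp_group = next(
    pg for pg in all_subgroups if rank in dist.get_process_group_ranks(pg)
)
```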
@@ -145,6 +158,70 @@ def __init__(
        ):
            self.generation_config.eos_token_id = self.model.generation_config.eos_token_id

        # vLLM setup for student model if enabled
        self.student_use_vllm = args.student_use_vllm
        if self.student_use_vllm:
`self.student_vllm_client` is only set on the main process for server mode, so other ranks won't have this attribute. To avoid a potential `AttributeError` during error handling or shutdown, initialize `student_vllm_client = None` unconditionally before the `is_main_process` block.
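A sketch of the defensive initialization being suggested; attribute and config names follow this PR, and the client construction is deferred to a hypothetical helper:

```python
# Ensure the attribute exists on every rank, including non-main ones.
self.student_vllm_client = None
if self.student_vllm_mode == "server" and self.accelerator.is_main_process:
    # _init_student_vllm_client is a hypothetical stand-in for the PR's
    # server-mode client construction.
    self.student_vllm_client = self._init_student_vllm_client()
```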
What does this PR do?
Adds an option to use vLLM for the teacher model.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.