
[lora] enforce_eager=true slows down generation time dramatically with LoRA#665

Merged
SumanthRH merged 7 commits into NovaSky-AI:main from devpatelio:devpatel/skyrl-issue-67
Nov 17, 2025

Conversation

@devpatelio
Collaborator

Unfortunately, this seems to be a vLLM issue that has been widely reported and not addressed. I've exposed some additional vLLM configuration flags (fully sharded LoRA) and double-checked that max_lora_rank matches the input LoRA rank, since a mismatch there was also reported as a potential cause.

For now, I've implemented a band-aid solution: we always set enforce_eager=false for LoRA runs, with a warning, to prevent slowdowns across all training runs. This is in line with the fixes suggested on the vLLM side for the generator.

See vllm-project/vllm#13204 and vllm-project/vllm#9452
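The override described above can be sketched roughly as follows. This is an illustrative reconstruction, not the exact SkyRL code; the function name and argument names are hypothetical, and only the flag names (enforce_eager, the "vllm" backend string) come from the PR.

```python
# Hypothetical sketch of the band-aid described above: when LoRA is enabled
# with the vLLM backend, force enforce_eager off and emit a warning.
import logging

logger = logging.getLogger(__name__)


def apply_lora_eager_workaround(engine_kwargs: dict, lora_enabled: bool, backend: str) -> dict:
    """Force enforce_eager=False for LoRA runs on vLLM (names are illustrative)."""
    if lora_enabled and backend.lower() == "vllm" and engine_kwargs.get("enforce_eager"):
        logger.warning(
            "enforce_eager=True degrades LoRA generation speed in vLLM "
            "(see vllm-project/vllm#13204); overriding to enforce_eager=False."
        )
        engine_kwargs["enforce_eager"] = False
    return engine_kwargs
```

Non-LoRA runs and non-vLLM backends are left untouched, so the override only fires in the configuration combination the linked issues describe.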

@devpatelio
Collaborator Author

/gemini review

gemini-code-assist[bot]

This comment was marked as outdated.

gemini-code-assist[bot]

This comment was marked as outdated.

@devpatelio
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a workaround for a known performance issue in vLLM where using LoRA with enforce_eager=true causes significant slowdowns. The change automatically disables enforce_eager for LoRA runs on the vLLM backend and issues a warning, which is a sensible approach. The implementation is clean and also correctly plumbs through the new fully_sharded_loras configuration option. My feedback includes a minor suggestion to improve the robustness of the backend check.

engine_kwargs["fully_sharded_loras"] = cfg.generator.fully_sharded_loras

# TODO(devpatel): Bandaid solution, replace this once we have a better solution for LoRA performance degradation on the vLLM side
if cfg.generator.enforce_eager and cfg.generator.backend == "vllm":

Severity: medium

For improved robustness, it's a good practice to make the backend check case-insensitive. The configuration value for backend could potentially be provided with different casings (e.g., 'vllm', 'VLLM'). Using .lower() will ensure this check works as expected in all cases.

Suggested change
if cfg.generator.enforce_eager and cfg.generator.backend == "vllm":
if cfg.generator.enforce_eager and cfg.generator.backend.lower() == "vllm":
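For context, the flags this PR plumbs through correspond to real vLLM engine arguments. The sketch below shows how they might fit together when constructing the engine kwargs; the helper name and the rank value are illustrative, but enable_lora, max_lora_rank, fully_sharded_loras, and enforce_eager are the vLLM options discussed in this thread.

```python
# Illustrative engine kwargs for a LoRA-enabled vLLM engine, reflecting the
# flags discussed in this PR. The helper name and rank value are placeholders.
def build_vllm_lora_kwargs(lora_rank: int) -> dict:
    return {
        "enable_lora": True,
        # max_lora_rank must match the adapter's rank; a mismatch was
        # reported as one potential cause of the slowdown.
        "max_lora_rank": lora_rank,
        # Shard LoRA computation across tensor-parallel ranks.
        "fully_sharded_loras": True,
        # Keep CUDA graphs enabled: the band-aid applied by this PR.
        "enforce_eager": False,
    }
```

These kwargs would then be passed through to the vLLM engine constructor alongside the model and parallelism settings.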

@SumanthRH SumanthRH merged commit 803155e into NovaSky-AI:main Nov 17, 2025
3 checks passed
li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
…matically with LoRA (NovaSky-AI#665)

dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
…matically with LoRA (NovaSky-AI#665)

2 participants