Tighten compilation cache invariants around eagle #17662
Conversation
vllm/compilation/backends.py (outdated):

```python
# calls in a single model, please open an issue and let's discuss.
speculative_config = self.vllm_config.speculative_config
assert speculative_config is not None
assert speculative_config.method in ("eagle", "eagle3")
```
nit:

```diff
-assert speculative_config.method in ("eagle", "eagle3")
+assert speculative_config.use_eagle()
```
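For context, a `use_eagle()` helper along the lines of the suggestion could be sketched like this (a minimal stand-in for illustration, not vLLM's actual `SpeculativeConfig`):

```python
from dataclasses import dataclass


@dataclass
class SpeculativeConfig:
    """Minimal stand-in for vLLM's SpeculativeConfig (sketch only)."""
    method: str

    def use_eagle(self) -> bool:
        # Treat both eagle variants uniformly, as the suggestion intends,
        # so call sites don't enumerate method strings themselves.
        return self.method in ("eagle", "eagle3")
```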
vllm/compilation/backends.py (outdated):

```python
# The eagle head does not need its own hash; we assume
# the hash of the original model entirely determines the config of
# the eagle head.
```
One small concern is that the eagle3 head often has a different hidden size than the original model.
For example, the hidden size of Llama 3.3 70B is 8192, while the hidden size of its eagle3 head (from the eagle3 authors) is 6144 (https://huggingface.co/yuhuili/EAGLE3-LLaMA3.3-Instruct-70B).
So, technically, an eagle3 head can define its own hidden size.
There's only one public eagle3 head per model, so this assumption works for those public heads. I'm a little bit concerned this might not be the case for internal models/heads.
What do you mean by "internal models/heads"? Internal to vLLM or Meta or something else?
@zou3519 Internal to Meta or other companies.
@zou3519 I don't mean to block this PR. I think this PR should be shipped (once the CI passes). I just wanted to give a heads up about the edge case. Sorry for the confusion!
> @zou3519 Internal to Meta or other companies.

That makes sense to me.

> @zou3519 I don't mean to block this PR. I think this PR should be shipped (once the CI passes). I just wanted to give a heads up about the edge case. Sorry for the confusion!
Oh, I was asking so that I can drop in some more comments here about the current state. I'll update this PR with your comments. Thanks for the discussion!
I'm recording down my understanding of how eagle and the compilation cache work after discussing vllm-project#17211 with @luyuzhe111 and @WoosukKwon.

In the future we likely will have a situation where we want to torch.compile multiple pieces of code (e.g. decoder and encoder separately), and then we'll need to refactor the system to support it (each compiled region needs its own cache directory with its own hash). But until then the current design seems fine.

Signed-off-by: rzou <[email protected]>
Looks like the assumptions are wrong (the asserts are triggering in the tests), so we need some fixes. I have some idea of how to do this; it'll be a bigger refactor.

```
[2025-05-09T20:34:59Z] if compilation_counter.num_graphs_seen > 0:
```
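One way the triggering invariant might look, reconstructed as a hypothetical sketch (only `compilation_counter.num_graphs_seen` appears in the log above; the function and its exception message are illustrative, not vLLM code):

```python
def check_single_graph_invariant(num_graphs_seen: int, uses_eagle: bool) -> None:
    """Hypothetical invariant from the discussion: a second torch.compile'd
    graph is only expected when an eagle drafter head is present."""
    if num_graphs_seen > 0 and not uses_eagle:
        raise AssertionError(
            "Multiple torch.compile calls in a single model; "
            "please open an issue and let's discuss.")
```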
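The per-region cache layout described above (each compiled region with its own cache directory keyed by its own hash) could be sketched as follows. This is hypothetical; names like `cache_dir_for_region` are illustrative, not vLLM's implementation:

```python
import hashlib
import os


def cache_dir_for_region(base_dir: str, region_name: str, region_hash: str) -> str:
    """Sketch of the future design: each compiled region (e.g. 'decoder',
    'encoder') gets its own cache directory, keyed by its own hash rather
    than sharing the target model's hash."""
    digest = hashlib.sha256(f"{region_name}:{region_hash}".encode()).hexdigest()[:16]
    return os.path.join(base_dir, region_name, digest)
```

Under this layout, recompiling the eagle head (or any one region) would invalidate only that region's directory, instead of relying on the target model's hash to determine everything.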