[Core][V0] Add guidance backend for structured output #14589
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 5244b7f to 2db1a0e
Hey @russellb this looks fantastic! On point 3 -- is it possible to share an example of a workload where TPOT is reduced vs. xgrammar? I don't typically see this in JSON schemas, so it'd be a helpful case to look for performance improvements.
I'm using these changes to the benchmark suite on top of this PR: #14567

I'm running the following command, changing the structured output backend each run:

```
python3 benchmarks/benchmark_serving_structured_output.py \
    --port 8432 \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --dataset json-unique \
    --structured-output-ratio 1.0 \
    --structured-output-backend guidance \
    --output-len 300 \
    --num-prompts 90 \
    --request-rate 1
```
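For anyone reproducing this client-side rather than through the benchmark script, here is a minimal sketch, assuming the server started above is listening on port 8432 and that the V0 OpenAI-compatible server accepts the `guided_json` / `guided_decoding_backend` extra-body fields:

```python
# Minimal client-side sketch (assumptions: the server above is on port 8432,
# and the V0 server accepts the guided_json / guided_decoding_backend
# extra-body fields for per-request structured output).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8432/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Invent a plausible user record."}],
    extra_body={
        "guided_json": schema,
        # Swap in "xgrammar" or "outlines" to compare backends per request.
        "guided_decoding_backend": "guidance",
    },
)
print(resp.choices[0].message.content)
```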
Force-pushed from 2db1a0e to d66d439
This has been merged and I rebased this branch on top of it.
Testing results via @joerunde are positive. I think this is worth including to help affected users get by a little longer while people transition to V1.
Just a quick thought, but we can get this one in given that it is already being used in prod for v0.
This pull request has merge conflicts that must be resolved before it can be merged.
Let's get this in for v0
Nice work keeping it minimal, just a few questions
Is this really a blocking issue for v0.8.0? It looks like a feature that is nice to have?
yeah, I don't think this would block 0.8.0. Nice to have.
Can you fix the merge conflict?
```python
import os
from typing import Any, List, Type, Union

import llguidance  # type: ignore[import-untyped]
```
We can probably remove this `type: ignore` since guidance-ai/llguidance#139
once that makes it into a release at least
The failure in buildkite/ci/pr/entrypoints-test is unrelated.
This is the V1 integration for [guidance](https://github.com/guidance-ai/llguidance) as a backend for structured output. There is a V0 integration in vllm-project#14589.

This backend provides some key benefits to V1:

* Broader jsonschema support
* Quick startup performance for large schemas

Instead of precomputing the masks for all states, this is done on the fly. We see very fast request startup times, even for large schemas. This should make V1 roughly feature equivalent to V0 in terms of the types of schemas it can support.

More technical details are available in the llguidance git repo.

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
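To make the "on the fly" point concrete, here is a toy pure-Python illustration (deliberately not the llguidance API, whose details differ): the legal-token mask is computed only for the state decoding actually reaches, instead of being precompiled for every reachable state at request startup.

```python
# Toy illustration of on-the-fly masking (NOT the llguidance API):
# the mask is built lazily per decoding step, rather than precomputed
# for every state of the grammar automaton up front.
VOCAB = ["{", "}", '"name"', ":", '"x"']

LEGAL = {  # tiny hand-written "grammar" for: { "name" : "x" }
    "start": {"{"},
    "key": {'"name"'},
    "colon": {":"},
    "value": {'"x"'},
    "end": {"}"},
}
NEXT_STATE = {"start": "key", "key": "colon", "colon": "value", "value": "end"}

state = "start"
while True:
    # Lazy, per-step mask: computed only when this state is reached.
    mask = [tok in LEGAL[state] for tok in VOCAB]
    print(f"{state:>6}: allowed = {[t for t, ok in zip(VOCAB, mask) if ok]}")
    if state == "end":
        break
    state = NEXT_STATE[state]
```

For a large JSON schema the number of reachable states can be enormous, which is why lazy masking avoids the startup cost that ahead-of-time compilation pays.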
@russellb I have a simplified version here: russellb/vllm@llguidance-v0-integration...mmoskal:llg_v0_matcher. Will hopefully be able to test it later today.
Entrypoints tests are failing consistently, PTAL.
Seems like an outlines issue with `json_object`, which is not related to this PR. But I think Michal wants to merge a different implementation later.
Yes - I'm going to push a PR to fix it today.
That's fine, though it could also be a follow-up PR if this is a working iteration to start with and build on.
For outlines, we actually fall back to …
The tests should work now -- I made …
requirements/common.txt

```diff
@@ -18,6 +18,7 @@ pillow # Required for image processing
 prometheus-fastapi-instrumentator >= 7.0.0
 tiktoken >= 0.6.0 # Required for DBRX tokenizer
 lm-format-enforcer >= 0.10.11, < 0.11
+llguidance>=0.6.15; platform_machine == "x86_64" or platform_machine == "arm64"
```
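Side note on the environment marker: pip evaluates it per host, so `llguidance` is only installed where the marker holds. A quick way to check what it evaluates to locally, using the `packaging` library:

```python
# Check how pip would evaluate the requirement's environment marker on this
# host (same marker syntax as in requirements/common.txt above).
from packaging.markers import Marker

marker = Marker('platform_machine == "x86_64" or platform_machine == "arm64"')
# Note: Linux ARM64 hosts typically report "aarch64", not "arm64", so as
# written the marker matches macOS Apple Silicon but not Linux ARM64.
print(marker.evaluate())
```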
This commit is based on the PR vllm-project#10217. It is updated to be compatible with `main`. It has also been significantly updated and simplified to take advantage of changes to llguidance since the original PR was written. Many thanks to the llguidance team for the help on this.

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
Force-pushed from ff4b448 to 2631863
LGTM great work folks
This is the V1 integration for [guidance](https://github.com/guidance-ai/llguidance) as a backend for structured output. There is a V0 integration in vllm-project#14589.

This backend provides some key benefits to V1:

* Broader jsonschema support
* Quick startup performance for large schemas

Instead of precomputing the masks for all states, this is done on the fly. We see very fast request startup times, even for large schemas. This should make V1 roughly feature equivalent to V0 in terms of the types of schemas it can support.

An `auto` mode is also included, which provides opinionated fallback behavior based on our current understanding of varying feature support and performance characteristics for different scenarios.

More technical details are available in the llguidance git repo.

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
I started looking at this after talking to @joerunde about some performance issues observed in production. While the ultimate goal is to get everyone to V1 where we expect to provide more drastic improvements, I wanted to see if we could do something to help in V0 in the meantime.
In both my own testing and that of the team that reported the problem, guidance resolved the performance issues. The problem was outlines taking too long up front to compile complex json schemas.
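For a sense of what "complex" means here, a hypothetical example (not the reporter's actual schema) of the shape that tends to be expensive to precompile: pattern-constrained strings and enums multiply the states an ahead-of-time compiler has to enumerate, while a lazy matcher only pays for states it actually visits.

```python
# Hypothetical schema illustrating the expensive shape: regex patterns and
# enum alternatives inflate the compiled automaton, slowing request startup
# for ahead-of-time backends like outlines.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "pattern": "^[a-f0-9]{32}$"},
        "status": {"enum": ["queued", "running", "succeeded", "failed"]},
        "tags": {
            "type": "array",
            "items": {"type": "string", "pattern": "^[A-Za-z0-9_-]{1,64}$"},
        },
    },
    "required": ["id", "status"],
}
```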
I think this is worth including to help people get by in V0 until they're ready for the full transition to V1.
Closes #14151