[Core][V0] Add guidance backend for structured output #14589


Merged
merged 1 commit on Mar 20, 2025

Conversation

russellb
Member

@russellb russellb commented Mar 11, 2025

This commit is based on the PR #10217. It is updated to be compatible
with main.

Signed-off-by: Russell Bryant [email protected]
Co-authored-by: Loc Huynh [email protected]
Co-authored-by: Michal Moskal [email protected]


I started looking at this after talking to @joerunde about some performance issues observed in production. While the ultimate goal is to get everyone to V1 where we expect to provide more drastic improvements, I wanted to see if we could do something to help in V0 in the meantime.

In both my own testing and that of the team that reported the problem, switching to guidance resolved the issue. The problem was outlines taking too long up front to compile complex JSON schemas.

I think this is worth including to help people get by in V0 until they're ready for the full transition to V1.

Closes #14151
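For context, a client request exercising this backend might look like the sketch below. The `guided_json` and `guided_decoding_backend` fields are vLLM's structured-output request extensions as I understand them; treat the exact field names and payload shape as an assumption and verify them against your vLLM version before relying on them.

```python
import json

# Hedged sketch: a completion request body for a vLLM OpenAI-compatible
# server. The guided_json / guided_decoding_backend field names are an
# assumption based on vLLM's documented extensions; check your version.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "Describe a person as JSON.",
    "max_tokens": 300,
    "guided_json": schema,
    "guided_decoding_backend": "guidance",
}

# Serialize the request body as it would be POSTed to /v1/completions.
body = json.dumps(payload)
print(body)
```

With an HTTP client library, this body would be POSTed to the server's `/v1/completions` endpoint; the OpenAI Python client passes the extra fields via `extra_body` instead.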

@russellb russellb requested a review from mgoin as a code owner March 11, 2025 01:02

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀


mergify bot commented Mar 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @russellb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 11, 2025
@russellb russellb force-pushed the llguidance-v0-integration branch from 5244b7f to 2db1a0e Compare March 11, 2025 01:14
@mergify mergify bot removed the needs-rebase label Mar 11, 2025
@Harsha-Nori

Hey @russellb this looks fantastic! On point 3 -- is it possible to share an example of a workload where TPOT is reduced vs. xgrammar? I don't typically see this with JSON schemas, so it'd be a helpful case to look for performance improvements.

@russellb
Member Author

Hey @russellb this looks fantastic! On point 3 -- is it possible to share an example of a workload where TPOT is reduced vs. xgrammar? I don't typically see this with JSON schemas, so it'd be a helpful case to look for performance improvements.

I'm using these changes to the benchmark suite on top of this PR: #14567

I'm running the following command, changing the structured output backend to guidance, xgrammar, or outlines, and also running with a request rate of 1, 5, 10.

```shell
python3 benchmarks/benchmark_serving_structured_output.py \
  --port 8432 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --dataset json-unique \
  --structured-output-ratio 1.0 \
  --structured-output-backend guidance \
  --output-len 300 \
  --num-prompts 90 \
  --request-rate 1
```
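The sweep described above (each of the three backends at request rates 1, 5, and 10) can be scripted. This is a dry-run sketch that only prints the nine command lines, mirroring the flags in the command above; swap `print()` for `subprocess.run()` to actually launch the runs.

```python
# Dry-run sweep over the structured-output backends and request rates
# described above. Commands are printed, not executed.
BASE = (
    "python3 benchmarks/benchmark_serving_structured_output.py "
    "--port 8432 --model meta-llama/Llama-3.1-8B-Instruct "
    "--dataset json-unique --structured-output-ratio 1.0 "
    "--output-len 300 --num-prompts 90"
)

commands = [
    f"{BASE} --structured-output-backend {backend} --request-rate {rate}"
    for backend in ("guidance", "xgrammar", "outlines")
    for rate in (1, 5, 10)
]

for cmd in commands:
    print(cmd)
```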

@russellb russellb force-pushed the llguidance-v0-integration branch from 2db1a0e to d66d439 Compare March 11, 2025 13:40
@russellb
Member Author

I'm using these changes to the benchmark suite on top of this PR: #14567

This has been merged and I rebased this branch on top of main to include it.

@russellb
Member Author

Testing results via @joerunde are positive. I think this is worth including to help affected users get by a little longer while people transition to V1.

Collaborator

@aarnphm aarnphm left a comment


Just a quick thought, but we can get this one in, given that it is already being used in prod for v0.


mergify bot commented Mar 14, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @russellb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 14, 2025
Collaborator

@aarnphm aarnphm left a comment


Let's get this in for v0

Member

@mgoin mgoin left a comment


Nice work keeping it minimal, just a few questions

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 14, 2025
@simon-mo
Collaborator

Is this really a blocking issue for v0.8.0? It looks like a feature that is nice to have.

@simon-mo simon-mo removed this from the v0.8.0 milestone Mar 17, 2025
@aarnphm
Collaborator

aarnphm commented Mar 17, 2025

yeah I don't think this would block 0.8.0.

nice to have

@DarkLight1337
Member

Can you fix the merge conflict?

```python
import os
from typing import Any, List, Type, Union

import llguidance  # type: ignore[import-untyped]
```
Collaborator


We can probably remove this type: ignore since guidance-ai/llguidance#139

Member Author


once that makes it into a release at least

@russellb
Member Author

The failure in buildkite/ci/pr/entrypoints-test is unrelated: outlines allowed an array instead of a single JSON object, which the test treats as a failure (rightfully so).

russellb added a commit to russellb/vllm that referenced this pull request Mar 18, 2025
This is the V1 integration for
[guidance](https://github.com/guidance-ai/llguidance) as a backend for
structured output. There is a V0 integration in vllm-project#14589.

This backend provides some key benefits to V1:

* Broader jsonschema support
* Quick startup performance for large schemas

Instead of precomputing the masks for all states, this is done on
the fly. We see very fast request startup times, even for large
schemas.

This should make V1 roughly feature equivalent to V0 in terms of the
types of schemas it can support.

More technical details are available in the llguidance git repo.

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
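The on-the-fly masking the commit message describes can be illustrated with a toy automaton. This is purely an illustrative sketch with a hypothetical vocabulary and transition table, not llguidance's actual implementation: the point is that the token mask is computed for the current state only, instead of precomputing masks for every state up front.

```python
# Toy illustration of on-the-fly token masking, not llguidance's
# actual implementation. The grammar here accepts exactly the token
# sequence: "{" name ":" value "}".
VOCAB = ["{", "}", ":", "name", "value", "oops"]

# Transition table: state -> {allowed token: next state}
TRANSITIONS = {
    0: {"{": 1},
    1: {"name": 2},
    2: {":": 3},
    3: {"value": 4},
    4: {"}": 5},
}

def mask_for_state(state):
    """Compute the boolean token mask for the current state only,
    rather than precomputing masks for every reachable state."""
    allowed = TRANSITIONS.get(state, {})
    return [tok in allowed for tok in VOCAB]

# Walk a valid token sequence, checking each token against the mask.
state = 0
for tok in ["{", "name", ":", "value", "}"]:
    mask = mask_for_state(state)
    assert mask[VOCAB.index(tok)], f"{tok!r} not allowed in state {state}"
    state = TRANSITIONS[state][tok]

print(state)  # final state after consuming the full sequence
```

Precomputing masks (the outlines approach) pays the full cost at schema compile time; computing them lazily spreads that cost across decoding steps, which is why request startup stays fast even for large schemas.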
@mmoskal
Contributor

mmoskal commented Mar 18, 2025

@russellb I have a simplified version here:

russellb/vllm@llguidance-v0-integration...mmoskal:llg_v0_matcher

will hopefully be able to test it later today

@DarkLight1337
Member

Entrypoints tests are failing consistently, PTAL.

@aarnphm
Collaborator

aarnphm commented Mar 19, 2025

Entrypoints tests are failing consistently, PTAL.

Seems like an outlines issue with json_object, which is not related to this PR.

But I think that Michal wants to merge a diff implementation later.

@russellb
Member Author

Entrypoints tests are failing consistently, PTAL.

Seems like an outlines issue with json_object, which is not related to this PR.

Yes - I'm going to push a PR to fix it today.

But I think that Michal wants to merge a diff implementation later.

That's fine, though it could also be a follow-up PR if this is a working iteration to start with and build on.

@russellb
Member Author

Entrypoints tests are failing consistently, PTAL.

Seems like an outlines issue with json_object, which is not related to this PR.

Yes - I'm going to push a PR to fix it today.

For outlines, we actually fall back to xgrammar for json_object support, so the fix is in the xgrammar backend. However, xgrammar fails on a schema of `{"type": "object"}`, so I won't be able to fix it until this is resolved: mlc-ai/xgrammar#256

@russellb
Member Author

The tests should work now -- I made guidance the new fallback for json_object for both xgrammar and outlines.
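The fallback described here can be sketched as a small dispatch function. The function name and signature are hypothetical, not vLLM's actual internals; they just capture the routing rule: json_object requests go to guidance, since xgrammar rejects the minimal any-object schema.

```python
# Hypothetical sketch of the json_object fallback described above;
# not vLLM's actual dispatch code.
def select_backend(requested: str, mode: str) -> str:
    """Route json_object requests to guidance, since xgrammar fails on
    the minimal schema {"type": "object"} (mlc-ai/xgrammar#256) and
    outlines previously fell back to xgrammar for this mode."""
    if mode == "json_object" and requested in ("xgrammar", "outlines"):
        return "guidance"
    return requested

print(select_backend("xgrammar", "json_object"))
```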

```diff
@@ -18,6 +18,7 @@ pillow # Required for image processing
 prometheus-fastapi-instrumentator >= 7.0.0
 tiktoken >= 0.6.0 # Required for DBRX tokenizer
 lm-format-enforcer >= 0.10.11, < 0.11
+llguidance>=0.6.15; platform_machine == "x86_64" or platform_machine == "arm64"
```
Contributor

@JC1DA JC1DA Mar 19, 2025


@russellb if you want to move on with this PR, can we pin the version to 0.6.x? There is a major change in the interface in 0.7.x.
Update: @mmoskal confirmed no change is needed

This commit is based on the PR vllm-project#10217. It is updated to be compatible
with `main`. It has also been significantly updated and simplified to
take advantage of changes to llguidance since the original PR was
written.

Many thanks to the llguidance team for the help on this.

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
@russellb russellb force-pushed the llguidance-v0-integration branch from ff4b448 to 2631863 Compare March 19, 2025 22:28
Member

@mgoin mgoin left a comment


LGTM great work folks

@mgoin mgoin enabled auto-merge (squash) March 19, 2025 23:08
@vllm-bot vllm-bot merged commit 1f16b7f into vllm-project:main Mar 20, 2025
58 of 60 checks passed
russellb added a commit to russellb/vllm that referenced this pull request Mar 22, 2025
This is the V1 integration for
[guidance](https://github.com/guidance-ai/llguidance) as a backend for
structured output. There is a V0 integration in vllm-project#14589.

This backend provides some key benefits to V1:

* Broader jsonschema support
* Quick startup performance for large schemas

Instead of precomputing the masks for all states, this is done on
the fly. We see very fast request startup times, even for large
schemas.

This should make V1 roughly feature equivalent to V0 in terms of the
types of schemas it can support.

An `auto` mode is also included, which includes opinionated fallback
behavior based on our current understanding for varying feature support
and performance characteristics for different scenarios.

More technical details are available in the llguidance git repo.

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
russellb added a commit to russellb/vllm that referenced this pull request Mar 24, 2025
This is the V1 integration for
[guidance](https://github.com/guidance-ai/llguidance) as a backend for
structured output. There is a V0 integration in vllm-project#14589.

This backend provides some key benefits to V1:

* Broader jsonschema support
* Quick startup performance for large schemas

Instead of precomputing the masks for all states, this is done on
the fly. We see very fast request startup times, even for large
schemas.

This should make V1 roughly feature equivalent to V0 in terms of the
types of schemas it can support.

An `auto` mode is also included, which includes opinionated fallback
behavior based on our current understanding for varying feature support
and performance characteristics for different scenarios.

More technical details are available in the llguidance git repo.

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
erictang000 pushed a commit to erictang000/vllm that referenced this pull request Mar 25, 2025
…4589)

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
gmarinho2 pushed a commit to gmarinho2/vllm that referenced this pull request Apr 1, 2025
…4589)

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…4589)

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
…4589)

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…4589)

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…4589)

Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: Loc Huynh <[email protected]>
Co-authored-by: Michal Moskal <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
Labels
ci/build ready ONLY add when PR is ready to merge/full CI is needed structured-output
Development

Successfully merging this pull request may close these issues.

[Bug]: Structured output requests can hang the server
9 participants