Inference tutorial - Part 3 of e2e series [WIP] #2343

jainapurva · 2025-06-09T23:18:31Z

No description provided.

pytorch-bot · 2025-06-09T23:18:34Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2343

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit bd2600f with merge base 2898903 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

docs/source/inference.rst

jerryzh168 · 2025-06-17T20:44:43Z

docs/source/inference.rst

+
+    vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
+
+Inference with vLLM


should we move this after Inference with Transformers

cc @jainapurva I think if vLLM is our recommended serving solution, this should go before transformers.

jerryzh168 · 2025-06-17T20:45:36Z

docs/source/inference.rst

+
+vLLM automatically leverages torchao's optimized kernels when serving quantized models, providing significant throughput improvements.
+
+Setting up vLLM with Quantized Models


nit: this doesn't have to be a new section I think

docs/source/inference.rst

andrewor14 · 2025-06-17T21:48:34Z

Hi @jainapurva, by the way I'm adding a serving.rst here: #2394. It uses the same template as parts 1 and 2. After that's landed, do you mind updating your PR to use that file instead? Right now it's a blank page with the template:

docs/source/inference.rst

jerryzh168 · 2025-06-18T23:51:43Z

docs/source/inference.rst

+.. note::
+    For more information on supported quantization and sparsity configurations, see `HF-Torchao Docs <https://huggingface.co/docs/transformers/main/en/quantization/torchao>`_.
+
+Inference with vLLM


for this section, can you replace with https://huggingface.co/pytorch/Qwen3-8B-int4wo-hqq#inference-with-vllm

it might be easier to do command line compared to code

drisspg · 2025-06-23T16:31:31Z

docs/source/serving.rst

+            print(f"Output:    {generated_text!r}")
+            print("-" * 60)
+
+[Optional] Inference with Transformers


We should have an Inference w/ SGlang section

I tested the integration of TorchAO and SGLang, came across a lot of issues in running the server. As discussed with @jerryzh168 offline, we can add this later, after more thorough testing and updates.

Preliminary structure for tutorial

c0584b4

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 9, 2025

jainapurva added the topic: documentation Use this tag if this PR adds or improves documentation label Jun 10, 2025

jainapurva and others added 8 commits June 16, 2025 09:59

Updates

f4e8f2d

Update

7c2332e

Update

942a02b

Update

888fd4c

Update

c200cd2

Merge remote-tracking branch 'origin/main' into inference_tutorial

4f76b23

Update

c52e6f8

Update

de160b1

jerryzh168 reviewed Jun 17, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jerryzh168 reviewed Jun 17, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jerryzh168 reviewed Jun 17, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jerryzh168 reviewed Jun 17, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jerryzh168 reviewed Jun 17, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jainapurva added 2 commits June 17, 2025 12:11

Update

e8f5e53

Update

bbd567d

jainapurva commented Jun 17, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jainapurva requested review from jerryzh168, andrewor14, drisspg and jcaip June 17, 2025 20:42

jerryzh168 reviewed Jun 17, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jerryzh168 reviewed Jun 17, 2025

View reviewed changes

Update notes

6a96697

jerryzh168 reviewed Jun 17, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jcaip reviewed Jun 18, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jainapurva added 3 commits June 18, 2025 12:11

Updates

06612d3

Merge remote-tracking branch 'origin/main' into inference_tutorial

a3aa301

Updates

ce675b8

jainapurva force-pushed the inference_tutorial branch from b93b892 to ce675b8 Compare June 18, 2025 21:05

jerryzh168 reviewed Jun 18, 2025

View reviewed changes

docs/source/inference.rst Outdated Show resolved Hide resolved

jerryzh168 reviewed Jun 18, 2025

View reviewed changes

drisspg reviewed Jun 23, 2025

View reviewed changes

jainapurva added 6 commits June 23, 2025 11:01

Updates

0311bc0

Updates to build torchao

2c44d25

Merge remote-tracking branch 'origin/main' into inference_tutorial

b163ef7

Updates to vllm serving

580a99c

Updates to vllm serving

17b7cb8

Fix formatting

bd2600f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Inference tutorial - Part 3 of e2e series [WIP] #2343

Inference tutorial - Part 3 of e2e series [WIP] #2343

Uh oh!

jainapurva commented Jun 9, 2025

Uh oh!

pytorch-bot bot commented Jun 9, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jerryzh168 Jun 17, 2025

Uh oh!

jcaip Jun 18, 2025

Uh oh!

jerryzh168 Jun 17, 2025

Uh oh!

Uh oh!

andrewor14 commented Jun 17, 2025

Uh oh!

Uh oh!

Uh oh!

jerryzh168 Jun 18, 2025

Uh oh!

drisspg Jun 23, 2025

Uh oh!

jainapurva Jun 24, 2025

Uh oh!

Uh oh!


		vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3

		Inference with vLLM


		vLLM automatically leverages torchao's optimized kernels when serving quantized models, providing significant throughput improvements.

		Setting up vLLM with Quantized Models

Inference tutorial - Part 3 of e2e series [WIP] #2343

Are you sure you want to change the base?

Inference tutorial - Part 3 of e2e series [WIP] #2343

Uh oh!

Conversation

jainapurva commented Jun 9, 2025

Uh oh!

pytorch-bot bot commented Jun 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2343

✅ No Failures

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jerryzh168 Jun 17, 2025

Choose a reason for hiding this comment

Uh oh!

jcaip Jun 18, 2025

Choose a reason for hiding this comment

Uh oh!

jerryzh168 Jun 17, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

andrewor14 commented Jun 17, 2025

Uh oh!

Uh oh!

Uh oh!

jerryzh168 Jun 18, 2025

Choose a reason for hiding this comment

Uh oh!

drisspg Jun 23, 2025

Choose a reason for hiding this comment

Uh oh!

jainapurva Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pytorch-bot bot commented Jun 9, 2025 •

edited

Loading