Support GPT-OSS-BF16 #4240
Merged
Jiang-Jia-Jun merged 25 commits into PaddlePaddle:develop on Oct 20, 2025
Conversation
Thanks for your contribution!
yuanlehome reviewed Oct 15, 2025
Comment on lines +222 to +228
```python
if (
    hasattr(self.fd_config.model_config, "layer_types")
    and self.fd_config.model_config.layer_types[layer.layer_id] == "sliding_attention"
):
    sliding_window = self.fd_config.model_config.sliding_window
else:
    sliding_window = 0
```
Collaborator
Currently only append_attention supports SWA. We can submit a follow-up PR to move this into attention.py; the other backends will then need to raise NotImplementedError.
qingqing01 previously approved these changes Oct 15, 2025
DDDivano previously approved these changes Oct 16, 2025
DDDivano approved these changes Oct 20, 2025
qingqing01 approved these changes Oct 20, 2025
This PR adds support for the GPT-OSS bf16 model. Compared to vLLM, this PR implements wint8 quantization and achieves roughly a 15% lead in metrics such as QPS, TPS, and TTFT. It also introduces several new features that improve model flexibility and performance: sinks in append attention, sliding window attention, bias support for MoE layers, and the swigluoai activation function.
New Features
Feature 1: Support Sinks in Append Attention
This feature introduces sinks in append attention, allowing certain tokens to remain visible across all decoding steps. This enhances the control and stability of the attention mechanism, especially in long-context or multi-turn scenarios. A rough sketch of the idea follows.
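The sketch below is illustrative, not the actual append-attention kernel: it shows one common formulation in which a learned per-head sink logit joins the softmax so it can absorb probability mass without contributing a value. The names `attention_with_sink` and `sink_logit` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_sink(q, k, v, sink_logit):
    """Single-head attention with one learned 'sink' logit (illustrative).

    q, k, v: (T, d); sink_logit: scalar learned per head.
    The sink participates in the softmax denominator but contributes
    no value, so probability mass can drain to it instead of being
    forced onto real tokens.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (T, T)
    sink = np.full((scores.shape[0], 1), sink_logit)   # (T, 1)
    probs = softmax(np.concatenate([scores, sink], axis=-1))
    return probs[:, :-1] @ v                           # drop the sink column

q = k = v = np.random.randn(4, 8).astype(np.float32)
print(attention_with_sink(q, k, v, sink_logit=0.5).shape)  # (4, 8)
```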
Feature 2: Support Sliding Window Attention (SWA)
This feature implements Sliding Window Attention, an efficient mechanism for handling long sequences by limiting the attention scope of each token. The sliding window constrains the visible key-value pairs during decoding, improving memory usage and efficiency in long-sequence inference; see the mask sketch below.
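As a NumPy sketch (the `sliding_window_mask` helper is hypothetical), the window can be pictured as an extra constraint on the causal mask; a window of 0 is treated as plain causal attention, matching the `sliding_window = 0` convention in the diff above.

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Causal mask where token i attends only to keys j with i - window < j <= i.

    window <= 0 means no sliding window, i.e. full causal attention.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i
    if window <= 0:
        return causal
    return causal & (j > i - window)

print(sliding_window_mask(6, 3).astype(int))
```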
Feature 3: Implement "swigluoai" activation function
This adds support for the SwigluOAI activation, a variant of SwiGLU with optimized scaling. It provides configurable scaling factors (1.702, 7.0) and supports an interleaved mode.
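A minimal sketch of what such an activation could look like, assuming the two factors map to a sigmoid scale `alpha = 1.702` and a clamping `limit = 7.0` as in the public GPT-OSS reference; the clamping and the `+1` shift are my reading of that reference, not necessarily this PR's kernel.

```python
import numpy as np

def swiglu_oai(x, alpha=1.702, limit=7.0, interleaved=True):
    """Sketch of a SwiGLU-OAI-style activation (assumed semantics).

    The gate half passes through a scaled sigmoid (alpha = 1.702
    approximates GELU); both halves are clamped by `limit`, and the
    linear half is shifted by +1 before the elementwise product.
    """
    if interleaved:                       # gate/linear lanes alternate
        gate, linear = x[..., ::2], x[..., 1::2]
    else:                                 # halves are concatenated
        gate, linear = np.split(x, 2, axis=-1)
    gate = np.clip(gate, None, limit)
    linear = np.clip(linear, -limit, limit)
    return gate * (1.0 / (1.0 + np.exp(-alpha * gate))) * (linear + 1.0)

print(swiglu_oai(np.random.randn(2, 8)).shape)  # (2, 4)
```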
Feature 4: Add Bias support for MoE layers
This extends the MoE feed-forward path to correctly apply expert-specific bias during the down projection, ensuring each token is routed to the correct expert together with its associated bias term, as in the toy sketch below.
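A toy sketch of a per-expert bias applied in the down projection; all names (`moe_down_proj_with_bias`, `w_down`, `b_down`) are hypothetical and this is not FastDeploy's actual MoE code.

```python
import numpy as np

def moe_down_proj_with_bias(h, expert_ids, w_down, b_down):
    """Illustrative MoE down projection with expert-specific bias.

    h:          (T, d_ff)            per-token hidden states from the expert FFN
    expert_ids: (T,)                 expert chosen for each token by the router
    w_down:     (E, d_ff, d_model)   per-expert down-projection weights
    b_down:     (E, d_model)         per-expert bias (the new part)
    """
    out = np.einsum("tf,tfm->tm", h, w_down[expert_ids])
    return out + b_down[expert_ids]  # each token gets its own expert's bias

T, d_ff, d_model, E = 4, 16, 8, 2
h = np.random.randn(T, d_ff)
ids = np.random.randint(0, E, size=T)
w = np.random.randn(E, d_ff, d_model)
b = np.random.randn(E, d_model)
print(moe_down_proj_with_bias(h, ids, w, b).shape)  # (4, 8)
```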
Usage Example
Start online service
```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/gpt-oss-20b-bf16 \
    --port 8188 \
    --engine-worker-queue-port 51001 \
    --cache-queue-port 51002 \
    --host 0.0.0.0 \
    --max-model-len 32768 \
    --max-num-seqs 256 \
    --quantization wint8
```
Send a request
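A minimal request sketch against the OpenAI-compatible server started above; the endpoint path follows the standard chat-completions convention, and the payload fields are illustrative.

```python
import requests

# Illustrative request to the server launched on port 8188 above.
resp = requests.post(
    "http://0.0.0.0:8188/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
)
print(resp.json())
```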