[Feature] support pooling model dummy_run by lizexu123 · Pull Request #4345 · PaddlePaddle/FastDeploy

lizexu123 · 2025-10-10T07:21:41Z

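Adds dummy_pooler_run support for pooling models, refactors the existing warm-up stage for generative models into dummy_sampler_run, and fixes a bug when loading qwen3-embedding-0.6B on a single GPU.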


paddle-bot · 2025-10-10T07:21:48Z

Thanks for your contribution!

gongshaotian · 2025-10-10T09:58:28Z

fastdeploy/worker/gpu_model_runner.py

+from fastdeploy.engine.pooling_params import PoolingParams
+from fastdeploy.engine.tasks import PoolingTask


Is it reasonable for a lower layer to import things from engine?

This follows vLLM's approach. vLLM keeps it under vllm/tasks, so I put it under engine.

gongshaotian · 2025-10-11T03:07:06Z

fastdeploy/model_executor/models/interfaces_base.py

+class FdModel(Protocol[T_co]):
+    """The interface required for all models in FastDeploy."""


Which classes inherit from FdModel, and how does it relate to ModelForCasualLM?

Only FdModelForPooling inherits from it, and it is unrelated to ModelForCasualLM. ModelForCasualLM has compute_logits, which pooling models do not compute.
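
To illustrate the relationship described in this reply, here is a minimal sketch using `typing.Protocol`. Only the names `FdModel`, `FdModelForPooling`, `compute_logits`, and `pooler` come from this thread. The `forward` method, its signature, and the toy classes are assumptions for illustration, not FastDeploy's actual definitions.

```python
from typing import Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class FdModel(Protocol[T_co]):
    """Hypothetical base interface: anything with a forward pass."""

    def forward(self, input_ids: list[int]) -> T_co: ...


@runtime_checkable
class FdModelForPooling(FdModel[T_co], Protocol[T_co]):
    """Hypothetical pooling interface: a forward pass plus a pooler."""

    pooler: object


class ToyEmbeddingModel:
    """Satisfies FdModelForPooling structurally; has no compute_logits."""

    def __init__(self) -> None:
        self.pooler = object()

    def forward(self, input_ids: list[int]) -> list[float]:
        return [float(t) for t in input_ids]


class ToyCausalLM:
    """Stands in for a generative model: compute_logits, but no pooler."""

    def forward(self, input_ids: list[int]) -> list[float]:
        return [float(t) for t in input_ids]

    def compute_logits(self, hidden: list[float]) -> list[float]:
        return hidden
```

Because protocols are structural, neither toy class inherits from anything. The embedding model matches the pooling interface only because it exposes `pooler`, while the generative model matches only the base interface.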

gongshaotian · 2025-10-11T03:14:39Z

fastdeploy/worker/gpu_model_runner.py

+            [num_reqs, req_num_tokens],
+            dtype="int32",
+        )
+        model = cast(FdModelForPooling, self.get_model())


Same question as above: what is the relationship between FdModelForPooling and ModelForCasualLM, and is the cast really necessary?

This sets some default pooling_type values (when the user does not set one), so the cast is needed.
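
For readers weighing this exchange: `typing.cast` does nothing at runtime and only tells a static type checker which interface to assume. The sketch below shows that behavior under the assumption that `get_model()` is annotated with a broader type than the pooling interface. `ToyRunner`, `ToyModel`, `ToyPooler`, and `default_pooling_type` are hypothetical names, not FastDeploy APIs.

```python
from typing import Protocol, cast


class ToyPooler:
    default_pooling_type = "LAST"  # assumed default, for illustration only


class FdModelForPooling(Protocol):
    """Hypothetical stand-in for the pooling interface."""

    pooler: ToyPooler


class ToyModel:
    def __init__(self) -> None:
        self.pooler = ToyPooler()


class ToyRunner:
    def __init__(self) -> None:
        self._model = ToyModel()

    def get_model(self) -> object:
        # Deliberately broad return type, as a generic runner might declare.
        return self._model


runner = ToyRunner()
raw = runner.get_model()
model = cast(FdModelForPooling, raw)

# cast() returns its argument unchanged; only the static type differs.
same_object = model is raw
pooling_type = model.pooler.default_pooling_type
```

Without the cast, the attribute access `model.pooler` would still work at runtime, but a type checker would reject it because `object` declares no `pooler`.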

gongshaotian · 2025-10-11T03:16:31Z

fastdeploy/worker/gpu_model_runner.py

+        to_update = model.pooler.get_pooling_updates(task)
+        to_update.apply(dummy_pooling_params)


Is the name to_update semantically accurate?

It is implemented following vLLM's conventions.
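
The two lines under review follow a "compute an update, then apply it" pattern. Below is a minimal sketch of that pattern, assuming the object returned by `get_pooling_updates` carries field overrides for the pooling parameters. The `requires_token_ids` field and all three toy classes are hypothetical and are not FastDeploy's actual types.

```python
from dataclasses import dataclass


@dataclass
class ToyPoolingParams:
    task: str = "embed"
    requires_token_ids: bool = False


@dataclass(frozen=True)
class ToyPoolingParamsUpdate:
    requires_token_ids: bool = False

    def apply(self, params: ToyPoolingParams) -> None:
        # Copy the override onto the caller's params in place.
        params.requires_token_ids = self.requires_token_ids


class ToyPooler:
    def get_pooling_updates(self, task: str) -> ToyPoolingParamsUpdate:
        # Pretend only the "classify" task needs the raw token ids.
        return ToyPoolingParamsUpdate(requires_token_ids=(task == "classify"))


dummy_pooling_params = ToyPoolingParams(task="classify")
to_update = ToyPooler().get_pooling_updates(dummy_pooling_params.task)
to_update.apply(dummy_pooling_params)
```

In this reading, `to_update` names the update object itself rather than the thing being updated, which is the ambiguity the reviewer points at.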

yuanlehome · 2025-10-13T02:56:27Z

fastdeploy/model_executor/layers/pool/metadata.py

+    cumsum = paddle.zeros([n_seq + 1], dtype="int64")
+    if cumsum.place.is_gpu_place():
+        cumsum = cumsum.cpu()


Why not create the zeros tensor directly on the CPU here?
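
For context, a buffer of length `n_seq + 1` that starts at zero is the usual shape for prefix offsets over `n_seq` variable-length sequences. The pure-Python sketch below shows that idea, assuming this is what `cumsum` is used for here. `prompt_lens` and the helper name are hypothetical, and no Paddle call is shown.

```python
from itertools import accumulate


def build_offsets(prompt_lens: list[int]) -> list[int]:
    """Return [0, l0, l0+l1, ...], one more entry than there are sequences."""
    return [0, *accumulate(prompt_lens)]


prompt_lens = [3, 1, 4]
offsets = build_offsets(prompt_lens)               # [0, 3, 4, 8]
first_token_idx = offsets[:-1]                     # [0, 3, 4]
last_token_idx = [end - 1 for end in offsets[1:]]  # [2, 3, 7]
```

Offsets like these are small and typically consumed as indices on the host, which is consistent with the reviewer's suggestion to allocate the buffer on the CPU from the start.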

yuanlehome · 2025-10-13T02:59:53Z

fastdeploy/worker/gpu_model_runner.py


        self.attn_backends.append(attn_backend)

+    def _dummy_pooler_run_task(


Why not implement this directly in _dummy_pooler_run instead of factoring out a separate _dummy_pooler_run_task?

It is written following vLLM's conventions.
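
One plausible reason for the split, sketched under assumptions: the outer method iterates over the supported pooling tasks while the inner one warms up a single task. Only the two method names come from this PR. The task table, the size measurement, and the final re-run of the largest task are guesses modeled loosely on the vLLM design the author cites.

```python
class ToyRunner:
    def __init__(self, supported_tasks: dict[str, int]) -> None:
        # Maps task name -> size of the dummy output it produces (hypothetical).
        self.supported_tasks = supported_tasks
        self.calls: list[str] = []

    def _dummy_pooler_run_task(self, task: str) -> int:
        """Warm up one pooling task and report its output size."""
        self.calls.append(task)
        return self.supported_tasks[task]

    def _dummy_pooler_run(self) -> str:
        """Try every task, then re-run the one with the largest output."""
        sizes = {t: self._dummy_pooler_run_task(t) for t in self.supported_tasks}
        largest = max(sizes, key=sizes.get)
        self._dummy_pooler_run_task(largest)
        return largest


runner = ToyRunner({"embed": 1024, "classify": 2})
largest_task = runner._dummy_pooler_run()
```

With a single-task helper, the per-task loop and the final re-run can share one code path instead of duplicating the warm-up body.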

yuanlehome · 2025-10-13T03:05:25Z

fastdeploy/worker/gpu_model_runner.py

        self.speculative_decoding = self.speculative_method is not None
        self.enable_logprob = fd_config.model_config.enable_logprob
        self.enable_early_stop = self.fd_config.early_stop_config.enable_early_stop
+        self.is_pooling_model = self.fd_config.model_config.runner_type == "pooling"


Can one of self.is_pooling_model and is_pooling_model be removed? Is it necessary to keep both?

Removed is_pooling_model and kept self.is_pooling_model.
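
A minimal sketch of how a single instance attribute can drive the warm-up path, so that no separate `is_pooling_model` argument is needed. The `runner_type == "pooling"` check mirrors the diff above. The `_dummy_run` dispatcher, the leading underscores on the two branch methods, and their string return values are assumptions for illustration.

```python
class ToyRunner:
    def __init__(self, runner_type: str) -> None:
        # Decided once at construction, as in the reviewed diff.
        self.is_pooling_model = runner_type == "pooling"

    def _dummy_pooler_run(self) -> str:
        return "pooler"

    def _dummy_sampler_run(self) -> str:
        return "sampler"

    def _dummy_run(self) -> str:
        # Hypothetical dispatcher: pick the warm-up branch from the attribute.
        if self.is_pooling_model:
            return self._dummy_pooler_run()
        return self._dummy_sampler_run()


pooling_branch = ToyRunner("pooling")._dummy_run()
generate_branch = ToyRunner("generate")._dummy_run()
```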

lizexu123 added 8 commits September 22, 2025 20:33

support qwen3-embedding

7716866

fix ci bug

a1e505c

Merge branch 'develop' of https://github.com/PaddlePaddle/FastDeploy …

815d592

…into pooling_emb_3

support pooling dummy_run

d6d8c15

merge develop

5fde033

Merge branch 'develop' of https://github.com/PaddlePaddle/FastDeploy …

001f23d

…into develop

merge develop

31a4a6b

fix

d8cce66

delete print

b4f9d9c

gongshaotian reviewed Oct 11, 2025

merge develop

43fe17f

gongshaotian previously approved these changes Oct 11, 2025

lizexu123 dismissed gongshaotian’s stale review via 43fe17f October 11, 2025 06:23

lizexu123 added 2 commits October 11, 2025 15:00

Merge branch 'develop' of https://github.com/PaddlePaddle/FastDeploy …

d6df785

…into pooling_emb_4

parallel_config.max_model_len

fe5de8b

yuanlehome reviewed Oct 13, 2025

lizexu123 added 2 commits October 13, 2025 11:25

delete is_pooling_model in dummy_run

af9a48f

fix

73e5f07

yuanlehome previously approved these changes Oct 15, 2025

fix chongtu (conflict)

7bae906

lizexu123 dismissed yuanlehome’s stale review via 7bae906 October 15, 2025 05:52

lizexu123 force-pushed the pooling_emb_4 branch from 7d53ef8 to 7bae906 Compare October 15, 2025 05:52

lizexu123 added 5 commits October 15, 2025 17:23

fd_model

7ba813f

merge develop

2b8832b

fix embedding load

5a84e16

Merge branch 'develop' of https://github.com/PaddlePaddle/FastDeploy …

3d0546a

…into pooling_emb_4

merge develop

b69db93

lizexu123 added 2 commits October 16, 2025 16:11

fix

9bc2e08

fix post_process

f1eb961

yuanlehome approved these changes Oct 17, 2025

Jiang-Jia-Jun merged commit c234b99 into PaddlePaddle:develop Oct 17, 2025
31 of 38 checks passed

littledgg added a commit to littledgg/FastDeploy that referenced this pull request Oct 24, 2025

fix bug about mtp from PaddlePaddle#4345

1b728a6
