
[Feature] support pool#3827

Merged
Jiang-Jia-Jun merged 46 commits into PaddlePaddle:develop from lizexu123:pooling_emb
Sep 22, 2025

Conversation

Collaborator

@lizexu123 lizexu123 commented Sep 2, 2025

Pooling model support spans four pieces: ModelConfig, ModelLoader, and the ModelRunner, which is split into model warm-up and model execution.
This PR implements ModelConfig and ModelLoader; for now only runner=pooling with convert=embed is supported.

Completed in this PR:

1. The server can be started with runner set to pooling, and convert can optionally be set to embed; when convert is not passed, its type is inferred from the model files.
2. Detect whether a model is generative or a pooling model; if a generative model is started with runner=pooling, it must be converted.
3. Model conversion: a generative model can be converted into a pooling model by deleting the ParallelLMHead weights, replacing the architectures suffix with ForEmbedding, and adding a DispatchPooler layer; pooling_type decides which pool layer is used.
4. Qwen3-Embedding-0.6B loads successfully on a single card; multi-card loading still fails.
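The auto-detection in item 1 could look roughly like the following sketch. This is illustrative only; the function name and the exact decision rules are assumptions, not the PR's actual code:

```python
def detect_convert_type(architectures, runner):
    """Pick a convert type when the user does not pass --convert.

    Illustrative sketch: a model whose architecture already ends in
    ForEmbedding needs no conversion; a generative model launched with
    runner=pooling is converted to an embedding model.
    """
    if any(arch.endswith("ForEmbedding") for arch in architectures):
        return "none"
    if runner == "pooling":
        return "embed"
    return "none"
```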

To do:
1. Model Runner:
   1. model warm-up phase
   2. model execution phase
To be resolved:
Loading Qwen3-Embedding-0.6B with tp > 1 raises an error.

Launching a pooling task:

model_path=/root/paddlejob/workspace/env_run/output/models/torch/Qwen3-0.6B
python -m fastdeploy.entrypoints.openai.api_server --model ${model_path} \
    --max-num-seqs 256 --max-model-len 32768 \
    --port 9412 --engine-worker-queue-port 7142 \
    --metrics-port 7211 --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.9 \
    --load_choices "default_v1" \
    --runner pooling

Conversion of the generative Qwen3-0.6B model: internally the convert type is resolved to embed, reward, or score (the latter two are not yet supported). To turn a generative model into an embedding model, the lm_head weights are deleted, the architectures suffix is changed to ForEmbedding, and a DispatchPooler layer is added. The converted self.model is shown in the screenshot below.
(screenshot of the converted self.model structure)
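The conversion steps described above (drop the LM head, rename the architecture suffix) can be sketched as follows. The function name and the dict-based state handling are illustrative assumptions, not the PR's implementation:

```python
def convert_to_embedding_model(architectures, state_dict):
    """Sketch of converting a generative checkpoint for embedding use:
    delete lm_head weights and swap the architectures suffix."""
    # Pooling models do not need the language-model head.
    kept = {k: v for k, v in state_dict.items() if "lm_head" not in k}
    # e.g. Qwen3ForCausalLM -> Qwen3ForEmbedding
    new_archs = [
        arch[: -len("ForCausalLM")] + "ForEmbedding" if arch.endswith("ForCausalLM") else arch
        for arch in architectures
    ]
    return new_archs, kept
```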


paddle-bot bot commented Sep 2, 2025

Thanks for your contribution!

Contributor

Copilot AI left a comment


Pull Request Overview

This PR implements pooling model support for FastDeploy by introducing configurable model runners and conversion mechanisms. The implementation enables embedding and pooling tasks while maintaining compatibility with existing text generation models.

Key changes:

  • Introduces new runner types ("pooling", "generate") and conversion options ("embed", "none") with automatic detection
  • Implements model registry refactoring with lazy loading and better architecture support
  • Adds comprehensive pooling infrastructure including poolers, metadata, and output handling

Reviewed Changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated 7 comments.

Summary per file:
  • fastdeploy/config.py: Core configuration for runner types, convert options, and pooler configuration
  • fastdeploy/model_executor/models/registry.py: New model registry with lazy loading and pooling model detection
  • fastdeploy/model_executor/models/adapters.py: Model conversion utilities for embedding and pooling models
  • fastdeploy/model_executor/layers/pooler.py: Pooling layer implementations with different pooling strategies
  • fastdeploy/transformer_utils/config.py: Configuration utilities for sentence transformers and pooling models
  • fastdeploy/worker/worker_process.py: Command line argument parsing for new pooling options
  • fastdeploy/model_executor/model_loader/default_loader_v1.py: Model loading with conversion support

Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 28 out of 29 changed files in this pull request and generated 6 comments.



def is_pin_memory_available() -> bool:
    pass

Copilot AI Sep 16, 2025


Function is_pin_memory_available is incomplete with just a pass statement. This will always return None instead of a boolean value, which could cause issues where this function is used.

Suggested change:
-    pass
+    # Pin memory is available if PaddlePaddle is compiled with CUDA support
+    return paddle.is_compiled_with_cuda()

Comment on lines +214 to +216
for loaded_weight_name, loaded_weight in weights_iterator:
    if "rotary_emb.inv_freq" in loaded_weight_name:
        continue

Copilot AI Sep 16, 2025


The hardcoded string check for 'rotary_emb.inv_freq' should be moved to a constant or configuration to improve maintainability and avoid magic strings scattered throughout the code.
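One hedged way to apply this suggestion is a module-level constant with a small helper; the names here are illustrative, not from the PR:

```python
# Substrings of weight names that the loader should skip entirely.
SKIPPED_WEIGHT_SUBSTRINGS = ("rotary_emb.inv_freq",)

def should_skip_weight(name):
    """Return True if the weight name matches a known skip pattern."""
    return any(sub in name for sub in SKIPPED_WEIGHT_SUBSTRINGS)
```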

Comment on lines +87 to +90
try:
    loaded_weight = loaded_weight.reshape(linear.weight.shape)
except:
    continue

Copilot AI Sep 16, 2025


Using bare except: is not recommended as it catches all exceptions including system exits and keyboard interrupts. Use specific exception types like except (ValueError, RuntimeError): or at minimum except Exception:.

if linear.bias.shape != loaded_bias.shape:
    try:
        loaded_bias = loaded_bias.reshape(linear.bias.shape)
    except:

Copilot AI Sep 16, 2025


Using bare except: is not recommended as it catches all exceptions including system exits and keyboard interrupts. Use specific exception types like except (ValueError, RuntimeError): or at minimum except Exception:.

Suggested change:
-    except:
+    except Exception:

assert not pooling_cursor.is_partial_prefill(), "partial prefill not supported with MEAN pooling"

if hidden_states.place.is_gpu_place():
    prompt_lens = pooling_cursor.prompt_lens_cpu.cuda()

Copilot AI Sep 16, 2025


The .cuda() method call appears to be PyTorch-specific but this is a PaddlePaddle codebase. This should use PaddlePaddle's device placement methods like .gpu() or .to(device='gpu').

Suggested change:
-    prompt_lens = pooling_cursor.prompt_lens_cpu.cuda()
+    prompt_lens = pooling_cursor.prompt_lens_cpu.to(device='gpu')

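For context, MEAN pooling over a packed batch of variable-length prompts (what the snippets above feed with prompt_lens) can be sketched in plain NumPy; this is an illustrative stand-in for the PR's Paddle code:

```python
import numpy as np

def mean_pool(hidden_states, prompt_lens):
    """Average each prompt's token states in a packed [total_tokens, dim] batch."""
    # offsets[i]:offsets[i+1] selects the i-th prompt's tokens.
    offsets = np.concatenate([[0], np.cumsum(prompt_lens)])
    return np.stack([
        hidden_states[offsets[i]:offsets[i + 1]].mean(axis=0)
        for i in range(len(prompt_lens))
    ])
```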
cumsum = paddle.zeros([n_seq + 1], dtype="int64", place=paddle.CPUPlace())
paddle.cumsum(num_scheduled_tokens, axis=0, out=cumsum[1:])
if device == "gpu":
    cumsum_device = cumsum.cuda()

Copilot AI Sep 16, 2025


The .cuda() method call appears to be PyTorch-specific but this is a PaddlePaddle codebase. This should use PaddlePaddle's device placement methods like .gpu() or .to(device='gpu').

Suggested change:
-    cumsum_device = cumsum.cuda()
+    cumsum_device = cumsum.to(device='gpu')

Collaborator

@YuanRisheng YuanRisheng left a comment


This PR needs unit tests. It adds a lot of new code, and without tests the FD codebase's coverage will drop noticeably.

return {out_name: value for name, value in values.items() if (out_name := self._map_name(name)) is not None}


class AutoWeightsLoader:
Collaborator


Is this really necessary? What is the benefit? Please explain clearly.

Collaborator Author


It is not necessary; removed.



@dataclass
class WeightsMapper:
Collaborator


Is this really necessary? What is the benefit? Please explain clearly.

Collaborator Author

@lizexu123 lizexu123 Sep 18, 2025


This makes it easy to rename, for example, model.layers.0.self_attn.o_proj.weight to layers.0.self_attn.o_proj.weight. Its current uses also include substring and suffix replacement. The same result could be achieved with weights = ((name[6:], data) for name, data in weights if name.startswith("model.")), which also works, but the mapper approach is more elegant.
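The prefix renaming described here can be sketched with a simplified mapper; this is an illustrative stand-in, not the PR's actual WeightsMapper:

```python
from dataclasses import dataclass, field

@dataclass
class SimpleWeightsMapper:
    """Rename checkpoint weight names by prefix replacement,
    e.g. "model.layers.0..." -> "layers.0..."."""
    orig_to_new_prefix: dict = field(default_factory=dict)

    def _map_name(self, name):
        for old, new in self.orig_to_new_prefix.items():
            if name.startswith(old):
                return new + name[len(old):]
        return name

    def apply(self, weights):
        # weights is an iterable of (name, tensor) pairs.
        return ((self._map_name(name), data) for name, data in weights)
```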

@@ -0,0 +1,69 @@
"""
Collaborator


Name the file pooler.py.

Collaborator Author


done

Comment on lines -1319 to +1324
-        # 4. Execute spec decode
-        logits = self.model.compute_logits(hidden_states)
+        logits = None
+        if hasattr(self.model, "is_pooling_model") and self.model.is_pooling_model:
+            pass
+        else:
+            # 4. Execute spec decode
+            logits = self.model.compute_logits(hidden_states)
Collaborator


Wouldn't it be better to put this inside compute_logits, where self is available?

Collaborator Author


Every model has its own compute_logits, so it would have to be written in each one. This is a temporary solution; it will be removed in the next PR.

Comment on lines +469 to +486
def parse_type(return_type: Callable[[str], T]) -> Callable[[str], T]:

    def _parse_type(val: str) -> T:
        try:
            return return_type(val)
        except ValueError as e:
            raise argparse.ArgumentTypeError(f"Value {val} cannot be converted to {return_type}.") from e

    return _parse_type


def optional_type(return_type: Callable[[str], T]) -> Callable[[str], Optional[T]]:

    def _optional_type(val: str) -> Optional[T]:
        if val == "" or val == "None":
            return None
        return parse_type(return_type)(val)

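For context, these helpers are meant to be used as argparse type= callables. A minimal self-contained usage sketch follows; the helpers are restated so the example runs standalone, and the --max-items flag is hypothetical:

```python
import argparse

def parse_type(return_type):
    def _parse_type(val):
        try:
            return return_type(val)
        except ValueError as e:
            raise argparse.ArgumentTypeError(f"Value {val} cannot be converted to {return_type}.") from e
    return _parse_type

def optional_type(return_type):
    def _optional_type(val):
        # Treat "" and the literal string "None" as an unset value.
        if val in ("", "None"):
            return None
        return parse_type(return_type)(val)
    return _optional_type

parser = argparse.ArgumentParser()
# Hypothetical flag: accepts an int, or "None"/"" to leave it unset.
parser.add_argument("--max-items", type=optional_type(int), default=None)
```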
Collaborator


Put this in the utils file.

Collaborator Author


done

Comment on lines +195 to +199
self.runner = "auto"
self.convert = "auto"
self.pooler_config: Optional["PoolerConfig"] = field(init=False)
self.override_pooler_config: Optional[Union[dict, "PoolerConfig"]] = None
self.revision = None
Collaborator


Why not add a separate PoolerConfig and move runner/convert etc. into it?

Collaborator Author


DispatchPooler is created in pooler.py, and the ResolvedPoolingConfig inside it needs pooler_config in its from_config.

Comment on lines +268 to +271
def registry(self):
    from fastdeploy.model_executor.models.model_base import ModelRegistry

    return ModelRegistry()
Collaborator


Why does model_config return a ModelRegistry?

Collaborator Author


The methods I wrote are not class methods; the old ones are still class methods.

Comment on lines +27 to +29
from fastdeploy.utils import get_logger

logger = get_logger("utils", "utils.log")
Collaborator


This log file is added but I don't see it used anywhere; remove it?

Collaborator Author


Removed.

Comment on lines +224 to +225
def is_pin_memory_available() -> bool:
    pass
Collaborator


Is this used? If not, don't add it yet.

Collaborator Author


Removed.

Comment on lines +52 to +53
T = TypeVar("T")

Collaborator


Is it used?

Collaborator Author


Removed.


@ModelRegistry.register_model_class(
    architecture="Qwen2_5_VLForConditionalGeneration",
    module_path="qwen2_5_vl",
Collaborator


I recall the module_path used to be qwen2_5_vl.qwen2_5_vl; is the directory no longer needed?

Collaborator Author


Added; the directory is now specified.

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit c86945e into PaddlePaddle:develop Sep 22, 2025
14 of 17 checks passed
