[Intel HPU] Support intel hpu platform #4161
Jiang-Jia-Jun merged 13 commits into PaddlePaddle:develop from
Conversation
Thanks for your contribution!
| try:
|     # assert len(paddle.static.cuda_places()) > 0
|     return True
| except Exception as e:
This check doesn't seem to work.
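The review point is valid: with the assertion commented out, the try block unconditionally returns True, so the except branch can never fire. A minimal sketch of a check that actually probes the device (the helper name and the probe are assumptions, not FastDeploy's actual code):

```python
def is_intel_hpu_available() -> bool:
    """Probe whether the intel_hpu custom device is usable.

    Illustrative sketch only: not FastDeploy's actual check.
    """
    try:
        import paddle  # assumed installed with the custom-device plugin
        # Report True only if the custom device type is actually registered,
        # instead of unconditionally returning True inside the try block.
        return "intel_hpu" in paddle.device.get_all_custom_device_type()
    except Exception:
        return False
```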
| # PACKAGE = "fastdeploy.model_executor.ops.intel_hpu"
| PACKAGE = "paddlenlp_ops"
|
| import_custom_ops(PACKAGE, "paddlenlp_ops", globals())
Shouldn't this be fastdeploy.model_executor.ops.intel_hpu instead of paddlenlp_ops?
Is this because of the naming convention of the ops implementation in custom device?
Yes, the real custom ops come from PaddleCustomDevice; we just rename them in FastDeploy.
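The renaming described above (importing the PaddleCustomDevice ops package and re-exporting its symbols under a FastDeploy namespace) can be sketched roughly like this; the signature and behavior are assumptions, not FastDeploy's actual import_custom_ops:

```python
import importlib


def import_custom_ops(package: str, fallback: str, dest: dict) -> bool:
    """Re-export a custom-ops package's public symbols into dest.

    Illustrative sketch of the renaming described above; FastDeploy's
    real helper may differ.
    """
    for name in (package, fallback):
        try:
            mod = importlib.import_module(name)
        except ImportError:
            continue
        # Copy public symbols so callers can use them under the new namespace.
        dest.update({k: v for k, v in vars(mod).items() if not k.startswith("_")})
        return True
    return False
```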
| @@ -0,0 +1,21 @@
| # Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
| #
| raise NotImplementedError
|
| class AttentionBackend_HPU(AttentionBackend):
Would it be better to move this class to fastdeploy/model_executor/layers/attention/hpu_attn_backend.py?
fastdeploy/engine/args_utils.py
| "--enable-tensor-or-expert-parallel",
| action='store_true',
| default=EngineArgs.enable_tensor_or_expert_parallel,
| help="Enable tensor parallelism for non-MoE and expert parallelism for MoE.")
Could we enable TP + EP by setting --enable-expert-parallel and --tensor-parallel-size without adding a new argument?
Currently EP is bound to DP, so we can't enable TP + EP with the existing parameters:
https://github.com/PaddlePaddle/FastDeploy/blob/develop/fastdeploy/config.py#L316-L318
https://github.com/PaddlePaddle/FastDeploy/blob/develop/fastdeploy/model_executor/layers/moe/moe.py#L132-L134
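The constraint described above can be sketched as a small validation function; the name and exact checks are illustrative assumptions based on this discussion, not FastDeploy's actual validation code:

```python
def check_moe_parallel_config(tp_size: int, ep_size: int, dp_size: int,
                              enable_expert_parallel: bool) -> None:
    """Sketch of the constraints referenced in the linked code
    (illustrative only, not FastDeploy's implementation)."""
    if enable_expert_parallel:
        # In current FastDeploy, EP is bound to DP: EP size equals DP size.
        if ep_size != dp_size:
            raise ValueError("EP size must equal DP size")
        # The MoE layer forbids enabling TP and EP at the same time.
        if tp_size > 1 and ep_size > 1:
            raise ValueError("TP and EP cannot both be enabled for MoE")
```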
fastdeploy/worker/worker_process.py
| parallel_config.engine_worker_queue_port = parallel_config.engine_worker_queue_port[
|     parallel_config.local_data_parallel_id
| ]
All CI jobs fail at this line: TypeError: 'int' object is not subscriptable. We need to solve this first and then see if there are any other problems.
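One way to avoid that TypeError is to only index the port when the config actually holds a per-DP-rank list. This is an illustrative sketch of such a guard; the helper name is an assumption, not FastDeploy's actual fix:

```python
def select_engine_worker_queue_port(port, local_dp_id: int) -> int:
    """Pick this rank's queue port whether the config holds a single int
    or a per-DP-rank list. Sketch of a guard against the
    "'int' object is not subscriptable" failure; names are assumptions."""
    if isinstance(port, (list, tuple)):
        return port[local_dp_id]
    return port  # a single shared port: nothing to index
```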
| @@ -0,0 +1,314 @@
| """
There is a backends folder under the layers directory that holds the per-device layer implementations; please move the attention and MoE implementations into that folder.
As requested, they have been moved to the backends directory.
| elif current_platform.is_intel_hpu():
|     self.forward = self.forward_intel_hpu
The name forward_cuda may no longer be a great fit, but it should be reusable here since the logic is the same.
Changed to reuse forward_cuda.
| elif current_platform.is_intel_hpu():
|     self.forward = self.forward_intel_hpu
How does this differ from the other hardware platforms? Why does it need dedicated logic instead of being abstracted into a few ops and then calling forward_cuda?
We currently use a fused implementation because it performs better on our platform; we will consider splitting it up later, provided performance is not affected.
| from fastdeploy.platforms import current_platform
|
| def reload_ep_checkpoint(model_path: str, fd_config: FDConfig, state_dict: dict, return_numpy: bool = False):
Why was the model-loading code modified here? Is it because a non-official model is being used?
The model is not modified; it is still the official model. The change is only to support model loading in TP+EP mode.
In the TP+EP mode we support, the dense part uses TP while the MoE part uses neither TP nor DP, only EP (EP size = TP size). So when TP is configured, the MoE weights are by default also sliced in TP fashion at load time. What reload_ep_checkpoint does is first delete those TP-sliced MoE weights, then re-partition each complete weight along the expert dimension across the different cards.
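The expert-dimension repartitioning described above (EP size = TP size, each rank owning a disjoint subset of experts) can be sketched as follows; the function name and contiguous assignment are illustrative assumptions, not FastDeploy's reload_ep_checkpoint implementation:

```python
def expert_ids_for_rank(num_experts: int, ep_size: int, rank: int) -> list:
    """Assign each rank a contiguous slice of experts along the expert
    dimension. Illustrative sketch of the partitioning scheme described
    above, not FastDeploy's actual code."""
    assert num_experts % ep_size == 0, "experts must divide evenly across ranks"
    per_rank = num_experts // ep_size
    return list(range(rank * per_rank, (rank + 1) * per_rank))
```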
fastdeploy/config.py
| self.expert_parallel_size = 1  # EP degree
| self.data_parallel_size = 1  # DP degree
| self.enable_expert_parallel = False
| self.enable_tensor_or_expert_parallel = False
Can't this be determined by combining existing fields such as enable_expert_parallel, expert_parallel_size, and tensor_parallel_size? Must we add a new field to the user-facing interface?
Currently in FD, EP is bound to DP (EP size equals DP size), and the MoE layer forbids enabling TP and EP at the same time, so adding a parameter is the best option for supporting TP + EP:
https://github.com/PaddlePaddle/FastDeploy/blob/develop/fastdeploy/model_executor/layers/moe/moe.py#L132-L134
Is the purpose of this parameter to enable TP + EP parallelism simultaneously for the MoE part?
The dense part uses TP; the MoE part uses EP (EP size = TP size).
fastdeploy/engine/args_utils.py
| cache_cfg = CacheConfig(all_dict)
| load_cfg = LoadConfig(all_dict)
| parallel_cfg = ParallelConfig(all_dict)
| cache_cfg.enc_dec_block_num = self.static_decode_blocks
It could be better to set this value as in https://github.com/PaddlePaddle/FastDeploy/blob/release/2.2/fastdeploy/config.py#L899 to avoid impact on other hardware.
It's not only for a specific platform; it may be a bug: the static_decode_blocks parameter in EngineArgs can't be passed to cache_cfg even on GPUs, because cache_cfg has no static_decode_blocks field, only enc_dec_block_num.
It does seem that static_decode_blocks is not being passed to cache_cfg. Could you move the per-platform enc_dec_block_num setting into this file? Since this line runs after the cache_cfg initialization, the default value of 2 may cause errors, e.g. on Iluvatar.
After rebasing onto the latest code, we can use FD_ENC_DEC_BLOCK_NUM to solve this problem; I have removed this line.
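Reading the value from the FD_ENC_DEC_BLOCK_NUM environment variable mentioned above could look roughly like this; the helper function itself is an illustrative sketch (the default of 2 mirrors this discussion), not FastDeploy's actual code:

```python
import os


def get_enc_dec_block_num(default: int = 2) -> int:
    """Read enc_dec_block_num from the FD_ENC_DEC_BLOCK_NUM environment
    variable, falling back to the default. Illustrative sketch only."""
    return int(os.getenv("FD_ENC_DEC_BLOCK_NUM", default))
```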
fastdeploy/worker/worker_process.py
| else:
|     num_experts = model_config.moe_num_experts
|
| num_experts_per_rank = num_experts // parallel_config.tensor_parallel_size
The current FD logic is that if EP is enabled, experts are partitioned by dp_size together with --enable-expert-parallel. Similarly, we can partition experts by tp_size together with enable_tensor_or_expert_parallel to support the TP+EP mode (dense part uses TP, MoE part uses EP with EP size = TP size).
The name enable_tensor_or_expert_parallel does not feel very clear. Could this dense-TP / MoE-EP partitioning follow the naming used by open-source frameworks such as vLLM/SGLang? As it stands it is rather confusing.
The naming is indeed problematic. We have removed the related code from this PR for now and will submit a new PR once it is refined.
@zoooo0820 @carryyu @YuanRisheng @gzy19990617, we have removed the TP+EP mode for now and will merge it separately after it is refined.
| @dataclass
| class ForwardMeta_HPU:
Could the naming be kept consistent with the other hardware backends above, i.e. HPUForwardMeta?
Renamed to HPUForwardMeta.
FastDeploy has completed adaptation of the ERNIE 4.5 model on the Intel HPU platform.
Dependencies:
Gaudi software: 1.22.0
PaddlePaddle: 3.1.1
PaddleCustomDevice: latest develop branch
Support for more models and further performance optimizations will continue to be updated.