[Feature] Optim PaddleOCR-VL by ming1753 · Pull Request #4873 · PaddlePaddle/FastDeploy

ming1753 · 2025-11-06T17:27:14Z

Motivation

Optimize PaddleOCR-VL performance.

Modifications

Add new ops fused_neox_rope_embedding and gelu_tanh.
Fix the input embedding CUDA buffer.
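
For readers unfamiliar with the second op, here is a minimal scalar sketch of the tanh approximation of GELU. It assumes that gelu_tanh refers to this standard approximation; the PR's kernel is not shown on this page, and the helper names gelu_tanh_reference and gelu_erf_reference are hypothetical, not part of FastDeploy.

```python
import math


def gelu_tanh_reference(x: float) -> float:
    """Tanh approximation of GELU for one scalar value (hypothetical helper)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_erf_reference(x: float) -> float:
    """Exact GELU via the Gaussian CDF, used here only for comparison."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Print both variants side by side for a few sample inputs.
    for value in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
        approx = gelu_tanh_reference(value)
        exact = gelu_erf_reference(value)
        print(f"x={value:+.1f} tanh={approx:+.6f} erf={exact:+.6f}")
```

A reference like this is mainly useful in an op test, where the fused kernel's output can be compared against it within a tolerance.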

Usage or Command

Enabled by default; existing usage is unaffected.

Accuracy Tests

Refer to the op tests in this PR.

Checklist

Add at least one tag to the PR title.
- Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If there are none, state the reason in this PR.
Provide accuracy results.
If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-11-06T17:27:20Z

Thanks for your contribution!

Copilot

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Copilot · 2025-11-07T05:32:16Z

fastdeploy/model_executor/models/paddleocr_vl/siglip_ops.py

+def native_neox_rope_embedding(qkv, cos, sin, num_heads):
+    B, seq_length, D = qkv.shape
+    qkv = qkv.reshape(
+        [
+            seq_length,
+            3,
+            num_heads,
+            -1,
+        ]


The function takes 4 parameters (qkv, cos, sin, num_heads) and unpacks B, seq_length, D = qkv.shape on line 43, which assumes a 3D input; the reshape shown uses only seq_length. However, in the test file test_fused_neox_rope_embedding.py, the qkv tensor has shape [token_num, 3 * hidden_size] (2D), not 3D. Unpacking three values from a two-element shape will raise an error. The function should take seq_length directly from qkv.shape[0].
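
The shape issue described above, together with the NeoX-style rotation that a native reference would compute, can be sketched in plain Python. This is an illustration only, not the PR's implementation: the helper names (qkv_target_shape, rotate_half, neox_rope_single_head) are hypothetical, and the cos/sin layout (one value per element of the head, with the two halves sharing angles) is an assumption.

```python
import math


def qkv_target_shape(shape, num_heads):
    """Target shape [seq_length, 3, num_heads, head_dim] for a 2D packed qkv.

    Assumes shape == (token_num, 3 * hidden_size), as in the op test
    described in the review comment.
    """
    if len(shape) != 2:
        raise ValueError(f"expected a 2D qkv shape, got {len(shape)} dims")
    seq_length, packed = shape  # two-way unpack matches the 2D input
    if packed % (3 * num_heads) != 0:
        raise ValueError("last dim must be divisible by 3 * num_heads")
    return [seq_length, 3, num_heads, packed // (3 * num_heads)]


def rotate_half(x):
    """NeoX-style rotation: split the head in two halves, return [-x2, x1]."""
    if len(x) % 2 != 0:
        raise ValueError("head_dim must be even")
    half = len(x) // 2
    return [-v for v in x[half:]] + list(x[:half])


def neox_rope_single_head(x, cos, sin):
    """Apply x * cos + rotate_half(x) * sin to one head vector."""
    rotated = rotate_half(x)
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rotated, cos, sin)]


if __name__ == "__main__":
    print(qkv_target_shape((5, 3 * 4 * 8), num_heads=4))
    theta = [0.3, 1.1]  # one angle per pair; both halves share it
    cos = [math.cos(t) for t in theta] * 2
    sin = [math.sin(t) for t in theta] * 2
    print(neox_rope_single_head([1.0, 2.0, 3.0, 4.0], cos, sin))
```

Taking seq_length from the first dimension lets the same reference handle the flattened [token_num, 3 * hidden_size] layout without a separate batch dimension.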

Copilot · 2025-11-07T05:32:16Z

fastdeploy/model_executor/models/paddleocr_vl/siglip.py

    def forward(self, hidden_states: paddle.Tensor) -> paddle.Tensor:
        hidden_states = self.fc1(hidden_states)
-        hidden_states = self.activation_fn(hidden_states)
+        hidden_states = self.activation_fn(hidden_states[0])


Incorrect tensor indexing. hidden_states is a tensor output from self.fc1(), not a list. Indexing with [0] would incorrectly extract only the first element along the first dimension rather than passing the full tensor to the activation function. This should be hidden_states = self.activation_fn(hidden_states) to properly apply the activation function to the entire tensor.

Suggested change

-        hidden_states = self.activation_fn(hidden_states[0])
+        hidden_states = self.activation_fn(hidden_states)
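
To make the effect of the extra [0] concrete, the sketch below uses nested Python lists as a stand-in for a tensor. It assumes, as the review comment does, that fc1 returns a single tensor rather than a tuple; shape_of and apply_elementwise are hypothetical helpers for illustration.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list, e.g. [2, 3]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return shape


def apply_elementwise(fn, nested):
    """Apply fn to every scalar while preserving the nesting (the shape)."""
    if isinstance(nested, list):
        return [apply_elementwise(fn, item) for item in nested]
    return fn(nested)


if __name__ == "__main__":
    hidden_states = [[-0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]  # shape [2, 3]
    full = apply_elementwise(abs, hidden_states)
    first_only = apply_elementwise(abs, hidden_states[0])
    print(shape_of(full))        # whole input is processed
    print(shape_of(first_only))  # leading dimension has been dropped
```

Indexing before the activation silently discards every row except the first, so downstream layers would receive an output with one fewer dimension.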

Copilot · 2025-11-07T05:32:17Z

custom_ops/gpu_ops/fused_neox_rope_embedding.cu

+        int cur_idx_1 = base_split_idx_left + i;
+        int cur_idx_2 = base_split_idx_right + i;


The variables cur_idx_1 and cur_idx_2 are computed but never used. These lines should either be removed or the computed indices should be used for the Store operations if they serve a purpose.

Suggested change

-        int cur_idx_1 = base_split_idx_left + i;
-        int cur_idx_2 = base_split_idx_right + i;

[Feature] Optim PaddleOCR-VL

7d372a9

fix bug

ef3ec4f

Jiang-Jia-Jun requested a review from Copilot November 7, 2025 05:28

Copilot AI reviewed Nov 7, 2025

Jiang-Jia-Jun added the skip-ci: coverage label Nov 7, 2025

Jiang-Jia-Jun merged commit cba185f into PaddlePaddle:develop Nov 7, 2025
19 of 22 checks passed

ming1753 deleted the dev_ocr branch November 12, 2025 12:59

chang-wenbin pushed a commit to chang-wenbin/FastDeploy that referenced this pull request Mar 2, 2026

[Feature] Optim PaddleOCR-VL (PaddlePaddle#4873)

c4178d2

* [Feature] Optim PaddleOCR-VL
* fix bug

Jiang-Jia-Jun merged 2 commits into PaddlePaddle:develop from ming1753:dev_ocr
