[Feature] Optim PaddleOCR-VL #4873
Thanks for your contribution!
def native_neox_rope_embedding(qkv, cos, sin, num_heads):
    B, seq_length, D = qkv.shape
    qkv = qkv.reshape(
        [
            seq_length,
            3,
            num_heads,
            -1,
        ]
The function unpacks B, seq_length, D = qkv.shape (line 43), which assumes a 3D input. However, in the test file test_fused_neox_rope_embedding.py the qkv tensor is 2D, with shape [token_num, 3 * hidden_size], so the three-way unpack will raise an error because there are only 2 dimensions. The reshape shown only needs seq_length, which should be taken directly from qkv.shape[0].
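The shape problem described in this comment can be reproduced without Paddle. The following is a minimal sketch, assuming the packed layout [token_num, 3 * hidden_size] from the test file; qkv_target_shape is a hypothetical helper written for illustration, not a function from the PR, and the sizes (8 tokens, 4 heads, head dimension 8) are made up.

```python
def qkv_target_shape(qkv_shape, num_heads):
    """Hypothetical helper: compute the reshape target for a packed 2D qkv.

    qkv_shape is assumed to be (token_num, 3 * hidden_size), as in the test file.
    """
    if len(qkv_shape) != 2:
        raise ValueError(f"expected a 2D qkv shape, got {len(qkv_shape)} dims")
    token_num, packed_dim = qkv_shape
    if packed_dim % (3 * num_heads) != 0:
        raise ValueError("packed dim must be divisible by 3 * num_heads")
    head_dim = packed_dim // (3 * num_heads)
    return [token_num, 3, num_heads, head_dim]


# A 2D shape cannot be unpacked into three names, which is the failure
# the review comment points at.
try:
    B, seq_length, D = (8, 96)
    unpack_failed = False
except ValueError:
    unpack_failed = True

# Reading the token count from the first dimension works for the 2D layout.
target = qkv_target_shape((8, 96), num_heads=4)
print(unpack_failed, target)  # True [8, 3, 4, 8]
```

Whether the real function should also accept 3D input is a design decision for the PR author; the sketch only covers the 2D case used by the test.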
def forward(self, hidden_states: paddle.Tensor) -> paddle.Tensor:
    hidden_states = self.fc1(hidden_states)
    hidden_states = self.activation_fn(hidden_states)
    hidden_states = self.activation_fn(hidden_states[0])
Incorrect tensor indexing. hidden_states is a tensor output from self.fc1(), not a list. Indexing with [0] would incorrectly extract only the first element along the first dimension rather than passing the full tensor to the activation function. This should be hidden_states = self.activation_fn(hidden_states) to properly apply the activation function to the entire tensor.
-     hidden_states = self.activation_fn(hidden_states[0])
+     hidden_states = self.activation_fn(hidden_states)
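The effect of the stray [0] can be illustrated with nested lists standing in for a tensor. This is a minimal sketch, assuming (as the comment states) that fc1 returns a plain tensor rather than a tuple; activation_fn here is a toy ReLU and shape is a hypothetical helper, neither taken from the PR.

```python
def activation_fn(t):
    """Toy ReLU applied elementwise to a nested list (stand-in for a tensor)."""
    if isinstance(t, list):
        return [activation_fn(v) for v in t]
    return max(0.0, t)


def shape(t):
    """Hypothetical helper: report the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims


# 3 tokens x 2 features, playing the role of the fc1 output.
hidden_states = [[-1.0, 2.0], [3.0, -4.0], [5.0, 6.0]]

full = activation_fn(hidden_states)           # every token is kept
first_only = activation_fn(hidden_states[0])  # only the first token survives

print(shape(full), shape(first_only))  # [3, 2] [2]
```

If a layer did return a tuple such as (output, bias), indexing with [0] would be the right call, so the fix depends on what fc1 actually returns in this model.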
int cur_idx_1 = base_split_idx_left + i;
int cur_idx_2 = base_split_idx_right + i;
The variables cur_idx_1 and cur_idx_2 are computed but never used. These lines should either be removed or the computed indices should be used for the Store operations if they serve a purpose.
- int cur_idx_1 = base_split_idx_left + i;
- int cur_idx_2 = base_split_idx_right + i;
* [Feature] Optim PaddleOCR-VL
* fix bug
Motivation
Optimize PaddleOCR-VL performance.
Modifications
Usage or Command
Enabled by default, does not affect usage.
Accuracy Tests
Refer to the op tests in this PR.
Checklist
* Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
* Run pre-commit before commit.
* If the PR targets the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.