【Inference Optimize】Support MLA_CACHE & Fix V1_Schedule Bug #4318
Merged
chang-wenbin merged 2 commits into PaddlePaddle:develop on Oct 9, 2025
chang-wenbin (Collaborator) commented on Sep 29, 2025
- Support cache initialization for the MLA backend so that KV cache GPU memory is allocated appropriately: the block count rises from 1500 to 4500, and concurrency rises from 45 to 145.
- Fix a bug in the V1 scheduler that allowed the number of activated tokens to exceed max-num-batched-tokens.
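
The invariant behind the second item can be sketched as follows. This is a minimal illustration, not the FastDeploy scheduler: the `Request` class and `schedule_step` function are hypothetical names, and chunking an oversized request to the remaining budget is an assumption about how the limit might be enforced.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record; not FastDeploy's actual class."""
    request_id: str
    remaining_tokens: int  # tokens still to be processed for this request


def schedule_step(waiting, max_num_batched_tokens):
    """Pick requests for one step without exceeding the token budget.

    Returns a list of (request_id, num_tokens) pairs. A request that does
    not fit entirely is chunked to whatever budget is left.
    """
    budget = max_num_batched_tokens
    scheduled = []
    for req in waiting:
        if budget <= 0:
            break  # budget exhausted; remaining requests wait for the next step
        num_tokens = min(req.remaining_tokens, budget)
        if num_tokens <= 0:
            continue  # nothing left to process for this request
        scheduled.append((req.request_id, num_tokens))
        budget -= num_tokens
    return scheduled
```

Whatever the real scheduling policy is, the property to check is that the per-step token counts sum to at most max-num-batched-tokens.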
zhoutianzi666 approved these changes on Sep 29, 2025
qingqing01 reviewed on Sep 29, 2025
    # To rationalize the allocation of kvcache.
    from fastdeploy import envs

    self.mla_cache = envs.FD_ATTENTION_BACKEND == "MLA_ATTN"
Collaborator
Is this environment variable set automatically for models that use MLA, or does it have to be set manually?
Collaborator
Author
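For now it is set manually in the launch script with export FD_ATTENTION_BACKEND="MLA_ATTN".
Later, the backend will be set automatically based on model_type in config.json. That change is planned to be submitted together with enabling tensor_core by default for MLA.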
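
One way the planned automatic selection could look is sketched below. It is an assumption-laden illustration, not the change the author has in mind: `resolve_attention_backend`, `MLA_MODEL_TYPES`, and the rule that an explicit environment variable wins over `config.json` are all hypothetical, and the default backend name is left to the caller rather than guessed.

```python
import json
import os

# Hypothetical set of model_type values treated as MLA models (assumption).
MLA_MODEL_TYPES = {"deepseek_v2", "deepseek_v3"}


def resolve_attention_backend(config_path, default_backend, environ=None):
    """Choose an attention backend name.

    An explicitly exported FD_ATTENTION_BACKEND takes priority; otherwise the
    model_type field of config.json decides between MLA_ATTN and the default.
    """
    environ = os.environ if environ is None else environ
    explicit = environ.get("FD_ATTENTION_BACKEND")
    if explicit:
        return explicit
    with open(config_path, encoding="utf-8") as f:
        model_type = json.load(f).get("model_type")
    return "MLA_ATTN" if model_type in MLA_MODEL_TYPES else default_backend
```

Keeping the environment variable as an override would preserve the current manual workflow for launch scripts that already export it.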
gongshaotian approved these changes on Oct 9, 2025
gongshaotian (Collaborator) left a comment:
KVCache creation should later be moved into the Attention Backend.
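
The reason cache creation depends on the backend is that the per-token footprint differs. The sketch below compares the two layouts under stated assumptions: standard attention stores a key and a value vector per KV head, while MLA stores one compressed latent plus a decoupled RoPE key per token. The function names and the example dimensions are illustrative and are not taken from this PR.

```python
def kv_bytes_per_token_mha(num_kv_heads, head_dim, dtype_bytes=2):
    """Standard attention caches a key and a value vector per KV head."""
    return 2 * num_kv_heads * head_dim * dtype_bytes


def kv_bytes_per_token_mla(kv_lora_rank, qk_rope_head_dim, dtype_bytes=2):
    """MLA caches one compressed latent plus the decoupled RoPE key."""
    return (kv_lora_rank + qk_rope_head_dim) * dtype_bytes


def num_blocks(free_bytes, num_layers, block_size, bytes_per_token):
    """Number of cache blocks that fit into the available memory."""
    bytes_per_block = num_layers * block_size * bytes_per_token
    return free_bytes // bytes_per_block


# Illustrative dimensions only (assumptions, not measured on this PR).
mha = kv_bytes_per_token_mha(num_kv_heads=8, head_dim=128)
mla = kv_bytes_per_token_mla(kv_lora_rank=512, qk_rope_head_dim=64)
print(mha, mla)
```

If a backend exposed its own per-token size in this way, the block count could be derived from free memory without the caller branching on the backend name.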