[FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config by YuanRisheng · Pull Request #4116 · PaddlePaddle/FastDeploy

YuanRisheng · 2025-09-15T12:28:36Z

Remove max_num_batched_tokens and max_num_seqs from parallel_config and FDConfig, and move them into SchedulerConfig.

paddle-bot · 2025-09-15T14:09:44Z

Thanks for your contribution!

Copilot

Pull Request Overview

This PR moves max_num_batched_tokens and max_num_seqs configuration parameters from ParallelConfig and FDConfig to a dedicated SchedulerConfig class. This refactoring improves the separation of concerns by placing scheduler-related configuration parameters in their appropriate config class.

Introduces SchedulerConfig import and instantiation across test files and worker components
Updates all references to use scheduler_config.max_num_batched_tokens and scheduler_config.max_num_seqs instead of parallel_config equivalents
Modifies config initialization in args_utils to properly pass parameters to SchedulerConfig
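
The refactoring described above can be illustrated with a minimal sketch. The classes below are simplified stand-ins that only borrow the names SchedulerConfig, ParallelConfig and FDConfig; they are not the real FastDeploy classes. The default values, the tensor_parallel_size field and the batch_limits helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulerConfig:
    # Stand-in: after the PR, the two batching limits live here.
    # The default values are illustrative assumptions.
    max_num_batched_tokens: int = 2048
    max_num_seqs: int = 8


@dataclass
class ParallelConfig:
    # Stand-in: no longer carries the batching limits.
    # tensor_parallel_size is only a placeholder field for this sketch.
    tensor_parallel_size: int = 1


@dataclass
class FDConfig:
    # Stand-in top-level config that aggregates the sub-configs.
    parallel_config: ParallelConfig = field(default_factory=ParallelConfig)
    scheduler_config: SchedulerConfig = field(default_factory=SchedulerConfig)


def batch_limits(fd_config: FDConfig) -> tuple:
    """Hypothetical helper: read both limits from the scheduler config."""
    sched = fd_config.scheduler_config
    return sched.max_num_batched_tokens, sched.max_num_seqs


cfg = FDConfig(
    scheduler_config=SchedulerConfig(max_num_batched_tokens=4096, max_num_seqs=16)
)
limits = batch_limits(cfg)
```

Call sites that previously read the limits from parallel_config would, under this layout, read them through scheduler_config instead, which is the pattern the per-file summary below describes.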

Reviewed Changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
tests/v1/test_schedule_output.py	Updates test to use SchedulerConfig and new config structure
tests/v1/test_prefix_cache.py	Updates test configuration to use SchedulerConfig
tests/utils/test_config.py	Updates test assertions to access scheduler config properties
tests/utils.py	Moves max_num_seqs from parallel_config to scheduler_config
tests/graph_optimization/*.py	Updates multiple graph optimization tests to use SchedulerConfig
fastdeploy/worker/*.py	Updates worker components to access scheduler config properties
fastdeploy/spec_decode/base.py	Updates to use scheduler_config for max_num_seqs
fastdeploy/scheduler/config.py	Adds max_num_batched_tokens and max_num_seqs to SchedulerConfig
fastdeploy/model_executor/models/deepseek_v3.py	Updates buffer allocation to use scheduler config
fastdeploy/model_executor/layers/sample/sampler.py	Updates early stopper initialization
fastdeploy/model_executor/layers/backends/gcu/attention/*.py	Updates attention backends to use scheduler config
fastdeploy/model_executor/layers/attention/*.py	Updates attention components to use scheduler config
fastdeploy/model_executor/guided_decoding/xgrammar_backend.py	Updates batch size configuration
fastdeploy/engine/*.py	Updates engine components to use scheduler config
fastdeploy/config.py	Removes max_num_batched_tokens and max_num_seqs from ParallelConfig and FDConfig, adds SchedulerConfig integration

Copilot · 2025-09-16T11:36:59Z

fastdeploy/config.py

            self.long_prefill_token_threshold = int(self.max_model_len * 0.04)

-        self.cache_config.postprocess(self.max_num_batched_tokens, self.max_num_seqs)
+        self.cache_config.postprocess(self.scheduler_config.max_num_batched_tokens, self.max_num_seqs)


This should use self.scheduler_config.max_num_seqs instead of self.max_num_seqs for consistency and to ensure the scheduler config values are used throughout.

Suggested change

-        self.cache_config.postprocess(self.scheduler_config.max_num_batched_tokens, self.max_num_seqs)
+        self.cache_config.postprocess(self.scheduler_config.max_num_batched_tokens, self.scheduler_config.max_num_seqs)
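
A small sketch of why the mixed access matters, assuming the top-level max_num_seqs attribute is dropped from FDConfig as the PR description states. All classes here are hypothetical stand-ins, and the stand-in postprocess only records its arguments; the real method's behavior is not shown in this thread.

```python
class SchedulerConfig:
    """Stand-in holding the two batching limits."""

    def __init__(self, max_num_batched_tokens, max_num_seqs):
        self.max_num_batched_tokens = max_num_batched_tokens
        self.max_num_seqs = max_num_seqs


class CacheConfig:
    """Stand-in: records the arguments instead of doing real work."""

    def postprocess(self, max_tokens, max_seqs):
        self.seen = (max_tokens, max_seqs)


class FDConfig:
    """Stand-in top-level config without its own max_num_seqs attribute."""

    def __init__(self, scheduler_config):
        self.scheduler_config = scheduler_config
        self.cache_config = CacheConfig()

    def postprocess_mixed(self):
        # Mirrors the line under review: the second argument still reads
        # the top-level attribute, which this stand-in does not define.
        self.cache_config.postprocess(
            self.scheduler_config.max_num_batched_tokens, self.max_num_seqs
        )

    def postprocess_consistent(self):
        # Mirrors the suggested change: both values come from scheduler_config.
        self.cache_config.postprocess(
            self.scheduler_config.max_num_batched_tokens,
            self.scheduler_config.max_num_seqs,
        )


cfg = FDConfig(SchedulerConfig(max_num_batched_tokens=4096, max_num_seqs=16))

try:
    cfg.postprocess_mixed()
    mixed_failed = False
except AttributeError:
    mixed_failed = True

cfg.postprocess_consistent()
```

In this sketch the mixed form raises AttributeError, while the consistent form passes both scheduler values through. Whether the real FDConfig still exposed max_num_seqs at the reviewed commit is not visible from this thread.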

YuanRisheng added 3 commits September 15, 2025 12:27

remove max_num_batched_tokens in parallel config

156db60

remove max_num_seqs

9cb21ad

update test case

48537d0

YuanRisheng changed the title ~~[FDConfig]Remove max_num_batched_tokens in parallel config~~ [FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config Sep 15, 2025

resolve conflict

d925782

fix test

7e01b14

YuanRisheng added the skip-ci: coverage label Sep 16, 2025

Jiang-Jia-Jun requested a review from Copilot September 16, 2025 11:36

Copilot AI reviewed Sep 16, 2025

YuanRisheng and others added 2 commits September 16, 2025 11:45

fix

2ee2735

Merge branch 'develop' into remove_duplicate_field

2f54b8c

Jiang-Jia-Jun approved these changes Sep 17, 2025

Jiang-Jia-Jun merged commit 2e9e53f into PaddlePaddle:develop Sep 17, 2025
15 of 17 checks passed

Jiang-Jia-Jun merged 7 commits into PaddlePaddle:develop from YuanRisheng:remove_duplicate_field

3 participants