Draft: Proper chunked prefill bucketing #295

kzawora-intel · 2025-07-16T14:41:00Z

No description provided.

Copilot

Pull Request Overview

This PR implements proper chunked prefill bucketing by updating the linear and exponential bucketing strategies to handle chunked prefill, which is enabled based on the engine version (v1). It adds chunked-prefill-specific filtering logic and a new bucket generation parameter.

Adds max_num_blocks parameter to bucket generation methods
Implements chunked prefill detection based on engine version
Introduces specialized filtering logic for chunked prefill scenarios

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
vllm_hpu_extension/bucketing/linear.py	Updates LinearBucketingStrategy to support chunked prefill with new filtering logic and parameter handling
vllm_hpu_extension/bucketing/exponential.py	Updates ExponentialBucketingStrategy with similar chunked prefill support and bucket range calculations
vllm_hpu_extension/bucketing/common.py	Passes the new max_num_blocks parameter to bucket generation methods

Comments suppressed due to low confidence (1)

vllm_hpu_extension/bucketing/linear.py:136

The parameter name is inconsistent between function signature and usage. The function parameter is named 'enable_chunked_prefill' but the passed variable is 'chunked_prefill'.

                            enable_chunked_prefill,

Copilot · 2025-07-17T08:10:55Z

vllm_hpu_extension/bucketing/linear.py

+                return is_seq_in_bounds and is_block_in_bounds
+            # Find the first bucket that exceeds max_model_len
+            # For each (bs, seq), keep all buckets that do not exceed model len, and the first that does
+            from collections import defaultdict


Import statements should be placed at the top of the file, not inside functions. Move this import to the module level.

Suggested change (delete this function-local import):

from collections import defaultdict
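
A minimal sketch of the rule described in the diff comments ("keep all buckets that do not exceed model len, and the first that does"), with `defaultdict` imported at module level as the review asks. The function name `keep_up_to_first_overflow`, the `(bs, seq, blocks)` tuple layout, and the length formula `seq + blocks * block_size` are assumptions made for illustration; they are not taken from the PR.

```python
from collections import defaultdict  # module-level import, not inside the function


def keep_up_to_first_overflow(buckets, block_size, max_model_len):
    """For each (bs, seq) group, keep every bucket that fits in max_model_len
    plus the first (smallest) bucket that exceeds it."""
    groups = defaultdict(list)
    for bs, seq, blocks in buckets:
        groups[(bs, seq)].append(blocks)

    kept = []
    for (bs, seq), block_counts in groups.items():
        for blocks in sorted(block_counts):
            kept.append((bs, seq, blocks))
            # Assumption: a bucket's total length is seq + blocks * block_size.
            if seq + blocks * block_size > max_model_len:
                break  # first overflowing bucket kept; larger ones are dropped
    return kept


example = [(1, 128, 0), (1, 128, 2), (1, 128, 4), (1, 128, 8), (2, 128, 0)]
result = keep_up_to_first_overflow(example, block_size=128, max_model_len=512)
```

Here `(1, 128, 4)` is the first bucket in its group to exceed 512 tokens, so it is kept and `(1, 128, 8)` is dropped.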

Copilot · 2025-07-17T08:10:56Z

vllm_hpu_extension/bucketing/exponential.py

+                return is_seq_in_bounds and is_block_in_bounds
+            # Find the first bucket that exceeds max_model_len
+            # For each (bs, seq), keep all buckets that do not exceed model len, and the first that does
+            from collections import defaultdict


Import statements should be placed at the top of the file, not inside functions. Move this import to the module level.

Suggested change (delete this function-local import):

from collections import defaultdict

Copilot · 2025-07-17T08:10:56Z

vllm_hpu_extension/bucketing/exponential.py

@@ -103,7 +111,7 @@ def generate_prompt_buckets(bs_bucket_config,
        for bs in batch_size_buckets:
            for b in seq_bucket_config:
                buckets_3d.append((bs, b, 0))
-                max_blocks_range = (bmax - b) // block_size
+                max_blocks_range = (bmax - b) // block_size if not max_num_blocks else max_num_blocks


[nitpick] The conditional logic is unclear. Consider using 'max_num_blocks if max_num_blocks is not None else (bmax - b) // block_size' to be more explicit about None checking.

Suggested change (original line first, proposed replacement second):

max_blocks_range = (bmax - b) // block_size if not max_num_blocks else max_num_blocks

max_blocks_range = max_num_blocks if max_num_blocks is not None else (bmax - b) // block_size
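
The two forms are not strictly equivalent: `not max_num_blocks` is also true for `0`, while `is not None` only falls back for `None`. The sketch below wraps each expression in a hypothetical helper (the function names are not from the PR) to show where they diverge. Whether `0` is ever a meaningful value for `max_num_blocks` in this code path is not stated in the review.

```python
def blocks_range_truthy(bmax, b, block_size, max_num_blocks=None):
    # Original line: falls back whenever max_num_blocks is falsy (None or 0).
    return (bmax - b) // block_size if not max_num_blocks else max_num_blocks


def blocks_range_explicit(bmax, b, block_size, max_num_blocks=None):
    # Suggested line: falls back only when max_num_blocks is None.
    return max_num_blocks if max_num_blocks is not None else (bmax - b) // block_size


# Both agree when the argument is None or a positive integer.
default_truthy = blocks_range_truthy(1024, 128, 128)
default_explicit = blocks_range_explicit(1024, 128, 128)

# They differ when the caller passes 0.
zero_truthy = blocks_range_truthy(1024, 128, 128, max_num_blocks=0)
zero_explicit = blocks_range_explicit(1024, 128, 128, max_num_blocks=0)
```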

Copilot · 2025-07-17T08:10:57Z

vllm_hpu_extension/bucketing/linear.py

+                    return True
+                else:
+                    return False
+            filtered_buckets = list(map(lambda x: x[1], filter(keep_bucket, enumerate(buckets))))


[nitpick] This complex nested lambda expression reduces readability. Consider using a list comprehension or separating into multiple steps for better clarity.

Suggested change (original line first, proposed replacement second):

filtered_buckets = list(map(lambda x: x[1], filter(keep_bucket, enumerate(buckets))))

filtered_buckets = [bucket for _, bucket in enumerate(buckets) if keep_bucket((_, bucket))]
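
A small sketch checking that the `map`/`filter` form and a list comprehension select the same buckets. The `keep_bucket` predicate here is a hypothetical stand-in (the real one is only partially visible in the diff); it assumes, as the original line implies, that the predicate receives an `(index, bucket)` pair. The index is named `idx` rather than `_` because the comprehension actually reads it.

```python
def keep_bucket(indexed_bucket):
    # Hypothetical predicate for illustration only: keep buckets at even indices.
    index, _bucket = indexed_bucket
    return index % 2 == 0


buckets = [(1, 128, 0), (1, 128, 1), (1, 256, 0), (2, 128, 0)]

# Form used in the PR.
original = list(map(lambda x: x[1], filter(keep_bucket, enumerate(buckets))))

# Comprehension form with a named index.
rewritten = [bucket for idx, bucket in enumerate(buckets) if keep_bucket((idx, bucket))]
```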

Copilot · 2025-07-17T08:10:57Z

vllm_hpu_extension/bucketing/exponential.py

+                    return True
+                else:
+                    return False
+            filtered_buckets = list(map(lambda x: x[1], filter(keep_bucket, enumerate(buckets))))


[nitpick] This complex nested lambda expression reduces readability. Consider using a list comprehension or separating into multiple steps for better clarity.

Suggested change (original line first, proposed replacement second):

filtered_buckets = list(map(lambda x: x[1], filter(keep_bucket, enumerate(buckets))))

filtered_buckets = [bucket for _, bucket in enumerate(buckets) if keep_bucket((_, bucket))]

Proper chunked prefill bucketing

a6363aa

kzawora-intel requested review from madamczyk-intel, michalkuligowski, mgawarkiewicz-intel, tzielinski-habana, afierka-intel, xuechendi, jikunshang and mswiniarsk as code owners July 16, 2025 14:41

kzawora-intel marked this pull request as draft July 16, 2025 14:41

kzawora-intel mentioned this pull request Jul 16, 2025

Proper chunked prefill bucketing/warmup vllm-project/vllm-gaudi#32

Draft

madamczyk-intel requested a review from Copilot July 17, 2025 08:10

Copilot AI reviewed Jul 17, 2025
