Commit 4e6c13c

fzyzcjy authored and lifuhuang committed
[PD] Allow customizing reserved tokens to avoid KV cache waste (sgl-project#6002)
1 parent 92dd9b6 commit 4e6c13c

1 file changed: +3 -1 lines changed
python/sglang/srt/disaggregation/decode.py

Lines changed: 3 additions & 1 deletion
@@ -97,7 +97,9 @@ def __init__(
         self.tp_size = tp_size
         self.bootstrap_port = bootstrap_port
 
-        self.num_reserved_decode_tokens = 512
+        self.num_reserved_decode_tokens = int(
+            os.environ.get("SGLANG_NUM_RESERVED_DECODE_TOKENS", "512")
+        )
 
         # Queue for requests pending pre-allocation
         self.queue: List[DecodeRequest] = []
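
For reviewers, here is a minimal standalone sketch of how the new lookup behaves. The helper name `read_reserved_decode_tokens` and its `env` parameter are hypothetical and exist only for illustration; the environment variable name and the default of 512 come from the diff. It assumes nothing about the rest of decode.py.

```python
import os

ENV_VAR = "SGLANG_NUM_RESERVED_DECODE_TOKENS"  # name taken from the diff


def read_reserved_decode_tokens(env=None):
    """Hypothetical helper mirroring the changed line in the diff.

    Falls back to "512" when the variable is unset, then converts to int.
    A non-numeric value raises ValueError, same as the int() call in the diff.
    """
    env = os.environ if env is None else env
    return int(env.get(ENV_VAR, "512"))


if __name__ == "__main__":
    # Unset: the previous hard-coded value is preserved as the default.
    print(read_reserved_decode_tokens({}))
    # Set: a smaller reservation, e.g. for workloads with short outputs.
    print(read_reserved_decode_tokens({ENV_VAR: "128"}))
```

Because the value is parsed with a bare `int(...)`, a malformed setting such as `SGLANG_NUM_RESERVED_DECODE_TOKENS=abc` fails at construction time with a `ValueError` rather than being silently ignored.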
