docs/advanced_features/server_arguments.md
1 addition & 3 deletions
@@ -295,12 +295,10 @@ Please consult the documentation below and [server_args.py](https://github.com/s
 ## Ngram speculative decoding
 | Argument | Description | Defaults | Options |
 | --- | --- | --- | --- |
-| `--speculative-ngram-min-match-window-size` | The minimum window size for pattern matching in ngram speculative decoding. | `1` | Type: int |
-| `--speculative-ngram-max-match-window-size` | The maximum window size for pattern matching in ngram speculative decoding. | `12` | Type: int |
 | `--speculative-ngram-min-bfs-breadth` | The minimum breadth for BFS (Breadth-First Search) in ngram speculative decoding. | `1` | Type: int |
 | `--speculative-ngram-max-bfs-breadth` | The maximum breadth for BFS (Breadth-First Search) in ngram speculative decoding. | `10` | Type: int |
 | `--speculative-ngram-match-type` | Ngram tree-building mode. `BFS` selects recency-based expansion and `PROB` selects frequency-based expansion. This setting is forwarded to the ngram cache implementation. | `BFS` | `BFS`, `PROB` |
-| `--speculative-ngram-max-trie-depth` | The max trie depth for ngram speculative decoding. | `18` | Type: int |
+| `--speculative-ngram-max-trie-depth` | Maximum suffix length stored and matched by the ngram trie. | `18` | Type: int |
 | `--speculative-ngram-capacity` | The cache capacity for ngram speculative decoding. | `10000000` | Type: int |
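
To make the revised `--speculative-ngram-max-trie-depth` wording concrete, the following is a minimal sketch of what "maximum suffix length stored and matched" could mean for a token trie. It is an illustration only, not SGLang's ngram cache: `Node`, `NgramTrie`, `insert`, and `longest_match` are hypothetical names, and the sketch ignores capacity limits, BFS/PROB expansion, and eviction.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps a token id to the child node reached by that token.
    children: dict[int, "Node"] = field(default_factory=dict)


class NgramTrie:
    """Toy trie over token ids (hypothetical; not SGLang's implementation)."""

    def __init__(self, max_depth: int = 18) -> None:
        # Assumed to play the role of --speculative-ngram-max-trie-depth.
        self.max_depth = max_depth
        self.root = Node()

    def insert(self, tokens: list[int]) -> None:
        # From every start position, store a path of at most max_depth tokens.
        for start in range(len(tokens)):
            node = self.root
            for tok in tokens[start : start + self.max_depth]:
                node = node.children.setdefault(tok, Node())

    def longest_match(self, context: list[int]) -> int:
        # Return the length of the longest suffix of `context` that exists
        # as a path from the root, never looking past max_depth tokens.
        limit = min(self.max_depth, len(context))
        for length in range(limit, 0, -1):
            node = self.root
            for tok in context[-length:]:
                node = node.children.get(tok)
                if node is None:
                    break
            else:
                return length
        return 0
```

In this sketch a smaller `max_depth` bounds both memory per inserted sequence and the longest context suffix that can ever match, which is the trade-off the single flag now controls.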
docs/advanced_features/speculative_decoding.md
3 additions & 8 deletions
@@ -387,13 +387,11 @@ Enable it with:
 | Parameter | Description | Default |
 |---|---|---|
-| `--speculative-num-draft-tokens` | Number of draft tokens verified per step. If omitted, defaults to `--speculative-ngram-max-match-window-size`. | `12` (with default ngram settings) |
-| `--speculative-ngram-max-match-window-size` | Maximum matching window size. | `12` |
+| `--speculative-num-draft-tokens` | Number of draft tokens verified per step. If omitted, defaults to `min(--speculative-ngram-max-trie-depth, 12)`. | `12` (with default ngram settings) |

 | `--speculative-ngram-max-bfs-breadth` | `int` | `10` | Maximum BFS breadth |
 | `--speculative-ngram-match-type` | `str` | `"BFS"` | Ngram tree-building mode: `"BFS"` for recency-based expansion or `"PROB"` for frequency-based expansion |
-| `--speculative-ngram-max-trie-depth` | `int` | `18` | Max trie depth for ngram speculative decoding |
+| `--speculative-ngram-max-trie-depth` | `int` | `18` | Maximum suffix length stored and matched by the ngram trie |
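
The new default rule for `--speculative-num-draft-tokens` can be sketched in a few lines of Python. This only mirrors the documented formula `min(--speculative-ngram-max-trie-depth, 12)`; the function name `resolve_num_draft_tokens` and the constant `DRAFT_TOKEN_CAP` are hypothetical and are not taken from `server_args.py`.

```python
from typing import Optional

# The constant 12 from the documented formula (name is an assumption).
DRAFT_TOKEN_CAP = 12


def resolve_num_draft_tokens(
    num_draft_tokens: Optional[int],
    max_trie_depth: int = 18,
) -> int:
    """Return the effective draft-token count under the documented rule."""
    if num_draft_tokens is not None:
        # An explicit --speculative-num-draft-tokens value is used as given.
        return num_draft_tokens
    # Otherwise fall back to min(max trie depth, 12).
    return min(max_trie_depth, DRAFT_TOKEN_CAP)


print(resolve_num_draft_tokens(None))      # default trie depth of 18
print(resolve_num_draft_tokens(None, 8))   # shallower trie lowers the default
print(resolve_num_draft_tokens(4, 18))     # explicit value is kept
```

With the default trie depth of `18` the result is `12`, matching the "`12` (with default ngram settings)" entry in the table; only a trie depth below `12` changes the fallback.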