
Commit 5af248b

[generate] remove docs of a feature that no longer exists (#40895)
1 parent: 20ee3a7 · commit: 5af248b

File tree: 1 file changed (+0, −30 lines)

docs/source/en/llm_optims.md

Lines changed: 0 additions & 30 deletions
@@ -183,36 +183,6 @@ text
 'My favorite all time favorite condiment is ketchup. I love it on everything. I love it on my eggs, my fries, my chicken, my burgers, my hot dogs, my sandwiches, my salads, my p']
 ```
 
-</hfoption>
-<hfoption id="3. compile entire generate function">
-
-Compiling the entire [`~GenerationMixin.generate`] function also compiles the input preparation logit processor operations, and more, in addition to the forward pass. With this approach, you don't need to initialize [`StaticCache`] or set the [cache_implementation](https://hf.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig.cache_implementation) parameter.
-
-```py
-from transformers import AutoTokenizer, AutoModelForCausalLM
-import torch
-import os
-os.environ["TOKENIZERS_PARALLELISM"] = "false" # To prevent long warnings :)
-
-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
-model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", dtype="auto", device_map="auto")
-
-model.generate = torch.compile(model.generate, mode="reduce-overhead", fullgraph=True)
-input_text = "The theory of special relativity states "
-input_ids = tokenizer(input_text, return_tensors="pt").to(model.device.type)
-
-outputs = model.generate(**input_ids)
-print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
-['The theory of special relativity states 1. The speed of light is constant in all inertial reference']
-```
-
-This usage pattern is more appropriate for unique hardware or use cases, but there are several drawbacks to consider.
-
-1. Compilation is much slower.
-2. Parameters must be configured through [`GenerationConfig`].
-3. Many warnings and exceptions are suppressed. We recommend testing the uncompiled model first.
-4. Many features are unavailable at the moment. For example, generation does not stop if an `EOS` token is selected.
-
 </hfoption>
 </hfoptions>
 
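Note: the option removed by this commit compiled `generate` itself. The approaches that remain documented in `llm_optims.md` compile only the model's forward pass and pair it with a static KV cache, so the compiled graph always sees fixed tensor shapes. Below is a minimal sketch of that remaining pattern, assuming the `google/gemma-2b` checkpoint used elsewhere in the file; the `max_new_tokens` value and the `generation_config.cache_implementation` assignment are illustrative choices, not taken from this commit.

```py
# Illustrative sketch (not from this commit): compile only the forward pass
# and use a static KV cache, instead of compiling the whole generate() call.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", dtype="auto", device_map="auto")

# A static cache keeps key/value tensor shapes fixed, so torch.compile can
# reuse one graph across decoding steps instead of recompiling.
model.generation_config.cache_implementation = "static"

# Compile only the forward pass; generate() itself stays in eager Python,
# so stopping criteria, warnings, and GenerationConfig handling still apply.
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The theory of special relativity states ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Because `generate` stays uncompiled here, behavior such as stopping on an `EOS` token is preserved, which is the trade-off the removed drawbacks list contrasted against.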