@@ -14,17 +14,18 @@ rendered properly in your Markdown viewer.
1414
1515-->
1616
17- # Mistral
18-
19- < div class = " flex flex-wrap space-x-1 " >
20- <img alt =" PyTorch " src =" https://img.shields.io/badge/PyTorch-DE3412 ?style=flat&logo=pytorch &logoColor=white " >
21- <img alt =" TensorFlow " src =" https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white " >
22- <img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=
23- ">
24- <img alt =" FlashAttention " src =" https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8 ?style=flat " >
25- < img alt = " SDPA " src = " https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white " >
17+ < div style = " float : right ; " >
18+ <div class="flex flex-wrap space-x-1">
19+ <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white ">
20+ <img alt="TensorFlow " src="https://img.shields.io/badge/TensorFlow-FF6F00 ?style=flat&logo=tensorflow &logoColor=white">
21+ <img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=
22+ ">
23+ <img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat ">
24+ <img alt="SDPA " src="https://img.shields.io/badge/SDPA-DE3412 ?style=flat&logo=pytorch&logoColor=white ">
25+ </div >
2626</div >
2727
28+ # Mistral
2829
2930[ Mistral] ( https://huggingface.co/papers/2310.06825 ) is a 7B parameter language model, available as a pretrained and instruction-tuned variant, focused on balancing
3031the scaling costs of large models with performance and efficient inference. This model uses sliding window attention (SWA) trained with a 8K context length and a fixed cache size to handle longer sequences more effectively. Grouped-query attention (GQA) speeds up inference and reduces memory requirements. Mistral also features a byte-fallback BPE tokenizer to improve token handling and efficiency by ensuring characters are never mapped to out-of-vocabulary tokens.
@@ -49,39 +50,18 @@ The example below demonstrates how to chat with [`Pipeline`] or the [`AutoModel`
4950... {" role" : " user" , " content" : " Do you have mayonnaise recipes?" }
5051... ]
5152
52- chatbot = pipeline(" text-generation" , model = " mistralai/Mistral-7B-Instruct-v0.3" , torch_dtype = torch.bfloat16, device = 0 )
53- chatbot(messages)
54-
53+ >> > chatbot = pipeline(" text-generation" , model = " mistralai/Mistral-7B-Instruct-v0.3" , torch_dtype = torch.bfloat16, device = 0 )
54+ >> > chatbot(messages)
5555```
5656
57- <hfoptions id = " usage " >
57+ </ hfoption >
5858<hfoption id =" AutoModel " >
5959
60- The base model can be used as follows:
61-
62- ``` python
63- >> > from transformers import AutoModelForCausalLM, AutoTokenizer
64-
65- >> > model = AutoModelForCausalLM.from_pretrained(" mistralai/Mistral-7B-v0.3" , device_map = " auto" )
66- >> > tokenizer = AutoTokenizer.from_pretrained(" mistralai/Mistral-7B-v0.3" )
67-
68- >> > prompt = " My favourite condiment is"
69-
70- >> > model_inputs = tokenizer([prompt], return_tensors = " pt" ).to(" cuda" )
71- >> > model.to(device)
72-
73- >> > generated_ids = model.generate(** model_inputs, max_new_tokens = 100 , do_sample = True )
74- >> > tokenizer.batch_decode(generated_ids)[0 ]
75- " My favourite condiment is to ..."
76- ```
77-
78- The instruction tuned model can be used as follows:
79-
8060``` python
8161>> > import torch
8262>> > from transformers import AutoModelForCausalLM, AutoTokenizer
8363
84- >> > model = AutoModelForCausalLM.from_pretrained(" mistralai/Mistral-7B-v0.3" , torch_dtype = torch.bfloat16, attn_implementation = " sdpa" , device_map = " auto" )
64+ >> > model = AutoModelForCausalLM.from_pretrained(" mistralai/Mistral-7B-Instruct- v0.3" , torch_dtype = torch.bfloat16, attn_implementation = " sdpa" , device_map = " auto" )
8565>> > tokenizer = AutoTokenizer.from_pretrained(" mistralai/Mistral-7B-Instruct-v0.3" )
8666
8767>> > messages = [
@@ -96,16 +76,18 @@ The instruction tuned model can be used as follows:
9676>> > tokenizer.batch_decode(generated_ids)[0 ]
9777" Mayonnaise can be made as follows: (...)"
9878```
99- As can be seen, the instruction-tuned model requires a [ chat template] ( ../chat_templating ) to be applied to make sure the inputs are prepared in the right format.
10079
101- <hfoptions id = " usage " >
80+ </ hfoption >
10281<hfoption id =" transformers-cli " >
10382
10483``` python
105- # pip install -U flash-attn --no-build-isolation
106- echo - e " My favorite condiment is" | transformers- cli chat -- model mistralai/ Mistral- 7B - v0.3 -- torch_dtype auto -- device 0 -- attn_implementation flash_attention_2
84+ echo - e " My favorite condiment is" | transformers- cli chat -- model_name_or_path mistralai/ Mistral- 7B - v0.3 -- torch_dtype auto -- device 0 -- attn_implementation flash_attention_2
10785```
10886
87+ </hfoption >
88+ </hfoptions >
89+
90+
10991Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [ Quantization] ( ../quantization/overview ) overview for more available quantization backends.
11092
11193The example below uses [ bitsandbytes] ( ../quantization/bitsandbytes ) to only quantize the weights to 4-bits.
@@ -139,13 +121,16 @@ The example below uses [bitsandbytes](../quantization/bitsandbytes) to only quan
139121" The expected output"
140122```
141123
124+ </hfoption >
125+ </hfoptions >
126+
142127Use the [ AttentionMaskVisualizer] ( https://github.com/huggingface/transformers/blob/beb9b5b02246b9b7ee81ddf938f93f44cfeaad19/src/transformers/utils/attention_visualizer.py#L139 ) to better understand what tokens the model can and cannot attend to.
143128
144129``` py
145- from transformers.utils.attention_visualizer import AttentionMaskVisualizer
130+ >> > from transformers.utils.attention_visualizer import AttentionMaskVisualizer
146131
147- visualizer = AttentionMaskVisualizer(" mistralai/Mistral-7B-Instruct-v0.3" )
148- visualizer(" Do you have mayonnaise recipes?" )
132+ >> > visualizer = AttentionMaskVisualizer(" mistralai/Mistral-7B-Instruct-v0.3" )
133+ >> > visualizer(" Do you have mayonnaise recipes?" )
149134```
150135
151136<div class =" flex justify-center " >
0 commit comments