Commit 973c5a1

Adding instructions on how to run the Deepseek R1 Distill Llama 8B model
1 parent 88886b8 commit 973c5a1

File tree

1 file changed: +72 −0

  • examples/models/deepseek-r1-distill-llama-8B

# Summary
This example demonstrates how to run the [DeepSeek R1 Distill Llama 8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) model via ExecuTorch.

# Instructions

## Step 1: Setup
1. Follow the [tutorial](https://pytorch.org/executorch/main/getting-started-setup) to set up ExecuTorch. To install it, run `./install_executorch.sh --pybind xnnpack`.

2. Run the installation step for the Llama-specific requirements:
```
./examples/models/llama/install_requirements.sh
```
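
Before moving on, you can optionally confirm that the Python bindings built by `--pybind xnnpack` are importable. This is only a sanity-check sketch, assuming the pybind build above succeeded:
```
# Optional sanity check: confirm the ExecuTorch Python bindings from Step 1
# can be imported (assumes `./install_executorch.sh --pybind xnnpack` succeeded).
from executorch.extension.pybindings import portable_lib

print("ExecuTorch Python bindings available")
```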

## Step 2: Prepare and run the model
1. Download the model:
```
pip install -U "huggingface_hub[cli]"
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Llama-8B --local-dir /target_dir/DeepSeek-R1-Distill-Llama-8B --local-dir-use-symlinks False
```
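
If you prefer to script the download, an equivalent sketch using the `huggingface_hub` Python API (same repo and target directory as the CLI command above) is:
```
from huggingface_hub import snapshot_download

# Download all files of the model repo into a local directory; this mirrors
# the huggingface-cli invocation above.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    local_dir="/target_dir/DeepSeek-R1-Distill-Llama-8B",
)
```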

2. Download the [tokenizer.model](https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/original/tokenizer.model) from the Llama 3.1 repo; it will be needed later when running the model with the runtime. A scripted alternative is sketched below.
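
Note that the `meta-llama` repositories are gated, so the sketch below assumes you have accepted the license on Hugging Face and are authenticated (e.g. via `huggingface-cli login`); the target directory is illustrative:
```
from huggingface_hub import hf_hub_download

# Fetch only the tokenizer file from the gated Llama 3.1 repo.
path = hf_hub_download(
    repo_id="meta-llama/Llama-3.1-8B",
    filename="original/tokenizer.model",
    local_dir="/target_dir",
)
print("tokenizer.model saved to", path)
```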

3. Convert the model weights to a PyTorch `.pth` file. First install torchtune:
```
pip install torchtune
```

Then run this Python code:
```
import os

import torch
from torchtune.models import convert_weights
from torchtune.training import FullModelHFCheckpointer

# Convert from safetensors to torchtune format, assuming the model has been
# downloaded from Hugging Face into /target_dir.
checkpointer = FullModelHFCheckpointer(
    checkpoint_dir='/target_dir/DeepSeek-R1-Distill-Llama-8B',
    checkpoint_files=['model-00001-of-000002.safetensors', 'model-00002-of-000002.safetensors'],
    output_dir='/tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/',
    model_type='LLAMA3'  # or other types that torchtune supports
)

print("loading checkpoint")
sd = checkpointer.load_checkpoint()

# Convert from torchtune to Meta (PyTorch-native) format
sd = convert_weights.tune_to_meta(sd['model'])

# Make sure the output directory exists before saving
os.makedirs("/tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B", exist_ok=True)

print("saving checkpoint")
torch.save(sd, "/tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/checkpoint.pth")
```
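
As a quick, optional check that the conversion produced a Meta-format state dict, you can reload the file and list a few entries (the key names in the comment are what Meta-format Llama checkpoints typically use):
```
import torch

# Optional: the converted checkpoint should be a flat dict of tensors with
# Meta-style keys such as 'tok_embeddings.weight' and 'layers.0.attention.wq.weight'.
sd = torch.load("/tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/checkpoint.pth", map_location="cpu")
for name in list(sd)[:5]:
    print(name, tuple(sd[name].shape))
```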

4. Download and save the params.json file:
```
wget "https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/resolve/main/original/params.json" -O /tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/params.json
```
Note that access to the `meta-llama` repositories is gated, so the download may require an authenticated Hugging Face session or token.
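
To make sure you fetched the raw JSON (and not an HTML error page), a quick check is to parse and print it; for a Llama 3.1 8B config you would expect fields like `dim`, `n_layers`, and `rope_theta`:
```
import json

# Optional: verify the file parses as JSON and eyeball the architecture fields.
with open("/tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/params.json") as f:
    print(json.load(f))
```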

5. Generate a PTE file for use with the Llama runner:
```
python -m examples.models.llama.export_llama \
    --checkpoint /tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/checkpoint.pth \
    -p /tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/params.json \
    -kv \
    --use_sdpa_with_kv_cache \
    -X \
    -qmode 8da4w \
    --group_size 128 \
    -d fp16 \
    --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
    --embedding-quantize 4,32 \
    --output_name="DeepSeek-R1-Distill-Llama-8B.pte"
```
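
Before moving to a device, you can check that the exported program loads with the ExecuTorch pybindings from Step 1. This is only a load test, not the full text-generation flow (that lives in the Llama runner referenced in the next step), and it assumes the `.pte` file is in the current directory:
```
# Minimal load test for the exported program, assuming the pybind build
# from Step 1.
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("DeepSeek-R1-Distill-Llama-8B.pte")
print("PTE loaded")
```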

6. Run the model on your desktop for validation, or integrate it with an iOS/Android app. Instructions for both are in the Llama [README](../llama/README.md), starting at Step 3.
