# Summary
This example demonstrates how to run the [Deepseek R1 Distill Llama 8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) model (8B parameters) via ExecuTorch.

# Instructions
## Step 1: Setup
1. Follow the [tutorial](https://pytorch.org/executorch/main/getting-started-setup) to set up ExecuTorch. For installation, run `./install_executorch.sh --pybind xnnpack`.

2. Run the installation step for the Llama-specific requirements:
```
./examples/models/llama/install_requirements.sh
```

## Step 2: Prepare and run the model
1. Download the model:
```
pip install -U "huggingface_hub[cli]"
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Llama-8B --local-dir /target_dir/DeepSeek-R1-Distill-Llama-8B --local-dir-use-symlinks False
```

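The `checkpoint_files` names used in the conversion script below must match what was downloaded, so it is worth listing the directory first (a quick check, assuming the target directory above):
```
ls /target_dir/DeepSeek-R1-Distill-Llama-8B/*.safetensors
```
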
2. Download the [tokenizer.model](https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/original/tokenizer.model) from the Llama 3.1 repo; it will be needed later when running the model with the runtime.

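For example, a minimal sketch using the Hugging Face CLI (this assumes you have accepted the Llama 3.1 license on Hugging Face and are logged in via `huggingface-cli login`; the download location is illustrative):
```
huggingface-cli download meta-llama/Llama-3.1-8B original/tokenizer.model --local-dir /tmp/llama3.1
```
This places the file at `/tmp/llama3.1/original/tokenizer.model`.
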
3. Convert the model to a .pth file. First, install torchtune:
```
pip install torchtune
```

Then run this Python code:
```
from torchtune.models import convert_weights
from torchtune.training import FullModelHFCheckpointer
import torch

# Convert from safetensors to the TorchTune format. This assumes the model
# has been downloaded from Hugging Face into /target_dir (Step 2.1).
checkpointer = FullModelHFCheckpointer(
    checkpoint_dir='/target_dir/DeepSeek-R1-Distill-Llama-8B',
    checkpoint_files=['model-00001-of-000002.safetensors', 'model-00002-of-000002.safetensors'],
    output_dir='/tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/',
    model_type='LLAMA3'  # or other model types that TorchTune supports
)

print("loading checkpoint")
sd = checkpointer.load_checkpoint()

# Convert from TorchTune to Meta (PyTorch native) format
sd = convert_weights.tune_to_meta(sd['model'])

print("saving checkpoint")
torch.save(sd, "/tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/checkpoint.pth")
```
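
Optionally, sanity-check that the converted checkpoint loads (a quick sketch; the path matches `output_dir` above):
```
python -c "import torch; sd = torch.load('/tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/checkpoint.pth', map_location='cpu'); print(len(sd), 'tensors')"
```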

4. Download and save the params.json file. Note the `/resolve/` URL (wget on a `/blob/` URL saves an HTML page, not the raw file) and the capital `-O`, which sets the output path (lowercase `-o` only redirects wget's log):
```
wget https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/resolve/main/original/params.json -O /tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/params.json
```
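To verify the download is valid JSON rather than an error page, you can pretty-print it:
```
python -m json.tool /tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/params.json
```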

5. Generate a PTE file for use with the Llama runner:
```
python -m examples.models.llama.export_llama \
  --checkpoint /tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/checkpoint.pth \
  -p /tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/params.json \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  -qmode 8da4w \
  --group_size 128 \
  -d fp16 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --embedding-quantize 4,32 \
  --output_name="DeepSeek-R1-Distill-Llama-8B.pte"
```

6. Run the model on your desktop for validation, or integrate it with iOS/Android apps. Instructions for both are available in the Llama [README](../llama/README.md), starting at Step 3.
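
For a quick desktop smoke test, a sketch along these lines (it assumes you have already built the `llama_main` runner binary as described in that README; the prompt is illustrative):
```
cmake-out/examples/models/llama/llama_main \
  --model_path=DeepSeek-R1-Distill-Llama-8B.pte \
  --tokenizer_path=tokenizer.model \
  --prompt="What is the capital of France?"
```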