@@ -78,18 +78,18 @@ mistralai/Mixtral-8x7B-Instruct-v0.1
To run the jetstream-pytorch server with one model:
```
- jpt serve --model_id --model_id meta-llama/Meta-Llama-3-8B-Instruct
+ jpt serve --model_id meta-llama/Meta-Llama-3-8B-Instruct
```
- If it the first time you run this model, it will download weights from
+ If it's the first time you run this model, it will download weights from
HuggingFace.
HuggingFace's Llama3 weights are gated, so you need to either run
` huggingface-cli login ` to set your token, or pass your hf_token explicitly.
- To pass hf token, add ` --hf_token ` flag
+ To pass your HF token explicitly, add the ` --hf_token ` flag:
```
- jpt serve --model_id --model_id meta-llama/Meta-Llama-3-8B-Instruct --hf_token=...
+ jpt serve --model_id meta-llama/Meta-Llama-3-8B-Instruct --hf_token=...
```
To log in using the HuggingFace hub, run:
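Presumably the same ` huggingface-cli login ` command noted above:

```
huggingface-cli login
```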
@@ -109,6 +109,13 @@ Quantization will be done on the flight as the weight loads.
Weights downloaded from HuggingFace will be stored by default in the ` checkpoints ` folder
in the directory where ` jpt ` is executed.
+ You can change where the weights are stored with the ` --working_dir ` flag.
+
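+ For instance, a minimal sketch of pointing the weights at a different location (the directory path here is hypothetical):
+
+ ```
+ jpt serve --model_id meta-llama/Meta-Llama-3-8B-Instruct --working_dir /data/jpt-checkpoints
+ ```
+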
+ If you wish to use your own checkpoints, place them inside
+ the ` checkpoints/<org>/<model>/hf_original ` dir (or the corresponding subdir under ` --working_dir ` ). For example,
+ Llama2 checkpoints will be at ` checkpoints/meta-llama/Llama-2-7b-hf/hf_original/*.safetensors ` . You can replace these files with modified
+ weights in HuggingFace format.
+
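+ For instance, a sketch of dropping in your own fine-tuned weights (both paths here are hypothetical):
+
+ ```
+ cp /path/to/my-finetuned-llama3/*.safetensors checkpoints/meta-llama/Meta-Llama-3-8B-Instruct/hf_original/
+ ```
+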
# Run the server with Ray
Below are the steps to run the server with Ray: