1 file changed, +35 -0 lines changed
* `--sharding_config=<path>` This uses an alternative sharding config instead of the ones in the `default_shardings` directory.

# Run the server with Ray

To run the server with Ray:

1. SSH into a multi-host Cloud TPU VM (a v5e-16 TPU VM in this example).
2. Complete steps 2 to 5 in the Outline section.
3. Set up the Ray cluster.
4. Run the server with Ray.

## Set up the Ray cluster

Log in to the host 0 VM and start the Ray head node:

```bash
ray start --head
```

Log in to each of the other host VMs and start a Ray worker node that joins the head:

```bash
ray start --address="$ip:$port"
```

Note: `$ip` and `$port` are the IP address and port of the Ray head node. `ray start --head` prints this address when the head starts; the port defaults to 6379.
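
If the `gcloud` CLI is available on a workstation, the head and worker steps above can also be driven from one shell instead of logging in to each host. The sketch below is hypothetical and not part of this repository: `tpu_ssh` and `start_ray_cluster` are illustrative names, and `TPU_NAME`, `ZONE` and `NUM_HOSTS` are assumed variables you set for your own slice. It also assumes the first address printed by `hostname -I` on host 0 is the internal IP the other hosts can reach.

```bash
# Hypothetical helpers for a multi-host TPU VM. TPU_NAME, ZONE and NUM_HOSTS
# are assumptions: set them to your slice's name, zone and number of hosts.
RAY_PORT="${RAY_PORT:-6379}"  # Ray's default head port

# Run a shell command on one TPU VM host (0-based worker index) over SSH.
tpu_ssh() {
  local worker="$1"
  shift
  gcloud compute tpus tpu-vm ssh "$TPU_NAME" --zone="$ZONE" \
    --worker="$worker" --command="$*"
}

# Start the Ray head on host 0, then join hosts 1..NUM_HOSTS-1 to it.
start_ray_cluster() {
  local head_ip worker
  # First address reported by host 0; assumed to be its internal IP.
  head_ip="$(tpu_ssh 0 "hostname -I | awk '{print \$1}'")"
  tpu_ssh 0 "ray start --head --port=${RAY_PORT}"
  for ((worker = 1; worker < NUM_HOSTS; worker++)); do
    tpu_ssh "$worker" "ray start --address=${head_ip}:${RAY_PORT}"
  done
}
```

Usage: set `TPU_NAME`, `ZONE` and `NUM_HOSTS`, `source` the file, then call `start_ray_cluster`. Each `gcloud compute tpus tpu-vm ssh --worker=N` call runs one of the manual commands on host `N`, in the same order as the manual procedure (head first, then workers).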

## Run the server with Ray
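
Before launching, it can help to confirm that every host has joined the Ray cluster. The following is a minimal sketch, not part of this repository: `ray_alive_nodes` and `wait_for_ray_nodes` are hypothetical helper names, and it assumes the `ray` Python package is importable on the host where you run it (for example, host 0). It counts live nodes with `ray.nodes()` and polls until the expected number is reached.

```bash
# Hypothetical helpers: assume the `ray` Python package is installed on this host.

# Print the number of nodes the running Ray cluster currently reports as alive.
ray_alive_nodes() {
  python -c 'import ray; ray.init(address="auto"); print(sum(1 for n in ray.nodes() if n["Alive"]))' \
    2>/dev/null | tail -n 1
}

# Poll until at least $1 nodes are alive, checking up to $2 times (default 30),
# two seconds apart. Returns 0 on success and 1 on timeout.
wait_for_ray_nodes() {
  local expected="$1" tries="${2:-30}" i n=0
  for ((i = 0; i < tries; i++)); do
    n="$(ray_alive_nodes)"
    # Treat empty or non-numeric output (e.g. no cluster yet) as zero nodes.
    case "$n" in
      '' | *[!0-9]*) n=0 ;;
    esac
    if [ "$n" -ge "$expected" ]; then
      return 0
    fi
    sleep 2
  done
  echo "only ${n} of ${expected} Ray nodes are alive" >&2
  return 1
}
```

For example, `wait_for_ray_nodes 4` waits for four live nodes; replace `4` with the number of hosts in your slice before starting the server.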

Here is an example that runs the server with Ray for the Llama 2 7B model:

```bash
python run_server_with_ray.py --tpu_chips=16 --model_name=$model_name --size=7b --batch_size=96 --max_cache_length=2048 --quantize_weights=$quantize --quantize_type=$quantize_type --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path --sharding_config="default_shardings/llama.yaml"
```

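The example command reads shell variables that this section does not define (`$model_name`, `$quantize`, `$quantize_type`, `$output_ckpt_dir` and `$tokenizer_path`); they are assumed to have been set during the earlier Outline steps. A minimal pre-flight check, using a hypothetical function name, fails fast when any of them is unset or empty:

```bash
# Hypothetical pre-flight check for the shell variables the example command uses.
check_server_env() {
  local var
  local missing=()
  for var in model_name quantize quantize_type output_ckpt_dir tokenizer_path; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      missing+=("$var")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "error: set these variables first: ${missing[*]}" >&2
    return 1
  fi
  return 0
}
```

Run `check_server_env` first and start the server only if it succeeds, so a missing checkpoint or tokenizer path is reported by name instead of surfacing as an empty flag value.
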
# Run benchmark
Start the server and then go to the deps/JetStream folder (downloaded during `install_everything.sh`)