Optional flags:

* `--sharding_config=<path>` Uses an alternative sharding config instead of the ones in the `default_shardings` directory.

# Run the server with Ray

Below are the steps to run the server with Ray:

1. SSH to the Cloud multi-host TPU VM (a v5e-16 TPU VM); see the example below
2. Complete steps 2 to 5 in the Outline
3. Set up the Ray cluster
4. Run the server with Ray

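For step 1, one common way to reach a multi-host TPU VM is through `gcloud` (a minimal sketch; the TPU name, zone, and project below are placeholders for your own environment):

```bash
# Sketch: SSH to worker 0 of the multi-host TPU VM.
# TPU_NAME, ZONE, and PROJECT_ID are placeholders for your own setup.
gcloud compute tpus tpu-vm ssh $TPU_NAME \
  --zone=$ZONE \
  --project=$PROJECT_ID \
  --worker=0
```
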
## Set up the Ray cluster

Log in to the host 0 VM and start the Ray head node with the following command:

```bash
ray start --head
```

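As a quick sanity check (assuming a standard Ray installation), you can verify the head node is up; the head's startup output also prints the exact `ray start --address=...` command to run on the other hosts:

```bash
# List the nodes and resources currently visible to the Ray head.
ray status
```
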
Log in to each of the other host VMs and join the cluster as a Ray worker with the following command:

```bash
ray start --address='$ip:$port'
```

Note: Get the IP address and port from the output of the Ray head node.

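If you need to look up the head's address manually, one option (assuming a standard Linux guest image and Ray's default port) is to print the head VM's internal IP:

```bash
# Run on the head VM: print its internal IP address.
# Ray's head listens on port 6379 by default, so a worker would typically use
#   ray start --address='<head-internal-ip>:6379'
hostname -I
```
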
## Run the server with Ray

Here is an example of running the server with Ray for the Llama 2 7B model:

```bash
python run_server_with_ray.py \
  --tpu_chips=16 \
  --model_name=$model_name \
  --size=7b \
  --batch_size=96 \
  --max_cache_length=2048 \
  --quantize_weights=$quantize \
  --quantize_type=$quantize_type \
  --quantize_kv_cache=$quantize \
  --checkpoint_path=$output_ckpt_dir \
  --tokenizer_path=$tokenizer_path \
  --sharding_config="default_shardings/llama.yaml"
```

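Note that the command above assumes several shell variables are already set; the values below are purely illustrative placeholders and should be replaced with the model name, quantization settings, and paths that match your converted checkpoint:

```bash
# Illustrative placeholder values only; adjust them to your own setup.
model_name=llama-2                    # model identifier (example value)
quantize=True                         # whether to quantize weights / KV cache
quantize_type="int8_per_channel"      # quantization scheme (example value)
output_ckpt_dir=/path/to/converted_checkpoint
tokenizer_path=/path/to/tokenizer.model
```
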
# Run benchmark

Start the server and then go to the deps/JetStream folder (downloaded during `install_everything.sh`)