Commit f4426c2

Add for readme interleave multiple host with ray (#114)
* add interleave multiple host with ray readme
1 parent fe328bb commit f4426c2

1 file changed: +35 -0 lines changed

README.md

Lines changed: 35 additions & 0 deletions
@@ -122,6 +122,41 @@ Optional flags:
* `--sharding_config=<path>` This makes use of alternative sharding config instead of
the ones in default_shardings directory.

# Run the server with ray
Follow these steps to run the server with Ray:
1. SSH into a Cloud multi-host TPU VM (v5e-16 TPU VM)
2. Complete steps 2 through 5 in the Outline
3. Set up the Ray cluster
4. Run the server with Ray

## Setup Ray Cluster
Log in to the host 0 VM and start the Ray head node with the following command:

```bash
ray start --head
```

Log in to each of the other host VMs and join it to the cluster as a worker node with the following command:

```bash
ray start --address="$ip:$port"
```

Note: `$ip` and `$port` are the IP address and port of the Ray head node, which `ray start --head` prints on host 0.
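
The worker address is the head node's IP and port joined by a colon. A minimal sketch of composing it, assuming hypothetical variables `HEAD_IP` and `HEAD_PORT` with placeholder values (neither name comes from this README; 6379 is the port Ray uses for the head node when `--port` is not passed):

```bash
# Hypothetical placeholder values: replace them with the address that
# `ray start --head` printed on host 0.
HEAD_IP="10.0.0.2"   # assumption: internal IP of host 0
HEAD_PORT="6379"     # Ray's default head port unless --port was given

RAY_ADDRESS="${HEAD_IP}:${HEAD_PORT}"

# Double quotes let the shell expand the variables; single quotes would
# pass the literal text to ray instead of the address.
# `echo` makes this a dry run: remove it to actually join the cluster.
echo ray start --address="${RAY_ADDRESS}"
```

After every worker has joined, running `ray status` on host 0 is one way to confirm that all hosts are part of the cluster.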

## Run server with ray

Here is an example of running the server with Ray for the llama2 7B model:

```bash
python run_server_with_ray.py --tpu_chips=16 --model_name=$model_name --size=7b \
  --batch_size=96 --max_cache_length=2048 \
  --quantize_weights=$quantize --quantize_type=$quantize_type --quantize_kv_cache=$quantize \
  --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path \
  --sharding_config="default_shardings/llama.yaml"
```
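
The example command reads several shell variables (`$model_name`, `$quantize`, `$quantize_type`, `$output_ckpt_dir`, `$tokenizer_path`). A small pre-flight check can catch an unset variable before the server starts. This is a sketch only: the helper `check_server_env` is hypothetical and not part of the repository.

```bash
# Hypothetical helper (not part of the repository): reports which of the
# variables used by the example command are unset or empty.
check_server_env() {
  missing=""
  for var in model_name quantize quantize_type output_ckpt_dir tokenizer_path; do
    # Look up the variable whose name is stored in $var (POSIX-compatible).
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing variables: $missing" >&2
    return 1
  fi
}

# Possible usage: launch only when every variable is set, e.g.
#   check_server_env && python run_server_with_ray.py ...
```

The function returns a non-zero status and names the missing variables on stderr, so it can guard the launch command with `&&`.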

# Run benchmark
Start the server and then go to the deps/JetStream folder (downloaded during `install_everything.sh`)
