From 07e6bbcc16e617da74b13e96d53f6fcd109aac8a Mon Sep 17 00:00:00 2001
From: FanhaiLu1
Date: Tue, 4 Jun 2024 20:28:03 +0000
Subject: [PATCH 1/2] add interleave multiple host with ray readme

---
 README.md | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/README.md b/README.md
index b28e5438..a7208668 100644
--- a/README.md
+++ b/README.md
@@ -122,6 +122,41 @@ Optional flags:
 * `--sharding_config=` This makes use of alternative sharding config instead of the ones in default_shardings directory.
 
+
+# Run the server with Ray
+Below are the steps to run the server with Ray:
+1: SSH into the Cloud multi-host TPU VM (a v5e-16 TPU VM)
+2: Follow steps 2 to 5 in the Outline
+3: Set up the Ray cluster
+4: Run the server with Ray
+
+## Setup Ray Cluster
+Log in to the host 0 VM and start the Ray head with the command below
+
+```bash
+
+ray start --head
+
+```
+
+Log in to each of the other host VMs and join them to the cluster as Ray workers with the command below
+
+```bash
+
+ray start --address="$ip:$port"
+
+```
+
+Note: Get the head node's IP address and port from the output of `ray start --head`
+
+## Run the server with Ray
+
+Here is an example of running the server with Ray for the Llama 2 7B model.
+
+```bash
+python run_server_with_ray.py --tpu_chips=16 --model_name=$model_name --size=7b --batch_size=96 --max_cache_length=2048 --quantize_weights=$quantize --quantize_type=$quantize_type --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path --sharding_config="default_shardings/llama.yaml"
+```
+
 # Run benchmark
 Start the server and then go to the deps/JetStream folder (downloaded during `install_everything.sh`)

From e3814369f8b9e58a7628639c90be12733844731b Mon Sep 17 00:00:00 2001
From: FanhaiLu1
Date: Tue, 4 Jun 2024 20:31:01 +0000
Subject: [PATCH 2/2] add interleave multiple host with ray readme

---
 README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index a7208668..bb7cd63c 100644
--- a/README.md
+++ b/README.md
@@ -125,13 +125,13 @@ Optional flags:
 
 # Run the server with Ray
 Below are the steps to run the server with Ray:
-1: SSH into the Cloud multi-host TPU VM (a v5e-16 TPU VM)
-2: Follow steps 2 to 5 in the Outline
-3: Set up the Ray cluster
-4: Run the server with Ray
+1. SSH into the Cloud multi-host TPU VM (a v5e-16 TPU VM)
+2. Follow steps 2 to 5 in the Outline
+3. Set up the Ray cluster
+4. Run the server with Ray
 
 ## Setup Ray Cluster
-Log in to the host 0 VM and start the Ray head with the command below
+Log in to the host 0 VM and start the Ray head with the command below:
 
 ```bash
 
@@ -139,7 +139,7 @@ ray start --head
 
 ```
 
-Log in to each of the other host VMs and join them to the cluster as Ray workers with the command below
+Log in to each of the other host VMs and join them to the cluster as Ray workers with the command below:
 
 ```bash
 
@@ -147,11 +147,11 @@ ray start --address="$ip:$port"
 
 ```
 
-Note: Get the head node's IP address and port from the output of `ray start --head`
+Note: Get the head node's IP address and port from the output of `ray start --head`.
 
 ## Run the server with Ray
 
-Here is an example of running the server with Ray for the Llama 2 7B model.
+Here is an example of running the server with Ray for the Llama 2 7B model:
 
 ```bash
 python run_server_with_ray.py --tpu_chips=16 --model_name=$model_name --size=7b --batch_size=96 --max_cache_length=2048 --quantize_weights=$quantize --quantize_type=$quantize_type --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path --sharding_config="default_shardings/llama.yaml"
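
A quick way to check that every flag in the launch command is well-formed (each one takes exactly two leading dashes, `--model_name` included) is to assemble the invocation programmatically. The helper below is a hypothetical sketch, not part of the repository; the default paths and the `llama-2` value are illustrative placeholders:

```python
def build_launch_command(model_name, size="7b", tpu_chips=16,
                         batch_size=96, max_cache_length=2048,
                         checkpoint_path="/tmp/ckpt",
                         tokenizer_path="/tmp/tokenizer.model",
                         sharding_config="default_shardings/llama.yaml"):
    """Assemble the run_server_with_ray.py invocation as a list of argv tokens.

    Illustrative helper: the flag names mirror the README example, but the
    defaults here are placeholders, not repository values.
    """
    flags = {
        "tpu_chips": tpu_chips,
        "model_name": model_name,
        "size": size,
        "batch_size": batch_size,
        "max_cache_length": max_cache_length,
        "checkpoint_path": checkpoint_path,
        "tokenizer_path": tokenizer_path,
        "sharding_config": sharding_config,
    }
    # Every flag is rendered uniformly as --name=value, so a stray
    # single-dash flag cannot slip into the command line.
    return ["python", "run_server_with_ray.py"] + [
        f"--{name}={value}" for name, value in flags.items()
    ]

print(" ".join(build_launch_command("llama-2")))
```

Building the argv as a list also makes it easy to hand to `subprocess.run` without shell quoting pitfalls.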