From 07e6bbcc16e617da74b13e96d53f6fcd109aac8a Mon Sep 17 00:00:00 2001
From: FanhaiLu1
Date: Tue, 4 Jun 2024 20:28:03 +0000
Subject: [PATCH 1/2] add interleave multiple host with ray readme

---
 README.md | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/README.md b/README.md
index b28e5438..a7208668 100644
--- a/README.md
+++ b/README.md
@@ -122,6 +122,41 @@ Optional flags:
 * `--sharding_config=` This makes use of alternative sharding config instead of the ones in default_shardings directory.
 
+
+# Run the server with Ray
+Below are the steps to run the server with Ray:
+1: SSH into the Cloud multi-host TPU VM (a v5e-16 TPU VM)
+2: Follow steps 2 to 5 in the Outline
+3: Set up the Ray cluster
+4: Run the server with Ray
+
+## Setup Ray Cluster
+Log in to the host 0 VM and start the Ray head with the command below
+
+```bash
+
+ray start --head
+
+```
+
+Log in to each of the other host VMs and join them to the cluster as Ray workers with the command below
+
+```bash
+
+ray start --address="$ip:$port"
+
+```
+
+Note: Get the head node's IP address and port from the output of `ray start --head`
+
+## Run the server with Ray
+
+Here is an example of running the server with Ray for the Llama 2 7B model.
+
+```bash
+python run_server_with_ray.py --tpu_chips=16 --model_name=$model_name --size=7b --batch_size=96 --max_cache_length=2048 --quantize_weights=$quantize --quantize_type=$quantize_type --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path --sharding_config="default_shardings/llama.yaml"
+```
+
 # Run benchmark
 Start the server and then go to the deps/JetStream folder (downloaded during `install_everything.sh`)

From e3814369f8b9e58a7628639c90be12733844731b Mon Sep 17 00:00:00 2001
From: FanhaiLu1
Date: Tue, 4 Jun 2024 20:31:01 +0000
Subject: [PATCH 2/2] add interleave multiple host with ray readme

---
 README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index a7208668..bb7cd63c 100644
--- a/README.md
+++ b/README.md
@@ -125,13 +125,13 @@ Optional flags:
 
 # Run the server with Ray
 Below are the steps to run the server with Ray:
-1: SSH into the Cloud multi-host TPU VM (a v5e-16 TPU VM)
-2: Follow steps 2 to 5 in the Outline
-3: Set up the Ray cluster
-4: Run the server with Ray
+1. SSH into the Cloud multi-host TPU VM (a v5e-16 TPU VM)
+2. Follow steps 2 to 5 in the Outline
+3. Set up the Ray cluster
+4. Run the server with Ray
 
 ## Setup Ray Cluster
-Log in to the host 0 VM and start the Ray head with the command below
+Log in to the host 0 VM and start the Ray head with the command below:
 
 ```bash
 
@@ -139,7 +139,7 @@ ray start --head
 
 ```
 
-Log in to each of the other host VMs and join them to the cluster as Ray workers with the command below
+Log in to each of the other host VMs and join them to the cluster as Ray workers with the command below:
 
 ```bash
 
@@ -147,11 +147,11 @@ ray start --address="$ip:$port"
 
 ```
 
-Note: Get the head node's IP address and port from the output of `ray start --head`
+Note: Get the head node's IP address and port from the output of `ray start --head`.
 
 ## Run the server with Ray
 
-Here is an example of running the server with Ray for the Llama 2 7B model.
+Here is an example of running the server with Ray for the Llama 2 7B model:
 
 ```bash
 python run_server_with_ray.py --tpu_chips=16 --model_name=$model_name --size=7b --batch_size=96 --max_cache_length=2048 --quantize_weights=$quantize --quantize_type=$quantize_type --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path --sharding_config="default_shardings/llama.yaml"
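
A quick way to check that every flag in the launch command is well-formed (each one takes exactly two leading dashes, `--model_name` included) is to assemble the invocation programmatically. The helper below is a hypothetical sketch, not part of the repository; the default paths and the `llama-2` value are illustrative placeholders:

```python
def build_launch_command(model_name, size="7b", tpu_chips=16,
                         batch_size=96, max_cache_length=2048,
                         checkpoint_path="/tmp/ckpt",
                         tokenizer_path="/tmp/tokenizer.model",
                         sharding_config="default_shardings/llama.yaml"):
    """Assemble the run_server_with_ray.py invocation as a list of argv tokens.

    Illustrative helper: the flag names mirror the README example, but the
    defaults here are placeholders, not repository values.
    """
    flags = {
        "tpu_chips": tpu_chips,
        "model_name": model_name,
        "size": size,
        "batch_size": batch_size,
        "max_cache_length": max_cache_length,
        "checkpoint_path": checkpoint_path,
        "tokenizer_path": tokenizer_path,
        "sharding_config": sharding_config,
    }
    # Every flag is rendered uniformly as --name=value, so a stray
    # single-dash flag cannot slip into the command line.
    return ["python", "run_server_with_ray.py"] + [
        f"--{name}={value}" for name, value in flags.items()
    ]

print(" ".join(build_launch_command("llama-2")))
```

Building the argv as a list also makes it easy to hand to `subprocess.run` without shell quoting pitfalls.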