Jetstream + RayServe deployment for interleave mode #146
Conversation
LGTM, thanks for adding this feature!
* kuberay manifests and dockerfile
* sample ray_serve
* Single host interleave
* update image
* Gcsfuse and jax platform fix
* multihost
* Cleanup
* Cleanup
* Parameterize tpu head type
* Format
* revert
* revert
* update readme
* fix format
* lint
```yaml
- mountPath: /tmp/ray
  name: ray-logs
name: ray-head
image: gcr.io/tpu-vm-gke-testing/ricliu-jetstream:20240709
```
This image is not publicly available and references an internal project. Can we host it on a public registry, or provide the Dockerfile so users can build it themselves?
cc @ryanaoleary
Ah, I see the Dockerfile now. I'd still suggest not referencing private images, though, because users will just apply the YAML without updating the image per the instructions.
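One way to address this, assuming the Dockerfile in this PR builds cleanly: build and push the image to the user's own registry, then point the manifest at it. The project ID, region, and repository name below are placeholders, not values from this PR.

```shell
# Placeholder values -- substitute your own project, region, and repo.
export PROJECT_ID=your-project-id
export REGION=us-central1
export IMAGE=${REGION}-docker.pkg.dev/${PROJECT_ID}/jetstream/jetstream-ray:latest

# Build from the Dockerfile in this PR and push to your own Artifact
# Registry repository, then set this image in the KubeRay manifest.
docker build -t ${IMAGE} .
docker push ${IMAGE}
```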
```yaml
driver: gcsfuse.csi.storage.gke.io
readOnly: true
volumeAttributes:
  bucketName: ricliu-llama2-70b-chat
```
This bucket is not publicly available. Can we host the weights in a public bucket, or provide instructions for pushing the model weights to a bucket?
cc @ryanaoleary
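A minimal sketch of the second option, staging the checkpoint in a user-owned bucket: the bucket name and local checkpoint path below are hypothetical, and `bucketName` in the gcsfuse `volumeAttributes` would then be updated to match.

```shell
# Placeholder bucket name and checkpoint path -- substitute your own.
export BUCKET=gs://your-model-bucket

# Create the bucket and copy the converted checkpoint and tokenizer
# into it; the gcsfuse CSI volume then mounts it into the pod.
gcloud storage buckets create ${BUCKET} --location=us-central1
gcloud storage cp -r /path/to/llama-2-checkpoint ${BUCKET}/
```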
```shell
kubectl port-forward svc/example-cluster-kuberay-head-svc 8265:8265 &

ray job submit --runtime-env-json='{"working_dir": "."}' -- python run_ray_serve_interleave.py --tpu_chips=4 --num_hosts=1 --size=7b --model_name=llama-2 --batch_size=32 --max_cache_length=2048 --tokenizer_path=/llama/tokenizer.model --checkpoint_path=/llama/ckpt --quantize_weights=True --quantize_type="int8_per_channel" --quantize_kv_cache=True --sharding_config="default_shardings/llama.yaml"
```
I don't think `ray job submit` was intended to be used with Ray Serve in this way. If you run it like this, the Ray Serve application will be treated as a Ray job and will not survive a restart. Could we provide an example that uses the `serve run` CLI or the KubeRay RayService instead? cc @ryanaoleary
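A sketch of what the `serve run` alternative could look like. This assumes `run_ray_serve_interleave.py` is refactored to expose a Serve application object; the application name `interleave_app` is hypothetical, and the CLI flags used with `ray job submit` above would need to move into the application builder or a Serve config file.

```shell
# Sketch only: `serve run` takes an import path to a Serve application,
# not a script with argparse flags, so the module must expose one.
# `interleave_app` is a hypothetical name for that application object.
serve run run_ray_serve_interleave:interleave_app
```

With a RayService, KubeRay manages the Serve application declaratively and restores it after restarts, which addresses the durability concern directly.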