Jetstream + RayServe deployment for interleave mode #146
Conversation
LGTM, thanks for adding this feature!
* kuberay manifests and dockerfile
* sample ray_serve
* Single host interleave
* update image
* Gcsfuse and jax platform fix
* multihost
* Cleanup
* Cleanup
* Parameterize tpu head type
* Format
* revert
* revert
* update readme
* fix format
* lint
```yaml
- mountPath: /tmp/ray
  name: ray-logs
name: ray-head
image: gcr.io/tpu-vm-gke-testing/ricliu-jetstream:20240709
```
This image is not publicly available and references an internal project. Can we host it on a public registry, or provide the Dockerfile so users can build it themselves?
cc @ryanaoleary
Ah, I see the Dockerfile now. I'd still suggest not referencing private images, though, because users will just apply the YAML without updating the image per the instructions.
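One way to address this, assuming the Dockerfile in this PR builds cleanly: build and push the image to the user's own registry, then point the manifest at it. The project ID, region, and repository name below are placeholders, not values from this PR.

```shell
# Placeholder values -- substitute your own project, region, and repo.
export PROJECT_ID=your-project-id
export REGION=us-central1
export IMAGE=${REGION}-docker.pkg.dev/${PROJECT_ID}/jetstream/jetstream-ray:latest

# Build from the Dockerfile in this PR and push to your own Artifact
# Registry repository, then set this image in the KubeRay manifest.
docker build -t ${IMAGE} .
docker push ${IMAGE}
```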
```yaml
driver: gcsfuse.csi.storage.gke.io
readOnly: true
volumeAttributes:
  bucketName: ricliu-llama2-70b-chat
```
This bucket is not publicly available. Can we host the weights in a public bucket, or provide instructions for pushing the model weights to a bucket?
cc @ryanaoleary
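A minimal sketch of the second option, staging the checkpoint in a user-owned bucket: the bucket name and local checkpoint path below are hypothetical, and `bucketName` in the gcsfuse `volumeAttributes` would then be updated to match.

```shell
# Placeholder bucket name and checkpoint path -- substitute your own.
export BUCKET=gs://your-model-bucket

# Create the bucket and copy the converted checkpoint and tokenizer
# into it; the gcsfuse CSI volume then mounts it into the pod.
gcloud storage buckets create ${BUCKET} --location=us-central1
gcloud storage cp -r /path/to/llama-2-checkpoint ${BUCKET}/
```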
```shell
kubectl port-forward svc/example-cluster-kuberay-head-svc 8265:8265 &

ray job submit --runtime-env-json='{"working_dir": "."}' -- python run_ray_serve_interleave.py --tpu_chips=4 --num_hosts=1 --size=7b --model_name=llama-2 --batch_size=32 --max_cache_length=2048 --tokenizer_path=/llama/tokenizer.model --checkpoint_path=/llama/ckpt --quantize_weights=True --quantize_type="int8_per_channel" --quantize_kv_cache=True --sharding_config="default_shardings/llama.yaml"
```
I don't think `ray job submit` was intended to be used with Ray Serve in this way. If you run it like this, the Ray Serve application will be treated as a Ray job and will not survive a restart. Could we provide an example that uses the `serve run` CLI or the KubeRay RayService instead? cc @ryanaoleary
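A sketch of what the `serve run` alternative could look like. This assumes `run_ray_serve_interleave.py` is refactored to expose a Serve application object; the application name `interleave_app` is hypothetical, and the CLI flags used with `ray job submit` above would need to move into the application builder or a Serve config file.

```shell
# Sketch only: `serve run` takes an import path to a Serve application,
# not a script with argparse flags, so the module must expose one.
# `interleave_app` is a hypothetical name for that application object.
serve run run_ray_serve_interleave:interleave_app
```

With a RayService, KubeRay manages the Serve application declaratively and restores it after restarts, which addresses the durability concern directly.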