Jetstream + RayServe deployment for interleave mode #146

Merged
merged 20 commits into AI-Hypercomputer:main on Jul 11, 2024
Conversation

richardsliu (Collaborator):

No description provided.

FanhaiLu1 (Collaborator) left a comment:

LGTM, thanks for adding this feature!

@qihqi qihqi merged commit 663c102 into AI-Hypercomputer:main Jul 11, 2024
4 checks passed
wang2yn84 pushed a commit that referenced this pull request Jul 18, 2024
* kuberay manifests and dockerfile

* sample ray_serve

* Single host interleave

* update image

* Gcsfuse and jax platform fix

* multihost

* Cleanup

* Cleanup

* Parameterize tpu head type

* Format

* revert

* revert

* update readme

* fix format

* lint
- mountPath: /tmp/ray
  name: ray-logs
name: ray-head
image: gcr.io/tpu-vm-gke-testing/ricliu-jetstream:20240709


This image is not publicly available and references an internal project. Can we host it on a public registry or provide the Dockerfile so users can build it themselves?


andrewsykim commented on Jul 26, 2024:

Ah, I see the Dockerfile now. I still suggest not referencing private images, though, because users will just apply the YAML without updating the image per the instructions.
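One way to address this (a sketch only — the image name, project, and registry path below are hypothetical placeholders, not values from this PR) is to build the image from the repo's Dockerfile and push it to a public Artifact Registry repository that the manifests can reference:

```shell
# Build from the Dockerfile in this repository.
# All names below are placeholders; substitute your own project/repo.
docker build -t jetstream-rayserve:latest .

# Tag and push to a (public) Artifact Registry repository.
docker tag jetstream-rayserve:latest \
  us-docker.pkg.dev/MY_PROJECT/MY_REPO/jetstream-rayserve:latest
docker push us-docker.pkg.dev/MY_PROJECT/MY_REPO/jetstream-rayserve:latest
```

The manifest's `image:` field would then point at the public tag instead of the internal `gcr.io/tpu-vm-gke-testing` project.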

driver: gcsfuse.csi.storage.gke.io
readOnly: true
volumeAttributes:
  bucketName: ricliu-llama2-70b-chat


This bucket is not publicly available. Can we host the weights in a public bucket or provide instructions for pushing the model weights to a bucket?
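For the latter option, a minimal sketch (bucket name and local paths are placeholders; this assumes you already have the Llama 2 checkpoint and tokenizer locally) using the `gcloud storage` CLI might look like:

```shell
# Placeholders: set BUCKET and the local paths to your own values.
BUCKET=my-llama2-weights
gcloud storage buckets create gs://${BUCKET} --location=us-central1

# Upload the checkpoint directory and tokenizer referenced by the
# gcsfuse mount (/llama/ckpt and /llama/tokenizer.model in the manifest).
gcloud storage cp -r /path/to/llama/ckpt gs://${BUCKET}/ckpt
gcloud storage cp /path/to/llama/tokenizer.model gs://${BUCKET}/tokenizer.model
```

The `bucketName` in the gcsfuse `volumeAttributes` would then point at this bucket.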



kubectl port-forward svc/example-cluster-kuberay-head-svc 8265:8265 &

ray job submit --runtime-env-json='{"working_dir": "."}' -- \
  python run_ray_serve_interleave.py \
    --tpu_chips=4 --num_hosts=1 --size=7b --model_name=llama-2 \
    --batch_size=32 --max_cache_length=2048 \
    --tokenizer_path=/llama/tokenizer.model --checkpoint_path=/llama/ckpt \
    --quantize_weights=True --quantize_type="int8_per_channel" \
    --quantize_kv_cache=True --sharding_config="default_shardings/llama.yaml"


I don't think `ray job submit` was intended to be used with Ray Serve in this way. If you run it like this, the Ray Serve application is treated as a Ray job and will not survive a restart. Could we provide an example that uses the `serve run` CLI or the KubeRay RayService instead? cc @ryanaoleary
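To sketch the RayService alternative (the manifest below is illustrative only — the names, image, and `import_path` are assumptions, not taken from this PR; it presumes `run_ray_serve_interleave.py` exposes a Serve application object): with a RayService, KubeRay owns the Serve deployment and redeploys it after cluster restarts, rather than treating it as a one-shot job.

```yaml
# Hypothetical RayService sketch. Metadata, image, and import_path are
# placeholders showing the shape of the config, not this PR's actual values.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: jetstream-interleave
spec:
  serveConfigV2: |
    applications:
      - name: jetstream
        import_path: run_ray_serve_interleave:app
        runtime_env:
          working_dir: "."
  rayClusterConfig:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: MY_PUBLIC_REGISTRY/jetstream-rayserve:latest
```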

4 participants