# Using model weights in Tigris anywhere with Beam

The most common way to deploy AI models in production is by using “serverless”
inference. This means that every time you get a request, you don’t know what
state the underlying hardware is in. You don’t know if you have your models
cached, and in the worst case you need to do a cold start and download your
model weights from scratch.
A couple of fixable problems arise when running your models on serverless or any
frequently changing infrastructure:

- Model distribution that's not optimized for latency causes needless GPU idle
  time as the model weights are downloaded to the machine on cold start. Tigris
  behaves like a content delivery network by default and is designed for low
  latency, saving idle time on cold start.
- Compliance restrictions like data sovereignty and GDPR increase complexity
  quickly. Tigris makes regional restrictions a one-line configuration; see the
  [object regions guide](https://www.tigrisdata.com/docs/objects/object_regions/).
- Reliance on third-party caches for distributing models creates an upstream
  dependency and leaves your system vulnerable to downtime. Tigris guarantees
  99.99% availability, with
  [public availability data](https://www.tigrisdata.com/blog/availability-metrics-public/).

## Beam

Defining HTTP endpoints for AI workloads is annoyingly complicated. There are a
lot of opinionated frameworks and layers that get in the way of just running the
bit of code you need to get your app working. [Beam](https://www.beam.cloud) is
all about simplifying the experience so that all you need to do to get an
endpoint working is define a single function:

```python
from beam import endpoint, Image


@endpoint(
    name="quickstart",
    cpu=1,
    memory="1Gi",
    image=Image().add_python_packages(["numpy"]),
)
def predict(**inputs):
    x = inputs.get("x", 256)
    return {"result": x**2}
```

This lets you unify your code and configuration into the same file, so you can
glance at a file and instantly understand what the endpoints are and what they
do. Beam also [supports GPU compute](https://docs.beam.cloud/v2/environment/gpu),
letting you do scale-to-zero inference seamlessly.
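
For GPU workloads the shape stays the same: per Beam's GPU docs, you add a `gpu`
parameter to the decorator. Here's a minimal sketch; the `gpu="T4"` value is an
assumption, so check Beam's docs for the GPU types available on your plan:

```python
from beam import endpoint, Image


@endpoint(
    name="quickstart-gpu",
    cpu=1,
    memory="2Gi",
    gpu="T4",  # assumption: pick a GPU type from Beam's GPU docs
    image=Image().add_python_packages(["numpy"]),
)
def predict(**inputs):
    # Same toy function as above, now scheduled on a GPU-backed worker.
    x = inputs.get("x", 256)
    return {"result": x**2}
```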

## Use case

You can put AI model weights into Tigris so that they are cached and fast no
matter where you’re running inference from. This makes cold starts faster and
lets you take advantage of Tigris'
[globally distributed architecture](/docs/overview/), enabling your workloads to
start quickly no matter where they are in the world.

For this example, we’ll set up
[SDXL Lightning](https://huggingface.co/ByteDance/SDXL-Lightning) by ByteDance
for inference with the weights stored in Tigris.

## Getting Started

Download the `sdxl-in-tigris` template from GitHub:

```text
git clone https://github.com/tigrisdata-community/sdxl-in-tigris
```

<details>
<summary>Prerequisite tools</summary>

In order to run this example locally, you need these tools installed:

- Python 3.11
- pipenv
- The AWS CLI

Also be sure to configure the AWS CLI for use with Tigris:
[Configuring the AWS CLI](/docs/sdks/s3/aws-cli/).

To build a custom variant of the image, you need these tools installed:

- Mac/Windows:
  [Docker Desktop app](https://www.docker.com/products/docker-desktop/);
  alternatives such as Podman Desktop will not work.
- Linux: the Docker daemon; alternatives such as Podman will not work.
- [Replicate's cog tool](https://github.com/replicate/cog)
- [jq](https://jqlang.github.io/jq/)

To install all of the tool dependencies at once, clone the template repo and run
`brew bundle`.

</details>
| 97 | + |
| 98 | +Create a new bucket for generated images, it’ll be called `generated-images` in |
| 99 | +this article. |
| 100 | + |
| 101 | +```text |
| 102 | +aws s3 create-bucket --acl private generated-images |
| 103 | +``` |
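
If you prefer to script this step, the same bucket can be created with boto3.
This is a sketch assuming your Tigris credentials are already configured in the
environment:

```python
import boto3

# Point the S3 client at Tigris instead of AWS.
s3 = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")
s3.create_bucket(Bucket="generated-images", ACL="private")
```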

<details>
<summary>Optional: upload your own model</summary>

If you want to upload your own models, create a bucket for this. It'll be called
`model-storage` in this tutorial.

Both of these buckets should be private.

Then activate the virtual environment with `pipenv shell` and install the
dependencies for uploading a model:

```text
pipenv shell --python 3.11
pip install -r requirements.txt
```

Run the `prepare_model` script to massage and upload a Stable Diffusion XL model
or finetune to Tigris (a sketch of the general pattern appears at the end of
this section):

```text
python scripts/prepare_model.py ByteDance/SDXL-Lightning model-storage
```

:::info

Want differently styled images? Try finetunes like
[Kohaku XL](https://huggingface.co/KBlueLeaf/Kohaku-XL-Zeta)! Pass the Hugging
Face repo name to the `prepare_model` script like this:

```text
python scripts/prepare_model.py KBlueLeaf/Kohaku-XL-Zeta model-storage
```

:::
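
For a rough idea of what the script does, the general pattern is to download the
weights from Hugging Face and upload them to your bucket. This is a simplified
sketch assuming `huggingface_hub` and `boto3`; the real script also massages the
weights, so treat it as illustrative:

```python
from pathlib import Path

import boto3
from huggingface_hub import snapshot_download

repo = "ByteDance/SDXL-Lightning"
bucket = "model-storage"

# Download the model files from Hugging Face to a local cache directory.
local_dir = snapshot_download(repo)

# Upload every file to Tigris, keyed under the repo name.
s3 = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")
for path in Path(local_dir).rglob("*"):
    if path.is_file():
        key = f"{repo}/{path.relative_to(local_dir)}"
        s3.upload_file(str(path), bucket, key)
```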

</details>

## Access keys

Create a new access key in the [Tigris Dashboard](https://console.tigris.dev).
Don't assign any permissions to it.

Copy the access key ID and secret access key into your notes or a password
manager; you will not be able to see them again. These credentials will be used
later to deploy your app in the cloud. This keypair will be referred to as the
`workload-keypair` in this tutorial.

[Limit the scope of this access key](/docs/blueprints/limited-access-key) to
only the `model-storage-demo` (or a custom bucket if you're uploading your own
models) and `generated-images` buckets.

## Deploying it to Beam

Install the Beam SDK and CLI into your Python environment
[according to their directions](https://docs.beam.cloud/v2/getting-started/installation).
Be sure to run `beam config create` to authenticate with an API key.

This example is configured with environment variables. Set the following
secrets in your deployment:

|             Envvar name | Value                                                               |
| ----------------------: | :------------------------------------------------------------------ |
|     `AWS_ACCESS_KEY_ID` | The access key ID from the workload keypair                         |
| `AWS_SECRET_ACCESS_KEY` | The secret access key from the workload keypair                     |
|   `AWS_ENDPOINT_URL_S3` | `https://fly.storage.tigris.dev`                                    |
|            `AWS_REGION` | `auto`                                                              |
|            `MODEL_PATH` | `ByteDance/SDXL-Lightning`                                          |
|     `MODEL_BUCKET_NAME` | `model-storage-demo` (Optional: replace with your own bucket name)  |
|    `PUBLIC_BUCKET_NAME` | `generated-images` (replace with your own bucket name)              |

You will need to run the `beam secret create` command for each of these:

```text
beam secret create AWS_ENDPOINT_URL_S3 https://fly.storage.tigris.dev
```
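
Inside the deployment, these variables are everything the S3 client needs. As a
rough sketch of the app side (assuming `boto3`, which in recent versions reads
`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, and
`AWS_ENDPOINT_URL_S3` from the environment):

```python
import os

import boto3

# Credentials, region, and the Tigris endpoint all come from the secrets above.
s3 = boto3.client("s3")

model_bucket = os.environ["MODEL_BUCKET_NAME"]
model_path = os.environ["MODEL_PATH"]

# List the model weights the endpoint will pull on cold start.
resp = s3.list_objects_v2(Bucket=model_bucket, Prefix=model_path)
for obj in resp.get("Contents", []):
    print(obj["Key"])
```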

Then deploy it with `beam deploy`:

```text
beam deploy beamcloud.py:generate
```
| 186 | + |
| 187 | +You'll get a URL back that you can use to generate images. Do a test generation |
| 188 | +with this curl command: |
| 189 | + |
| 190 | +```text |
| 191 | +curl "https://url-you-were-given-v1.app.beam-cloud" \ |
| 192 | + -X PUT \ |
| 193 | + -H "Content-Type: application/json" \ |
| 194 | + -H 'Authorization: Bearer put-your-beam-auth-token-here' \ |
| 195 | + --data-binary '{ |
| 196 | + "prompt": "The space needle in Seattle, best quality, masterpiece", |
| 197 | + "aspect_ratio": "1:1", |
| 198 | + "guidance_scale": 3.5, |
| 199 | + "num_inference_steps": 4, |
| 200 | + "max_sequence_length": 512, |
| 201 | + "output_format": "png", |
| 202 | + "num_outputs": 1 |
| 203 | +}' |
| 204 | +``` |
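
If you'd rather script the test, here's the same request in Python (assuming the
`requests` package; substitute your real endpoint URL and Beam auth token):

```python
import requests

resp = requests.put(
    "https://url-you-were-given-v1.app.beam-cloud",
    headers={"Authorization": "Bearer put-your-beam-auth-token-here"},
    json={
        "prompt": "The space needle in Seattle, best quality, masterpiece",
        "aspect_ratio": "1:1",
        "guidance_scale": 3.5,
        "num_inference_steps": 4,
        "max_sequence_length": 512,
        "output_format": "png",
        "num_outputs": 1,
    },
)
resp.raise_for_status()
print(resp.json())
```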

If all goes well, you should get an image like this:



Beam will automatically scale the deployment down when it's not in use. You can
fully destroy your deployment with `beam deployment delete`:

```text
beam deployment list # to find the UUID of the deployment
beam deployment delete uuid-of-deployment
```