state the underlying hardware is in. You don’t know if you have your models
cached, and in the worst case you need to do a cold start and download your
model weights from scratch.

A couple of fixable problems arise when running your models on serverless or any
frequently changing infrastructure:

- Model distribution that's not optimized for latency causes needless GPU idle
  time while the model weights are downloaded to the machine on cold start.
  Tigris behaves like a content delivery network by default and is designed for
  low latency, saving idle time on cold start.
- Compliance restrictions like data sovereignty and GDPR increase complexity
  quickly. Tigris makes regional restrictions a one-line configuration; see the
  [object regions guide](https://www.tigrisdata.com/docs/objects/object_regions/).
- Reliance on third party caches for distributing models creates an upstream
  dependency and leaves your system vulnerable to downtime. Tigris guarantees
  99.99% availability and publishes
  [public availability data](https://www.tigrisdata.com/blog/availability-metrics-public/).

## Use case
You can put AI model weights into Tigris so that they are cached and fast no
matter where you’re running inference from. This makes cold starts faster: your
workloads can take advantage of Tigris'
[globally distributed architecture](/docs/overview/) and start quickly no matter
where they are in the world.

For this example, we’ll set up
[SDXL Lightning](https://huggingface.co/ByteDance/SDXL-Lightning) by ByteDance
for inference with the weights stored in Tigris. Here’s what you need to do:

- Prepare and upload the model to Tigris
- Create a restricted access key for model runners
- Run inference somewhere

Download the `sdxl-in-tigris` template from GitHub and enter the folder:

```text
git clone https://github.com/tigrisdata-community/sdxl-in-tigris
cd sdxl-in-tigris
```
<details>
<summary>Prerequisite tools</summary>

In order to run this example locally, you need these tools installed:

- Python 3.11
- pipenv
- The AWS CLI

Also configure the AWS CLI for use with Tigris by following
[Configuring the AWS CLI](/docs/sdks/s3/aws-cli/).

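
If you haven't set that up yet, here's a minimal sketch, assuming AWS CLI v2
(which reads the `AWS_ENDPOINT_URL_S3` environment variable) and an access key
created in the Tigris console:

```text
aws configure # paste your Tigris access key ID and secret access key
export AWS_ENDPOINT_URL_S3=https://fly.storage.tigris.dev
export AWS_REGION=auto
```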
To build a custom variant of the image, you need these tools installed:

- Mac/Windows: the
  [Docker Desktop app](https://www.docker.com/products/docker-desktop/);
  alternatives such as Podman Desktop will not work.
- Linux: the Docker daemon; alternatives such as Podman will not work.
- [Replicate's cog tool](https://github.com/replicate/cog)
- [jq](https://jqlang.github.io/jq/)

To install all of the tool dependencies at once with
[Homebrew](https://brew.sh), clone the template repo and run `brew bundle`.

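
Either way, you can sanity-check that everything is on your `PATH`:

```text
python3.11 --version
pipenv --version
aws --version
cog --version
jq --version
```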
</details>

Create two new buckets:

1. One bucket will be for generated images; it’ll be called `generated-images`
   in this article.
2. One bucket will be for storing models; it’ll be called `model-storage` in
   this article.

Both of these buckets should be private, so pass `--acl private` when creating
them:

```text
aws s3api create-bucket --bucket generated-images --acl private
aws s3api create-bucket --bucket model-storage --acl private
```

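
You can confirm that both buckets exist by listing your buckets:

```text
aws s3 ls
```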
Then activate the virtual environment with `pipenv shell` and install the
dependencies for uploading a model:

```text
pipenv shell --python 3.11
pip install -r requirements.txt
```

Then run the `prepare_model` script to massage and upload a Stable Diffusion XL
model or finetune to Tigris:

```text
python scripts/prepare_model.py ByteDance/SDXL-Lightning model-storage
```

:::info

Want differently styled images? Try finetunes like
[Kohaku XL](https://huggingface.co/KBlueLeaf/Kohaku-XL-Zeta)! Pass the Hugging
Face repo name to the `prepare_model` script like this:

```text
python scripts/prepare_model.py KBlueLeaf/Kohaku-XL-Zeta model-storage
```

:::

This will take a bit to run, depending on your internet connection speed, hard
drive speed, and the current phase of the moon. While it’s running, head to the
Tigris console and create a new access key. Don’t assign any permissions to it.

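
Once the upload finishes, you can sanity-check that the weights landed in your
bucket (the exact key layout depends on what the `prepare_model` script writes):

```text
aws s3 ls s3://model-storage --recursive --human-readable --summarize
```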
### Create a restricted access key for model runners

Copy the access key ID and secret access key into your notes or a password
manager; you will not be able to see them again. These credentials will be used
later to deploy your app in the cloud. This keypair will be referred to as the
`runner-keypair` in this tutorial.

Open `iam/model-runner.json` in your text editor. Change all references to
`model-storage` and `generated-images` to the names of the buckets you created
earlier.

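
For reference, the edited policy should express "read-only on the model bucket,
read-write on the images bucket". A rough sketch of what that can look like
(the actual document in the repo may differ; the bucket names here are the ones
from this article):

```text
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::model-storage",
        "arn:aws:s3:::model-storage/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::generated-images",
        "arn:aws:s3:::generated-images/*"
      ]
    }
  ]
}
```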
Then export this variable so that the AWS CLI sends IAM commands to Tigris:

```text
export AWS_ENDPOINT_URL_IAM=https://fly.iam.storage.tigris.dev
```

Create an IAM policy based on the document you edited:

```text
aws iam create-policy --policy-name sdxl-runner --policy-document file://./iam/model-runner.json
```

Copy down the ARN from the output; it should look something like this:

```text
arn:aws:iam::flyio_hunter2hunter2hunter2:policy/sdxl-runner
```

Attach it to the access key you just created:

```text
aws iam attach-user-policy \
  --policy-arn arn:aws:iam::flyio_hunter2hunter2hunter2:policy/sdxl-runner \
  --user-name tid_runner_keypair_access_key_id
```

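
To double-check that the policy took, list the policies attached to the key
(assuming Tigris' IAM endpoint supports this call the way AWS does):

```text
aws iam list-attached-user-policies --user-name tid_runner_keypair_access_key_id
```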
### Running inference

<details>
<summary>Optional: building your own image</summary>

In order to deploy this, you need to build the image with the cog tool. Log into
a Docker registry and run this command to build and push it:

```text
cog push your-docker-username/sdxl-tigris --use-cuda-base-image false
```

</details>

You can now use it with your GPU host of choice as long as it supports at least
CUDA 12.1 and has at least 12 GB of video memory.

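
As a sketch, running the image locally on a CUDA-capable machine could look
like the following, assuming you've put the environment variables listed below
into a file called `runner.env` (a hypothetical name). Cog images expose an
HTTP server on port 5000; the `prompt` input name is an assumption, so check
the template's `predict.py` for the real schema:

```text
docker run -d --gpus all -p 5000:5000 \
  --env-file ./runner.env \
  your-docker-username/sdxl-tigris

curl http://localhost:5000/predictions -X POST \
  -H 'Content-Type: application/json' \
  -d '{"input": {"prompt": "a photo of an astronaut riding a horse"}}'
```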
This example is configured with environment variables. Set the following
environment variables in your deployments: