π¨ AI Studio β open-weight image + video + audio, chat-driven, on your own GPUs #460
Pinned
noonghunna
announced in
Announcements
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
(supersedes the Image Studio beta (#348))
Back in the Image Studio beta we said "coming next: video and audio." It's here β and it's all one studio now. AI Studio turns the rig into a self-hosted creative studio β image, video, and audio generation behind Open WebUI, all open-weight, no cloud APIs, no keys. It runs as a
gpu-mode ai-studioscene (also bring-able from c3 β Operate).The flow is the same for everything:
The 12 lanes (pick one in the OWUI model picker):
What changed since the beta β one scene, lanes inside it (no more flipping
image-studio/video-studiogpu-modes); + video, + audio, + more image lanes (the beta was Ideogram-only); the coexisting chat model is gone β the studio is creative-only now (LLM serving is the separate core stack), so heavy modalities time-share the cards. This is not the "chat + image concurrent on 2 cards" model from the beta.Set it up β one command (fresh clone β generating):
bash scripts/setup-ai-studio.sh # build + download (~120 GB) + bring up + install the OWUI pipeThen open Open WebUI (
:8080), sign up (first account = admin), set the pipe'sbrowser_basevalve tohttp://<your-host>:8189, and pick a lane. Requirements + per-lane deep-dives: requirements Β· image Β· video Β· audio. Try: "a red fox trotting through autumn woods, slow-mo, cinematic."What we measured (2Γ RTX 3090) β Z-Image ~25 s/1024Β² Β· Krea 2 ~40 s/1024Β² Β· Ideogram ~70 s Β· HiDream-O1 ~3β4 min @ native 2048Β² Β· Wan2.2 480p ~2.5 min Β· 720p ~9 min (DisTorch both-card split). Video uses both 24 GB cards (the 22B/14B DiTs split via DisTorch); image/audio fit one card.
π What would help most
setup-ai-studio.shβ sign up β first render. Anything trip you up?Known caveats
COMFYUI_REF=origin/masterto float).Credits β ComfyUI (Comfy-Org) Β· Open WebUI Β· LTX-2.3 (Lightricks) Β· Sulphur / 10Eros (Anbeeld) Β· Wan2.2 (Alibaba) + Rapid-AllInOne (Phr00t) + GGUF requant (befox) Β· HiDream-O1 (HiDream-ai) + the
HiDream_O1-ComfyUInode (Saganaki22) Β· Ideogram-4 Β· Chroma Β· Z-Image (Alibaba Tongyi) Β· Krea 2 (Krea) Β· ACE-Step Β· Stable Audio (Stability) Β· Step-Audio-EditX (StepFun) Β· Kokoro Β· the uncensored director (HauhauCS) Β· ComfyUI-GGUF + DisTorch multi-GPU. Each open-weight model carries its own license (Apache / MIT / community β see the docs); the studio code is Apache-2.0. πBeta Was this translation helpful? Give feedback.
All reactions