🎨 AI Studio — open-weight image + video + audio, chat-driven, on your own GPUs #460

noonghunna · 2026-06-24T03:57:31Z

noonghunna
Jun 24, 2026
Maintainer

(supersedes the Image Studio beta (#348))

Back in the Image Studio beta we said "coming next: video and audio." It's here, and it's all one studio now. AI Studio turns the rig into a self-hosted creative studio: image, video, and audio generation behind Open WebUI, all open-weight, with no cloud APIs and no keys. It runs as a gpu-mode ai-studio scene (you can also bring it up from c3 → Operate).

📸 Sample renders coming shortly.

The flow is the same for everything:

type a casual idea → a "director" LLM crafts the real prompt → ComfyUI / a service renders → you get a gallery link → reply to refine.
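
As a rough illustration of that flow, here is a minimal Python sketch of a single chat turn. It is not the studio's actual pipe code: `run_turn`, `fake_director`, `fake_renderer`, and the gallery URL path are all hypothetical names made up for the example, and the stand-ins only return strings so the sketch runs without a GPU.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Render:
    lane: str         # the lane picked in the model picker
    prompt: str       # the expanded prompt the director produced
    gallery_url: str  # the link handed back to the chat


def run_turn(
    idea: str,
    lane: str,
    director: Callable[[str, str], str],
    renderer: Callable[[str, str], str],
) -> Render:
    """One turn: casual idea -> director prompt -> render -> gallery link."""
    prompt = director(idea, lane)         # the LLM rewrites the idea for this lane
    gallery_url = renderer(lane, prompt)  # ComfyUI or a service does the render
    return Render(lane, prompt, gallery_url)


# Hypothetical stand-ins; the real director is an LLM and the real renderer is ComfyUI.
def fake_director(idea: str, lane: str) -> str:
    return f"[{lane}] {idea}, detailed, cinematic lighting"


def fake_renderer(lane: str, prompt: str) -> str:
    # The path layout here is invented for illustration only.
    return f"http://localhost:8189/gallery/{lane}/0001"


if __name__ == "__main__":
    first = run_turn("a red fox trotting through autumn woods",
                     "z-image", fake_director, fake_renderer)
    print(first.prompt)
    print(first.gallery_url)
```

Replying to refine would amount to another `run_turn` call whose idea combines the previous prompt with the new feedback.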

The 12 lanes (pick one in the OWUI model picker):

Modality	Lanes
🎬 Video	LTX-2.3 (video + synced audio) · Sulphur / 10Eros / Wan2.2 (uncensored) — text→video, image→video, 60 s+ via auto-chaining
🖼️ Image	HiDream-O1 (top-quality / photoreal) · Ideogram-4 (design / logo / text) · Chroma / Z-Image (uncensored) · Krea 2 (aesthetic / stylized)
🎵 Audio	ACE-Step (music) · Stable Audio (SFX) · Step-Audio-EditX (premium voice clone) · Kokoro (narration ducked over a clip)

What changed since the beta

One scene, with the lanes inside it (no more flipping between the image-studio and video-studio gpu-modes).
Added video, audio, and more image lanes (the beta was Ideogram-only).
The coexisting chat model is gone. The studio is creative-only now (LLM serving is the separate core stack), so heavy modalities time-share the cards. This is not the "chat + image concurrent on 2 cards" model from the beta.

Set it up — one command (fresh clone → generating):

bash scripts/setup-ai-studio.sh        # build + download (~120 GB) + bring up + install the OWUI pipe

Then open Open WebUI (:8080), sign up (first account = admin), set the pipe's browser_base valve to http://<your-host>:8189, and pick a lane. Requirements + per-lane deep-dives: requirements · image · video · audio. Try: "a red fox trotting through autumn woods, slow-mo, cinematic."

What we measured (2× RTX 3090) — Z-Image ~25 s/1024² · Krea 2 ~40 s/1024² · Ideogram ~70 s · HiDream-O1 ~3–4 min @ native 2048² · Wan2.2 480p ~2.5 min · 720p ~9 min (DisTorch both-card split). Video uses both 24 GB cards (the 22B/14B DiTs split via DisTorch); image/audio fit one card.
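
To put those timings in perspective, the sketch below converts them into rough renders per hour. It assumes renders run back to back with no model-swap or queue overhead, which is optimistic since modalities time-share the cards. HiDream-O1 is left out because its figure is a range, and `renders_per_hour` is just a helper name for this example.

```python
# Approximate per-render times in seconds, copied from the 2x RTX 3090 figures above.
RENDER_SECONDS = {
    "Z-Image 1024²": 25,
    "Krea 2 1024²": 40,
    "Ideogram": 70,
    "Wan2.2 480p": 150,  # ~2.5 min
    "Wan2.2 720p": 540,  # ~9 min
}


def renders_per_hour(seconds_per_render: float) -> float:
    """Upper bound on hourly throughput, ignoring load and swap time."""
    return 3600 / seconds_per_render


if __name__ == "__main__":
    for lane, secs in RENDER_SECONDS.items():
        print(f"{lane}: ~{renders_per_hour(secs):.0f} renders/hour")
```

At ~25 s per image, Z-Image tops out at about 144 images per hour under these assumptions.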

🙏 What would help most

Cross-rig numbers — 4090 / 5090 / other VRAM: does it fit, how fast?
Single-GPU runs — image + audio work on one card; video wants two. How's the single-card subset UX?
Fresh-install onboarding — clone → setup-ai-studio.sh → sign up → first render. Anything trip you up?
Render quality / prompts per lane, and which model / modality you'd want next.
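
If you post cross-rig numbers, the exact cards and VRAM make them much easier to compare. One way to collect that is sketched below. It assumes `nvidia-smi` is on the PATH. `parse_gpus` and `query_gpus` are names made up for this example and are not part of the studio.

```python
import subprocess


def parse_gpus(csv_text: str) -> list[tuple[str, int]]:
    """Parse 'name, memory_mib' lines into (name, memory_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem = line.rsplit(",", 1)  # split on the last comma only
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpus(out)


if __name__ == "__main__":
    for name, mib in query_gpus():
        print(f"{name}: {mib / 1024:.0f} GiB")
```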

Known caveats

Video needs two 24 GB cards (single-card = image + audio + short/low-res video).
Not real-time — renders are seconds-to-minutes; modalities are sequential, not simultaneous.
First run is heavy (~9 GB CUDA base + ~120 GB weights; the preflight checks disk).
ComfyUI is pinned to a known-good commit for reproducibility (set COMFYUI_REF=origin/master to float with upstream instead).
Uncensored where the model allows — capability lives in the open weights, the infra is content-neutral. Please use it responsibly and within your local laws.
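
The setup script's preflight already checks disk, but if you want to check a target drive yourself before cloning, a minimal sketch might look like this. It is not the script's own check: `has_room` and the 10 GB safety margin are assumptions made for the example, and the 9 + 120 GB total comes from the first-run caveat above.

```python
import shutil

REQUIRED_GB = 9 + 120  # ~9 GB CUDA base + ~120 GB weights


def has_room(path: str = ".", required_gb: float = REQUIRED_GB,
             margin_gb: float = 10) -> bool:
    """True if the filesystem holding `path` has required + margin GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb + margin_gb


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1e9
    verdict = "ok" if has_room(".") else "not enough space"
    print(f"{free_gb:.0f} GB free, need ~{REQUIRED_GB} GB plus margin: {verdict}")
```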

Credits — ComfyUI (Comfy-Org) · Open WebUI · LTX-2.3 (Lightricks) · Sulphur / 10Eros (Anbeeld) · Wan2.2 (Alibaba) + Rapid-AllInOne (Phr00t) + GGUF requant (befox) · HiDream-O1 (HiDream-ai) + the HiDream_O1-ComfyUI node (Saganaki22) · Ideogram-4 · Chroma · Z-Image (Alibaba Tongyi) · Krea 2 (Krea) · ACE-Step · Stable Audio (Stability) · Step-Audio-EditX (StepFun) · Kokoro · the uncensored director (HauhauCS) · ComfyUI-GGUF + DisTorch multi-GPU. Each open-weight model carries its own license (Apache / MIT / community — see the docs); the studio code is Apache-2.0. 🙏
