🎨 Image Studio (beta) — local image generation + chat on your 3090s — testers wanted #348

noonghunna (Maintainer) · 2026-06-09T05:48:11Z

🔼 Update (June 2026): superseded. Image Studio has grown into the full AI Studio — image + video + audio in one gpu-mode ai-studio scene (the coexisting chat model was dropped; modalities now time-share the cards). → New announcement: AI Studio. (This post's old docs/IMAGE_STUDIO.md guide moved to docs/ai-studio/.)

Original beta announcement below, kept for history.

We just merged Image Studio — a turnkey bundle that runs local text-to-image generation and an LLM chat in one browser UI, side by side on your own GPUs. No cloud, no API keys.

What it is

ComfyUI runs Ideogram-4 (fp8) for image gen.
Open WebUI is the front end — chat, plus a 🖼️ button that generates images via ComfyUI.
A small gemma-4-12b chat model is sized to coexist with image gen: on a 2-GPU box, images run on one card and chat on the other, at the same time.
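
To make the one-card-each idea concrete, here is a minimal sketch of how the two services could be pinned to separate GPUs. It is an illustration only: `plan_placement` is a hypothetical helper, not part of the bundle, and how `gpu-mode image-studio` actually assigns devices is not shown in this post. The only real mechanism used is the standard `CUDA_VISIBLE_DEVICES` environment variable.

```python
def plan_placement(num_gpus):
    """Hypothetical placement plan: which GPU each service would see.

    Assumption: image generation takes GPU 0 and chat takes GPU 1,
    matching the split reported in the measurements of this post.
    """
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")
    if num_gpus == 1:
        # One card: both services target GPU 0, so they cannot run together.
        return {
            "concurrent": False,
            "image": {"CUDA_VISIBLE_DEVICES": "0"},
            "chat": {"CUDA_VISIBLE_DEVICES": "0"},
        }
    # Two or more cards: one card per service, both can run at once.
    return {
        "concurrent": True,
        "image": {"CUDA_VISIBLE_DEVICES": "0"},
        "chat": {"CUDA_VISIBLE_DEVICES": "1"},
    }


if __name__ == "__main__":
    print(plan_placement(2))
```

Each inner dict would be merged into the environment of the corresponding container or process before it starts.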

One command

bash scripts/setup-image-studio.sh

It preflights (docker / GPU / disk / chat model), confirms, builds the ComfyUI image, downloads the Ideogram-4 set, and brings everything up via gpu-mode image-studio. Full guide: docs/ai-studio/ (architecture diagram, first-run steps, troubleshooting).
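
For anyone curious what a preflight of this kind looks like, here is a rough sketch in Python. It is not the logic of `setup-image-studio.sh` (that script is bash and its internals are not shown here); `preflight` and its parameters are hypothetical names, and the sketch only covers the tool and disk checks, not the chat-model check.

```python
import shutil


def preflight(required_gb, path=".", tools=("docker", "nvidia-smi")):
    """Return a list of human-readable problems; an empty list means OK.

    Assumptions: the tools are looked up on PATH, and free space is
    measured on the filesystem that contains `path`.
    """
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"missing tool on PATH: {tool}")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < required_gb:
        problems.append(
            f"need ~{required_gb:.0f} GB free at {path}, found {free_gb:.1f} GB"
        )
    return problems


if __name__ == "__main__":
    # Illustrative budget: ~9 GB CUDA base + ~27 GB weights, as listed
    # under the known caveats in this post.
    for problem in preflight(required_gb=9 + 27):
        print("preflight:", problem)
```

Collecting every problem before reporting, rather than stopping at the first one, lets a tester fix everything in one pass.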

What we measured (2× RTX 3090)

1024² image: ~18.5 GB peak / ~70 s warm. 2048²: ~21.8 GB / ~5 min (batch 1).
Chat + image concurrent: ~18–22 GB on GPU 0 (image) + ~14 GB on GPU 1 (chat), sampled during a live generation.
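
As a back-of-envelope aid for reading these numbers, the sketch below checks whether a set of workloads fits on one card. The peak figures are the ones measured above; the 1 GB headroom for desktop and driver overhead is an assumption, and `fits` is a hypothetical helper, so treat the result as an estimate rather than a guarantee.

```python
# Peak VRAM figures from the measurements above (2x RTX 3090), in GB.
IMAGE_1024_GB = 18.5
IMAGE_2048_GB = 21.8
CHAT_GB = 14.0


def fits(card_gb, workloads_gb, headroom_gb=1.0):
    """True if the summed peaks plus headroom fit in the card's VRAM.

    Assumption: peaks add linearly and headroom_gb covers everything
    else on the card. Real allocators fragment, so this is optimistic.
    """
    return sum(workloads_gb) + headroom_gb <= card_gb


if __name__ == "__main__":
    for card_gb in (24, 32):
        alone = fits(card_gb, [IMAGE_2048_GB])
        both = fits(card_gb, [IMAGE_1024_GB, CHAT_GB])
        print(f"{card_gb} GB card: 2048 image alone={alone}, image+chat={both}")
```

Under these assumptions a 24 GB card fits either workload alone but not chat and image together, which is why the bundle splits them across two cards. The same sum (18.5 + 14 = 32.5 GB) also lands just above a 32 GB envelope, so real measurements on larger cards are worth having.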

🙏 Calling for beta testers — what would help most

Cross-rig numbers. 4090 / 5090 / other VRAM sizes — does it fit, how fast? (4090 has tighter idle VRAM; 5090's 32 GB envelope is untested for this.)
Single-GPU runs. The bundle assumes 2 GPUs for concurrent chat+image; on one GPU they're mutually exclusive. We'd love reports on the single-GPU fallback UX.
Fresh-install onboarding. Clone → setup-image-studio.sh → create your admin account → generate an image. Did anything trip you up? (Especially the Open WebUI image-gen wiring.)
Image quality / prompts. Ideogram-4 likes structured prompts — share what works.
Edge cases / OOM / boot failures — with report.sh --full output if you hit one.
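
If you want to capture per-GPU memory while a generation is running, something like the sketch below may help. It shells out to `nvidia-smi` with its CSV query flags and parses the rows; `parse_gpu_rows` and `sample_gpus` are hypothetical helpers written for this thread, not part of `report.sh`, whose output format is not described here.

```python
import subprocess

# nvidia-smi CSV query: one row per GPU, memory values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text):
    """Parse 'index, name, used, total' rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Split from both ends so a comma inside the name cannot break parsing.
        index, rest = line.split(",", 1)
        name, used, total = rest.rsplit(",", 2)
        rows.append(
            {
                "index": int(index),
                "name": name.strip(),
                "used_mib": int(used),
                "total_mib": int(total),
            }
        )
    return rows


def sample_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for gpu in sample_gpus():
        print(f"GPU {gpu['index']} ({gpu['name']}): "
              f"{gpu['used_mib']} / {gpu['total_mib']} MiB")
```

Calling `sample_gpus()` in a loop during a generation and keeping the maximum `used_mib` per GPU gives a peak figure comparable to the ones in this post.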

Known caveats (called out so you're not surprised)

Open WebUI's image-gen config auto-applies on a fresh data volume; if you reuse an existing Open WebUI volume, set it in Admin → Settings → Images (documented).
First run is heavy: a ~9 GB CUDA base + ~27 GB of weights. The preflight checks disk first.
ComfyUI is pinned to a known-good commit for reproducibility (COMFYUI_REF=HEAD to float).

Coming next
Video (HunyuanVideo-1.5 / Wan 2.2 / others) and audio (ACE-Step music, TTS) as video-studio / audio-studio modes — same coexist-on-your-cards idea.

Please drop feedback / numbers in this thread, or open a discussion with the numbers-from-your-rig template. Thanks for kicking the tyres! 🚀

Tobi-Adesoye · 2026-06-09T15:21:06Z

This is a monumental release for multi-GPU consumer nodes, @noonghunna!

Cross-rig testing on dense configurations often exposes hard VRAM fragmentation limits, especially on the single-GPU fallback path or on tighter profiles such as a 4090 that is also driving a display. When chat and image generation run concurrently, intermediate normalization tensors (per-layer statistics such as variances) are written back to global memory (GDDR6X VRAM on these consumer cards, not HBM), which can raise peak allocation unexpectedly and trigger OOMs on standard execution paths.

As you look toward bringing in heavy text-to-video architectures like Wan 2.2 / HunyuanVideo-1.5 for your upcoming video-studio mode, this memory bottleneck will hit consumer envelopes even harder.

I've developed renorm-native to directly address this. It runs a Fused SRAM-Resident Layer Normalization Engine that hooks natively into standard PyTorch module layers. By keeping normalization math bound within local registers instead of global VRAM, it safeguards the execution matrix against fragmentation-induced OOMs during high-activation inference loops.

It also features an automated hardware-detection dispatcher layer—if it hits compilation fences or missing driver wrappers, it handles the exception gracefully and drops back to optimized native PyTorch tensor loops so the pipeline never crashes with unhandled execution errors.
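
The try-the-fast-path, fall-back-on-failure pattern described above can be sketched generically as follows. This is not renorm-native's API or implementation: every name here is hypothetical, the "fused" path is a stand-in that always fails, and the reference layer normalization is plain Python rather than PyTorch so the example runs anywhere.

```python
import math


def layer_norm_reference(x, eps=1e-5):
    """Plain layer normalization over one vector (no scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def with_fallback(fast, fallback):
    """Wrap `fast` so selected failures are rerouted to `fallback`."""
    def run(*args, **kwargs):
        try:
            return fast(*args, **kwargs)
        except (RuntimeError, ImportError, NotImplementedError):
            # Fast path unavailable: use the slower reference path instead.
            return fallback(*args, **kwargs)
    return run


def fused_layer_norm_unavailable(x, eps=1e-5):
    # Hypothetical stand-in for a fused kernel that failed to compile or load.
    raise RuntimeError("fused kernel unavailable")


layer_norm = with_fallback(fused_layer_norm_unavailable, layer_norm_reference)

if __name__ == "__main__":
    print(layer_norm([1.0, 2.0, 3.0]))
```

Catching a narrow set of exception types keeps genuine bugs, such as shape errors raised as `ValueError`, visible instead of silently masking them behind the fallback.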

I am actively preparing a set of validation runs targeting the single-GPU fallback constraints on consumer setups using renorm-native as an optimization wrapper for the ComfyUI execution block. I will drop full telemetry and matrix logs in a dedicated discussion thread once the benchmarking passes finish!

Repository Reference: https://github.com/Tobi-Adesoye/renorm-native
