OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, and Xinchao Wang
xML Lab, National University of Singapore
OminiControl2: Efficient Conditioning for Diffusion Transformers
Zhenxiong Tan, Qiaochu Xue, Xingyi Yang, Songhua Liu, and Xinchao Wang
xML Lab, National University of Singapore
OminiControl is a minimal yet powerful universal control framework for Diffusion Transformer models like FLUX.
- **Universal Control**: A unified control framework that supports both subject-driven control and spatial control (such as edge-guided and in-painting generation).
- **Minimal Design**: Injects control signals while preserving the original model structure, introducing only 0.1% additional parameters to the base model.
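As a back-of-the-envelope illustration of why a LoRA-style injection stays this small: a rank-r adapter on a d_in × d_out linear layer adds only r·(d_in + d_out) parameters on top of the d_in·d_out base weight. The dimensions and rank below are illustrative assumptions, not OminiControl's actual configuration:

```python
def lora_overhead(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of extra parameters a rank-`rank` LoRA adds to one linear layer."""
    base = d_in * d_out            # full weight matrix W
    extra = rank * (d_in + d_out)  # low-rank factors A (d_in x r) and B (r x d_out)
    return extra / base

# Hypothetical 3072-wide projection with a rank-4 adapter:
print(f"{lora_overhead(3072, 3072, rank=4):.2%}")  # 0.26%
```

Since adapters are typically attached to only a subset of layers, the model-level overhead drops further, consistent with the ~0.1% figure quoted above.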
- 2025-05-12: The code of OminiControl2 is released. It introduces a new efficient conditioning method for diffusion transformers. (Check out the training code here.)
- 2025-05-12: Support for custom style LoRAs. (Check out the example.)
- 2025-04-09: OminiControl Art is released. It can stylize any image with an artistic style. (Check out the demo and inference examples.)
- 2024-12-26: Training code is released. You can now create your own OminiControl model by customizing any control task (3D, multi-view, pose-guided, try-on, etc.) with the FLUX model. Check the training folder for more details.
- Environment setup:

      conda create -n omini python=3.12
      conda activate omini

- Install requirements:

      pip install -r requirements.txt
- Subject-driven generation: `examples/subject.ipynb`
- In-painting: `examples/inpainting.ipynb`
- Canny edge to image, depth to image, colorization, deblurring: `examples/spatial.ipynb`
- Input images are automatically center-cropped and resized to 512x512 resolution.
- When writing prompts, refer to the subject using phrases like "this item", "the object", or "it". For example:
  - A close up view of this item. It is placed on a wooden table.
  - A young lady is wearing this shirt.
- The model currently works primarily with objects rather than human subjects, due to the absence of human data in training.
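The automatic preprocessing mentioned above can be reproduced manually if you want to see exactly which region of your condition image the model receives. This is our own sketch using Pillow, not code from the repository:

```python
from PIL import Image

def center_crop_resize(img: Image.Image, size: int = 512) -> Image.Image:
    """Center-crop to a square, then resize to size x size (mirrors the pipeline's preprocessing)."""
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size), Image.LANCZOS)

# An 800x600 input keeps its central 600x600 region, then becomes 512x512.
cond = center_crop_resize(Image.new("RGB", (800, 600)))
print(cond.size)  # (512, 512)
```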
Demos (Left: condition image; Right: generated image)
Text Prompts
- Prompt1: A close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!.'
- Prompt2: A film style shot. On the moon, this item drives across the moon surface. A flag on it reads 'Omini'. The background is that Earth looms large in the foreground.
- Prompt3: In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.
- Prompt4: On the beach, a lady sits under a beach umbrella with 'Omini' written on it. She's wearing this shirt and has a big smile on her face, with her surfboard behind her. The sun is setting in the background. The sky is a beautiful shade of orange and purple.
- Image Inpainting (Left: original image; Center: masked image; Right: filled image)
- Prompt: The Mona Lisa is wearing a white VR headset with 'Omini' written on it.
- Prompt: A yellow book with the word 'OMINI' in large font on the cover. The text 'for FLUX' appears at the bottom.
- Other spatially aligned tasks (Canny edge to image, depth to image, colorization, deblurring)
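For spatially aligned tasks, the condition image shares the output's pixel grid. As a minimal illustration (our own sketch, not repository code), a colorization condition is simply a grayscale copy of the target image:

```python
from PIL import Image

def make_colorization_condition(img: Image.Image) -> Image.Image:
    """Grayscale copy kept in RGB mode, spatially aligned with the target image."""
    return img.convert("L").convert("RGB")

cond = make_colorization_condition(Image.new("RGB", (512, 512), (200, 50, 50)))
print(cond.mode, cond.size)  # RGB (512, 512)
```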
Subject-driven control:

| Model | Base model | Description | Resolution |
|---|---|---|---|
| `experimental / subject` | FLUX.1-schnell | The model used in the paper. | (512, 512) |
| `omini / subject_512` | FLUX.1-schnell | Fine-tuned on a larger dataset. | (512, 512) |
| `omini / subject_1024` | FLUX.1-schnell | Fine-tuned on a larger dataset; accommodates higher resolution. | (1024, 1024) |
| `oye-cartoon` | FLUX.1-dev | Fine-tuned on the oye-cartoon dataset by @saquib764. | (512, 512) |
Spatially aligned control:

| Model | Base model | Description | Resolution |
|---|---|---|---|
| `experimental / <task_name>` | FLUX.1 | Canny edge to image, depth to image, colorization, deblurring, in-painting | (512, 512) |
- ComfyUI-Diffusers-OminiControl - ComfyUI integration by @Macoron
- ComfyUI_RH_OminiControl - ComfyUI integration by @HM-RunningHub
- The model's subject-driven generation primarily works with objects rather than human subjects due to the absence of human data in training.
- The subject-driven generation model may not work well with `FLUX.1-dev`.
- The released model only supports a resolution of 512x512.
Training instructions can be found in this folder.
- Release the training code.
- Release the model for higher resolution (1024x1024).
@article{tan2024ominicontrol,
title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
author={Tan, Zhenxiong and Liu, Songhua and Yang, Xingyi and Xue, Qiaochu and Wang, Xinchao},
journal={arXiv preprint arXiv:2411.15098},
year={2024}
}
@article{tan2025ominicontrol2,
title={OminiControl2: Efficient Conditioning for Diffusion Transformers},
author={Tan, Zhenxiong and Xue, Qiaochu and Yang, Xingyi and Liu, Songhua and Wang, Xinchao},
journal={arXiv preprint arXiv:2503.08280},
year={2025}
}