OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent

OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.

Install

Set up the environment:

conda create -n "omni" python==3.12
conda activate omni
pip install -r requirements.txt

Then download the model checkpoint files from https://huggingface.co/microsoft/OmniParser and put them under weights/. The default folder structure is: weights/icon_detect, weights/icon_caption_florence, weights/icon_caption_blip2.

For v1, convert the safetensors checkpoint to a .pt file:

python weights/convert_safetensor_to_pt.py

For v1.5, download 'model_v1_5.pt' from https://huggingface.co/microsoft/OmniParser/tree/main/icon_detect_v1_5, create a new directory weights/icon_detect_v1_5, and put the file inside it. No weight conversion is needed.
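Before running anything, it can help to confirm that the weights landed where the steps above expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_weight_paths` is hypothetical, and it assumes that both caption folders are wanted for either detector version. The paths themselves are the ones listed above.

```python
from pathlib import Path

# Caption model folders from the default layout described above.
CAPTION_DIRS = ["icon_caption_florence", "icon_caption_blip2"]

# Detector location per version, as described in the install notes.
DETECTOR = {
    "v1": "icon_detect",
    "v1.5": "icon_detect_v1_5/model_v1_5.pt",
}


def missing_weight_paths(version, root="weights"):
    """Return the expected paths under `root` that do not exist yet (hypothetical helper)."""
    root = Path(root)
    expected = [DETECTOR[version]] + CAPTION_DIRS
    return [(root / rel).as_posix() for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for version in DETECTOR:
        missing = missing_weight_paths(version)
        print(version, "OK" if not missing else f"missing: {missing}")
```

An empty list for a version means every path this sketch checks is present. It does not validate the contents of the files.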

Examples:

We put together a few simple examples in demo.ipynb.

Gradio Demo

To run the Gradio demo:

# For v1
python gradio_demo.py --icon_detect_model weights/icon_detect/best.pt --icon_caption_model florence2
# For v1.5
python gradio_demo.py --icon_detect_model weights/icon_detect_v1_5/model_v1_5.pt --icon_caption_model florence2
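The two commands differ only in the detector checkpoint path, so a small wrapper can select it by version. This is a sketch under stated assumptions: `demo_command` is a hypothetical helper, not something the repository provides, and it only assembles the argument list using the flags and paths shown above.

```python
import shlex

# Detector checkpoint per version, copied from the commands above.
DETECT_WEIGHTS = {
    "v1": "weights/icon_detect/best.pt",
    "v1.5": "weights/icon_detect_v1_5/model_v1_5.pt",
}


def demo_command(version, caption_model="florence2"):
    """Build the gradio_demo.py argument list for a version (hypothetical helper)."""
    return [
        "python", "gradio_demo.py",
        "--icon_detect_model", DETECT_WEIGHTS[version],
        "--icon_caption_model", caption_model,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(demo_command("v1.5")))
```

To launch the demo, the list can be passed to `subprocess.run(..., check=True)` from the repository root.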

Model Weights License

For the model checkpoints on the Hugging Face model hub, please note that the icon_detect model is under the AGPL license, which it inherits from the original YOLO model. The icon_caption_blip2 and icon_caption_florence models are under the MIT license. Please refer to the LICENSE file in the folder of each model: https://huggingface.co/microsoft/OmniParser.

Important Notice

⚠️ This project is intended for research and educational purposes only.

