AI Pixel Parse - UI Extraction Agent

An AI agent that automatically crawls web applications, handles authentication, and extracts comprehensive UI specifications for rebuilding applications with any tech stack.

Features

✨ Complete UI Extraction

Buttons, forms, tables, icons, modals, and all interactive elements
Component properties, positions, styling, and relationships
Functional analysis (what each component does)
Page structure, navigation, and hierarchy
NEW: Iframe content extraction (embedded dashboards, widgets, KPIs)

🔐 Authentication Support

Handles login pages with configurable selectors
Maintains session across crawled pages
Works with most common authentication patterns

📸 Visual Documentation

Full-page screenshots for every page
Component position mapping
Visual hierarchy analysis

📄 Multiple Output Formats

Clear text specifications (human-readable)
JSON data (programmatic access)
Organized by page with summary

🎯 NEW: Specification Converter

Converts technical UI extractions to functional specifications
Technology-agnostic requirements for any tech stack
Generates feature breakdowns, data models, and user stories
Perfect for developers, AI agents, and product managers

Quick Links

📁 Repository Structure:

ai_pixel_parse/ - Main package code
tests/ - Test scripts and extraction runner
utilities/ - Setup verification and utility scripts
docs/ - Comprehensive documentation
extraction/ - Output directory (created on first run)

📚 Documentation:

Quick Start Guide - Get started in 5 minutes
Usage Methods - Web UI vs Script comparison
Extraction Guide - Detailed extraction info
Specification Converter Guide - Convert to functional specs
Iframe Extraction - Extract from embedded content
Troubleshooting - Common issues & solutions

🔧 Common Commands:

python3 utilities/verify_setup.py           # Verify installation
python3 tests/run_extraction.py             # Run UI extraction
python3 run_spec_converter.py               # Convert to functional specs
python3 tests/test_agent_tool.py            # Test agent/tool

Installation

Clone the repository and set up the environment:

cd ai-pixel-parse
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Install Playwright browsers:

playwright install chromium

Quick Start

Option 1: Use the ADK Web UI (Recommended) 🌐

Start the interactive web interface:

# Activate environment
source venv/bin/activate  # or: venv\Scripts\activate on Windows

# Start ADK web UI
adk web

Then chat with the agent:

"Extract UI from https://your-app.com and name it YourApp"

See ADK_WEB_UI_GUIDE.md for complete instructions.

Option 2: Use the runner script

Edit tests/run_extraction.py and configure your target application:

config = create_extraction_config(
    base_url="https://your-app.com/dashboard",
    app_name="YourApp",
    login_url="https://your-app.com/login",
    username="your-username",
    password="your-password",
    max_pages=30
)

Then run:

python tests/run_extraction.py

Option 3: Use as a Python module

import asyncio
from ai_pixel_parse.ui_extractor_agent import create_extraction_config, extract_ui_specifications

async def main():
    # Public app (no login)
    config = create_extraction_config(
        base_url="https://example.com",
        app_name="ExampleApp",
        max_pages=20
    )
    
    output_path = await extract_ui_specifications(config)
    print(f"Specifications saved to: {output_path}")

asyncio.run(main())

Option 4: With login authentication

config = create_extraction_config(
    base_url="https://app.example.com/dashboard",
    app_name="MyApp",
    login_url="https://app.example.com/login",
    username="user@example.com",
    password="demo123",
    # Optional custom selectors if defaults don't work:
    username_selector="input[name='email']",
    password_selector="input[id='password']",
    submit_selector="button.submit-btn",
    max_pages=50
)

# Run inside an async function, as in the Python module example above
output_path = await extract_ui_specifications(config)
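
Hardcoding credentials in a script is easy to leak into version control. As a minimal sketch, the login parameters could be read from environment variables instead. The helper name `login_settings_from_env` and the `PIXEL_PARSE_*` variable names are assumptions made for this example, not part of the package.

```python
import os
from typing import Mapping, Optional


def login_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect login keyword arguments from environment variables.

    Hypothetical helper: the variable names below are illustrative only.
    Returns an empty dict unless all three values are present, so the
    result can always be splatted into the config call.
    """
    env = os.environ if env is None else env
    login_url = env.get("PIXEL_PARSE_LOGIN_URL")
    username = env.get("PIXEL_PARSE_USERNAME")
    password = env.get("PIXEL_PARSE_PASSWORD")
    if not (login_url and username and password):
        return {}
    return {"login_url": login_url, "username": username, "password": password}
```

The result can then be passed along with the documented parameters, for example `create_extraction_config(base_url=..., app_name=..., **login_settings_from_env())`. When the variables are unset, the call falls back to a crawl without login.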

Configuration Options

| Parameter | Description | Default |
|---|---|---|
| `base_url` | Starting URL to crawl | Required |
| `app_name` | Name for output folder | Required |
| `login_url` | Login page URL | `None` |
| `username` | Login username | `None` |
| `password` | Login password | `None` |
| `username_selector` | CSS selector for username field | `"input[name='username'], input[type='email']"` |
| `password_selector` | CSS selector for password field | `"input[name='password'], input[type='password']"` |
| `submit_selector` | CSS selector for submit button | `"button[type='submit'], input[type='submit']"` |
| `max_pages` | Maximum pages to crawl | `50` |
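
To make the defaults above concrete, here is a rough sketch of how the same options could be modeled as a dataclass. `ExtractionConfigSketch` is a hypothetical stand-in written for this example. It is not the object returned by `create_extraction_config`, and the validation rules are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionConfigSketch:
    """Hypothetical mirror of the configuration table (not the real class)."""

    base_url: str
    app_name: str
    login_url: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None
    username_selector: str = "input[name='username'], input[type='email']"
    password_selector: str = "input[name='password'], input[type='password']"
    submit_selector: str = "button[type='submit'], input[type='submit']"
    max_pages: int = 50

    def __post_init__(self) -> None:
        # The two required parameters must be non-empty.
        if not self.base_url or not self.app_name:
            raise ValueError("base_url and app_name are required")
        # Assumption: the three login values only make sense together.
        login_fields = (self.login_url, self.username, self.password)
        if any(login_fields) and not all(login_fields):
            raise ValueError("login_url, username and password must be set together")
        if self.max_pages < 1:
            raise ValueError("max_pages must be at least 1")

    @property
    def requires_login(self) -> bool:
        """True when a login page was configured."""
        return self.login_url is not None
```

Failing fast on a half-specified login avoids a crawl that silently runs unauthenticated and extracts only the login page.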

Output Structure

After extraction, you'll find organized specifications in extraction/<app_name>_<timestamp>/:

extraction/
└── YourApp_20250109_143022/        # App-specific timestamped folder
    ├── 00_SUMMARY.txt               # Overview & statistics
    ├── 01_Dashboard.txt             # Page 1 specification
    ├── 02_User_Profile.txt          # Page 2 specification
    ├── README.md                    # Extraction summary
    ├── screenshot_0.png             # Page 1 screenshot
    ├── screenshot_1.png             # Page 2 screenshot
    └── full_extraction.json         # Complete data in JSON
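
Because every run gets its own timestamped folder, scripts that consume the output usually need the most recent one. The sketch below assumes folder names follow the `<app_name>_<YYYYMMDD>_<HHMMSS>` pattern shown in the example tree. The function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def parse_run_folder(name: str) -> Optional[Tuple[str, datetime]]:
    """Split '<app_name>_<YYYYMMDD>_<HHMMSS>' into (app_name, timestamp).

    Returns None for names that do not match the assumed pattern.
    """
    parts = name.rsplit("_", 2)
    if len(parts) != 3:
        return None
    app_name, date_part, time_part = parts
    try:
        stamp = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    except ValueError:
        return None
    return app_name, stamp


def latest_run(folder_names: Iterable[str], app_name: str) -> Optional[str]:
    """Return the newest run folder name for app_name, or None if absent."""
    runs = []
    for name in folder_names:
        parsed = parse_run_folder(name)
        if parsed and parsed[0] == app_name:
            runs.append((parsed[1], name))
    return max(runs)[1] if runs else None
```

In practice the folder names would come from something like `os.listdir("extraction")`.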

Complete Workflow: Extract → Convert → Build

1. Extract UI (Technical Details)

python3 tests/run_extraction.py

Output: extraction/YourApp_<timestamp>/

2. Convert to Functional Specs (Requirements)

python3 run_spec_converter.py

Or:

from ai_pixel_parse import convert_to_functional_spec_tool

specs = await convert_to_functional_spec_tool(
    app_name="YourApp",
    domain="e-commerce platform",
    target_audience="AI agents and developers"
)

Output: functional_specs/YourApp_spec_<timestamp>/

YourApp_functional_spec.md - Business requirements
YourApp_features.md - Feature breakdown
YourApp_data_models.md - Data structures
YourApp_user_stories.md - User stories
README.md - Usage guide

3. Build Application

Use the functional specifications to:

Generate code with AI (Claude, GPT-4, etc.)
Hand to development team
Create architecture plans
Generate test cases

See SPEC_CONVERTER_GUIDE.md for complete workflow details.

Specification File Format

Each page specification includes:

Page Metadata: Title, URL, timestamp, screenshot reference
Page Structure: Header, main content, sidebar, footer layout
Functions & Features: What the page does, user workflows
Buttons: All clickable buttons with text, IDs, classes, positions
Forms: Input fields, validation, submission endpoints
Tables: Headers, data structure, sample rows
Icons: SVG and icon fonts with positions
Navigation: Links and menu structure

Example Output

PAGE SPECIFICATION
================================================================================

Title: User Dashboard
URL: https://app.example.com/dashboard
Screenshot: screenshot_0.png
Extracted: 2025-11-09T14:30:22

FUNCTIONS & FEATURES
--------------------------------------------------------------------------------

[FORM_SUBMISSION]
Form submission: /api/users/search
  Method: GET
  Inputs: 2 fields
    - Search Query: text
    - Filter By: select

[DATA_DISPLAY]
Table display with 15 rows and 5 columns
  Headers: Name, Email, Role, Status, Actions

BUTTONS (8)
--------------------------------------------------------------------------------

[Button] Create New User
  ID: btn-create-user
  Type: button
  Classes: btn btn-primary
  Position: x=1200, y=80, w=150, h=40
  Aria Label: Create new user account

[Button] Export CSV
  ID: btn-export
  Type: button
  Classes: btn btn-secondary
  Position: x=1360, y=80, w=120, h=40
...
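
The text format above is regular enough to parse back into structured data when the JSON file is not at hand. The following sketch reads the `[Button]` entries and their `Position` lines. It assumes the layout is exactly as in the example output, and `parse_buttons` is a hypothetical helper, not a function shipped with the package.

```python
import re

# Matches lines such as "Position: x=1200, y=80, w=150, h=40".
POSITION_RE = re.compile(r"Position:\s*x=(-?\d+),\s*y=(-?\d+),\s*w=(\d+),\s*h=(\d+)")


def parse_buttons(spec_text: str) -> list:
    """Extract label, id and position for each [Button] entry."""
    buttons = []
    current = None
    for raw in spec_text.splitlines():
        line = raw.strip()
        if line.startswith("[Button]"):
            # A new entry starts; the text after the tag is the label.
            current = {"label": line[len("[Button]"):].strip()}
            buttons.append(current)
        elif current is not None and line.startswith("ID:"):
            current["id"] = line[len("ID:"):].strip()
        elif current is not None:
            match = POSITION_RE.match(line)
            if match:
                x, y, w, h = map(int, match.groups())
                current["position"] = {"x": x, "y": y, "w": w, "h": h}
    return buttons
```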

Use Cases

UI Migration: Extract existing app UI to rebuild in new tech stack
Documentation: Generate comprehensive UI documentation automatically
Design System: Identify reusable components and patterns
Quality Assurance: Catalog all UI elements for testing
AI Training: Feed specifications to another agent to generate code

Integration with Code Generation

The output specifications are designed to be consumed by code generation agents:

# Read specifications
import json

with open('extraction/MyApp_20250109_143022/full_extraction.json') as f:
    specs = json.load(f)

# Feed to code generation agent
for page in specs:
    # Agent can now generate React/Vue/Angular components
    # based on buttons, forms, tables, etc.
    pass
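
As one possible way to fill in the loop above, the sketch below counts components per type and builds a short prompt for each page. It assumes `full_extraction.json` is a list of page dictionaries with a `title` and a `components` mapping of type name to a list of items. That shape is a guess based on the extension example in this README, so check it against your own output first. Both function names are hypothetical.

```python
from collections import Counter


def summarize_components(pages: list) -> Counter:
    """Total the number of extracted items per component type."""
    totals = Counter()
    for page in pages:
        for kind, items in page.get("components", {}).items():
            if isinstance(items, list):
                totals[kind] += len(items)
    return totals


def build_prompt(page: dict, framework: str = "React") -> str:
    """Build a one-line code generation prompt for a single page."""
    counts = ", ".join(
        f"{len(items)} {kind}"
        for kind, items in page.get("components", {}).items()
        if isinstance(items, list)
    )
    title = page.get("title", "Untitled")
    return f"Generate {framework} components for '{title}' ({counts})."
```

A summary like this is also a quick sanity check that the crawl found roughly the number of buttons, forms, and tables you expect before handing the data to a generator.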

Advanced Usage

Custom Component Extraction

Extend UIExtractor.extract_page_components() to extract custom component types:

# In ui_extractor_agent.py, add to extract_page_components():
page_data['components']['custom_widgets'] = await page.evaluate("""
    () => {
        const widgets = [];
        document.querySelectorAll('.custom-widget').forEach((widget, idx) => {
            widgets.push({
                id: widget.id || `widget_${idx}`,
                type: widget.dataset.type,
                config: JSON.parse(widget.dataset.config || '{}')
            });
        });
        return widgets;
    }
""")

Vision-Based Analysis

The agent captures screenshots. You can enhance it with vision models:

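# Use Gemini Vision to analyze screenshots
from pathlib import Path

import google.generativeai as genai

# Assumes an API key is already configured, e.g. genai.configure(api_key=...)
# Load one of the screenshots written by the extraction run
screenshot_data = Path("extraction/YourApp_20250109_143022/screenshot_0.png").read_bytes()

model = genai.GenerativeModel('gemini-2.0-flash-exp')
response = model.generate_content([
    "Describe the visual hierarchy and design patterns in this UI",
    {"mime_type": "image/png", "data": screenshot_data}
])
print(response.text)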

Troubleshooting

Login not working?

Inspect the login page and update selectors in config
Check for CAPTCHA or MFA (may need manual handling)
Verify credentials are correct

Missing components?

Some components may load via JavaScript after page load
Increase the wait time: raise the timeout used with wait_until='networkidle' during page navigation
Check browser console for errors

Too many/too few pages?

Adjust max_pages parameter
Check that navigation links are being detected correctly
Verify base_url matches the domain structure
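
If the page count looks wrong, it helps to see which links would be kept when crawling from a given `base_url`. The sketch below shows one common approach: resolve each link against the base, drop fragments, and keep only links on the same host. This illustrates the idea and is not necessarily how the crawler itself filters links. `crawlable_links` is a hypothetical helper.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def crawlable_links(base_url: str, hrefs: list, max_pages: int = 50) -> list:
    """Return unique same-host http(s) links, capped at max_pages."""
    base_host = urlparse(base_url).netloc.lower()
    seen = set()
    result = []
    for href in hrefs:
        # Resolve relative links and strip "#fragment" suffixes.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parsed.netloc.lower() != base_host:
            continue  # skip links that leave the starting host
        if absolute in seen:
            continue
        seen.add(absolute)
        result.append(absolute)
        if len(result) >= max_pages:
            break
    return result
```

Note that under this rule `https://example.com` and `https://app.example.com` count as different hosts. A `base_url` on the wrong subdomain is a typical cause of a crawl that stops after one page.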

ADK Web UI not working?

See TROUBLESHOOTING_WEB_UI.md for detailed ADK-specific troubleshooting
Run python3 tests/test_agent_tool.py to verify the tool works
Make sure the agent has extract_ui_tool registered (tools=[extract_ui_tool] in the agent definition)

Architecture

UIExtractor: Core class handling browser automation and extraction
ui_extractor_agent: Google ADK Agent with extraction instructions
Playwright: Headless browser for page interaction
Output: Structured text files + JSON + screenshots

Contributing

To add new extraction features:

Add extraction logic to UIExtractor.extract_page_components()
Update output formatting in save_specifications()
Document the new capability in this README

License

[Add your license here]

Support

For issues or questions:

Check the troubleshooting section above
Review example configurations in tests/run_extraction.py
Run setup verification with python utilities/verify_setup.py
Examine output specifications to verify extraction quality
