An AI agent that automatically crawls web applications, handles authentication, and extracts comprehensive UI specifications for rebuilding applications with any tech stack.
Complete UI Extraction
- Buttons, forms, tables, icons, modals, and all interactive elements
- Component properties, positions, styling, and relationships
- Functional analysis (what each component does)
- Page structure, navigation, and hierarchy
- NEW: Iframe content extraction (embedded dashboards, widgets, KPIs)
Authentication Support
- Handles login pages with configurable selectors
- Maintains session across crawled pages
- Works with most common authentication patterns
Visual Documentation
- Full-page screenshots for every page
- Component position mapping
- Visual hierarchy analysis
Multiple Output Formats
- Clear text specifications (human-readable)
- JSON data (programmatic access)
- Organized by page with summary
NEW: Specification Converter
- Converts technical UI extractions to functional specifications
- Technology-agnostic requirements for any tech stack
- Generates feature breakdowns, data models, and user stories
- Perfect for developers, AI agents, and product managers
Repository Structure:
- `ai_pixel_parse/` - Main package code
- `tests/` - Test scripts and extraction runner
- `utilities/` - Setup verification and utility scripts
- `docs/` - Comprehensive documentation
- `extraction/` - Output directory (created on first run)
Documentation:
- Quick Start Guide - Get started in 5 minutes
- Usage Methods - Web UI vs Script comparison
- Extraction Guide - Detailed extraction info
- Specification Converter Guide - Convert to functional specs
- Iframe Extraction - Extract from embedded content
- Troubleshooting - Common issues & solutions
Common Commands:
python3 utilities/verify_setup.py # Verify installation
python3 tests/run_extraction.py # Run UI extraction
python3 run_spec_converter.py # Convert to functional specs
python3 tests/test_agent_tool.py # Test agent/tool

- Clone the repository and set up the environment:

cd ai-pixel-parse
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

- Install dependencies:

pip install -r requirements.txt

- Install Playwright browsers:

playwright install chromium

Start the interactive web interface:
# Activate environment
source venv/bin/activate # or: venv\Scripts\activate on Windows
# Start ADK web UI
adk start

Then chat with the agent:
"Extract UI from https://your-app.com and name it YourApp"
See ADK_WEB_UI_GUIDE.md for complete instructions.
Edit tests/run_extraction.py and configure your target application:
config = create_extraction_config(
    base_url="https://your-app.com/dashboard",
    app_name="YourApp",
    login_url="https://your-app.com/login",
    username="your-username",
    password="your-password",
    max_pages=30
)

Then run:

python tests/run_extraction.py

import asyncio
from ai_pixel_parse.ui_extractor_agent import create_extraction_config, extract_ui_specifications
async def main():
    # Public app (no login)
    config = create_extraction_config(
        base_url="https://example.com",
        app_name="ExampleApp",
        max_pages=20
    )
    output_path = await extract_ui_specifications(config)
    print(f"Specifications saved to: {output_path}")

asyncio.run(main())

config = create_extraction_config(
    base_url="https://app.example.com/dashboard",
    app_name="MyApp",
    login_url="https://app.example.com/login",
    username="user@example.com",  # placeholder address
    password="demo123",
    # Optional custom selectors if defaults don't work:
    username_selector="input[name='email']",
    password_selector="input[id='password']",
    submit_selector="button.submit-btn",
    max_pages=50
)
output_path = await extract_ui_specifications(config)

| Parameter | Description | Default |
|---|---|---|
| `base_url` | Starting URL to crawl | Required |
| `app_name` | Name for output folder | Required |
| `login_url` | Login page URL | None |
| `username` | Login username | None |
| `password` | Login password | None |
| `username_selector` | CSS selector for username field | `"input[name='username'], input[type='email']"` |
| `password_selector` | CSS selector for password field | `"input[name='password'], input[type='password']"` |
| `submit_selector` | CSS selector for submit button | `"button[type='submit'], input[type='submit']"` |
| `max_pages` | Maximum pages to crawl | 50 |
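
The defaults in the table can be pictured as a small configuration object. The sketch below is not the package's implementation of `create_extraction_config`; `ExtractionConfig` and `needs_login` are hypothetical names used only to show how the required and optional parameters relate. It assumes that login is attempted only when `login_url`, `username`, and `password` are all provided, which the source does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionConfig:
    """Hypothetical mirror of the parameter table; not the package's real class."""

    base_url: str
    app_name: str
    login_url: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None
    # Default selectors copied from the table above.
    username_selector: str = "input[name='username'], input[type='email']"
    password_selector: str = "input[name='password'], input[type='password']"
    submit_selector: str = "button[type='submit'], input[type='submit']"
    max_pages: int = 50

    def __post_init__(self) -> None:
        # base_url and app_name are the only required parameters.
        if not self.base_url or not self.app_name:
            raise ValueError("base_url and app_name are required")
        if self.max_pages < 1:
            raise ValueError("max_pages must be at least 1")

    def needs_login(self) -> bool:
        # Assumption: all three login values must be present to attempt login.
        return all((self.login_url, self.username, self.password))


public = ExtractionConfig(base_url="https://example.com", app_name="ExampleApp", max_pages=20)
print(public.needs_login())
```

Keeping the selector defaults as comma-separated CSS selector lists means a single query can match either of the common field conventions before a custom selector is needed.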
After extraction, you'll find organized specifications in extraction/<app_name>_<timestamp>/:
extraction/
└── YourApp_20250109_143022/     # App-specific timestamped folder
    ├── 00_SUMMARY.txt           # Overview & statistics
    ├── 01_Dashboard.txt         # Page 1 specification
    ├── 02_User_Profile.txt      # Page 2 specification
    ├── README.md                # Extraction summary
    ├── screenshot_0.png         # Page 1 screenshot
    ├── screenshot_1.png         # Page 2 screenshot
    └── full_extraction.json     # Complete data in JSON
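
Because each run writes to a new timestamped folder, downstream scripts usually need to locate the most recent one. The following is a minimal sketch under the assumption that folder names always follow the `<app_name>_<YYYYMMDD>_<HHMMSS>` pattern shown above and that page files are numbered `01_`, `02_`, and so on. `latest_extraction_dir` and `page_spec_files` are hypothetical helpers, not part of the package.

```python
import re
from pathlib import Path
from typing import List, Optional


def latest_extraction_dir(root: Path, app_name: str) -> Optional[Path]:
    """Return the newest <app_name>_<YYYYMMDD>_<HHMMSS> folder under root, if any."""
    if not root.is_dir():
        return None
    pattern = re.compile(rf"{re.escape(app_name)}_\d{{8}}_\d{{6}}")
    candidates = [p for p in root.iterdir() if p.is_dir() and pattern.fullmatch(p.name)]
    # This timestamp layout sorts chronologically when compared as plain text.
    return max(candidates, key=lambda p: p.name, default=None)


def page_spec_files(run_dir: Path) -> List[Path]:
    """List per-page text files in page order, skipping 00_SUMMARY.txt."""
    pages = (p for p in run_dir.glob("[0-9][0-9]_*.txt") if not p.name.startswith("00_"))
    return sorted(pages, key=lambda p: p.name)


run_dir = latest_extraction_dir(Path("extraction"), "YourApp")
if run_dir is not None:
    for spec in page_spec_files(run_dir):
        print(spec.name)
```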
python3 tests/run_extraction.py

Output: extraction/YourApp_<timestamp>/

python3 run_spec_converter.py

Or:

from ai_pixel_parse import convert_to_functional_spec_tool
specs = await convert_to_functional_spec_tool(
    app_name="YourApp",
    domain="e-commerce platform",
    target_audience="AI agents and developers"
)

Output: functional_specs/YourApp_spec_<timestamp>/
- `YourApp_functional_spec.md` - Business requirements
- `YourApp_features.md` - Feature breakdown
- `YourApp_data_models.md` - Data structures
- `YourApp_user_stories.md` - User stories
- `README.md` - Usage guide
Use the functional specifications to:
- Generate code with AI (Claude, GPT-4, etc.)
- Hand to development team
- Create architecture plans
- Generate test cases
See SPEC_CONVERTER_GUIDE.md for complete workflow details.
Each page specification includes:
- Page Metadata: Title, URL, timestamp, screenshot reference
- Page Structure: Header, main content, sidebar, footer layout
- Functions & Features: What the page does, user workflows
- Buttons: All clickable buttons with text, IDs, classes, positions
- Forms: Input fields, validation, submission endpoints
- Tables: Headers, data structure, sample rows
- Icons: SVG and icon fonts with positions
- Navigation: Links and menu structure
PAGE SPECIFICATION
================================================================================
Title: User Dashboard
URL: https://app.example.com/dashboard
Screenshot: screenshot_0.png
Extracted: 2025-11-09T14:30:22
FUNCTIONS & FEATURES
--------------------------------------------------------------------------------
[FORM_SUBMISSION]
Form submission: /api/users/search
Method: GET
Inputs: 2 fields
- Search Query: text
- Filter By: select
[DATA_DISPLAY]
Table display with 15 rows and 5 columns
Headers: Name, Email, Role, Status, Actions
BUTTONS (8)
--------------------------------------------------------------------------------
[Button] Create New User
ID: btn-create-user
Type: button
Classes: btn btn-primary
Position: x=1200, y=80, w=150, h=40
Aria Label: Create new user account
[Button] Export CSV
ID: btn-export
Type: button
Classes: btn btn-secondary
Position: x=1360, y=80, w=120, h=40
...
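
The text format above is regular enough to read back with a few lines of code when the JSON file is not at hand. This is a rough sketch that assumes the layout shown in the example (a `[Button]` line followed by `Key: value` lines, with positions written as `x=..., y=..., w=..., h=...`); `parse_buttons` is a hypothetical helper and the real output may contain fields this parser does not anticipate.

```python
import re
from typing import Dict, List, Optional

POSITION_RE = re.compile(r"x=(-?\d+), y=(-?\d+), w=(\d+), h=(\d+)")


def parse_buttons(spec_text: str) -> List[Dict[str, object]]:
    """Collect every [Button] entry and its 'Key: value' lines from a page spec."""
    buttons: List[Dict[str, object]] = []
    current: Optional[Dict[str, object]] = None
    for raw in spec_text.splitlines():
        line = raw.strip()
        if line.startswith("[Button]"):
            current = {"label": line[len("[Button]"):].strip()}
            buttons.append(current)
        elif line.startswith("[") or (line and set(line) <= set("-=")):
            # Another entry type or a section rule ends the current button.
            current = None
        elif current is not None and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            match = POSITION_RE.fullmatch(value) if key == "Position" else None
            if match:
                current["position"] = dict(zip("xywh", map(int, match.groups())))
            else:
                current[key.lower().replace(" ", "_")] = value
    return buttons


sample = """\
BUTTONS (8)
--------------------------------------------------------------------------------
[Button] Create New User
ID: btn-create-user
Type: button
Classes: btn btn-primary
Position: x=1200, y=80, w=150, h=40
Aria Label: Create new user account
[Button] Export CSV
ID: btn-export
Position: x=1360, y=80, w=120, h=40
"""

for button in parse_buttons(sample):
    print(button["label"], button["position"])
```

For anything beyond quick inspection, `full_extraction.json` is likely the more reliable input, since it does not depend on text layout.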
- UI Migration: Extract existing app UI to rebuild in new tech stack
- Documentation: Generate comprehensive UI documentation automatically
- Design System: Identify reusable components and patterns
- Quality Assurance: Catalog all UI elements for testing
- AI Training: Feed specifications to another agent to generate code
The output specifications are designed to be consumed by code generation agents:
# Read specifications
import json

with open('extraction/MyApp_20250109/full_extraction.json') as f:
    specs = json.load(f)

# Feed to code generation agent
for page in specs:
    # Agent can now generate React/Vue/Angular components
    # based on buttons, forms, tables, etc.
    pass

Extend UIExtractor.extract_page_components() to extract custom component types:
# In ui_extractor_agent.py, add to extract_page_components():
page_data['components']['custom_widgets'] = await page.evaluate("""
    () => {
        const widgets = [];
        document.querySelectorAll('.custom-widget').forEach((widget, idx) => {
            widgets.push({
                id: widget.id || `widget_${idx}`,
                type: widget.dataset.type,
                config: JSON.parse(widget.dataset.config || '{}')
            });
        });
        return widgets;
    }
""")

The agent captures screenshots. You can enhance it with vision models:
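# Use Gemini Vision to analyze screenshots
from pathlib import Path
from google.generativeai import GenerativeModel

# Load the PNG bytes of a captured screenshot (example path)
screenshot_data = Path('extraction/MyApp_20250109/screenshot_0.png').read_bytes()

model = GenerativeModel('gemini-2.0-flash-exp')
response = model.generate_content([
    "Describe the visual hierarchy and design patterns in this UI",
    {"mime_type": "image/png", "data": screenshot_data}
])

Login not working?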
- Inspect the login page and update selectors in config
- Check for CAPTCHA or MFA (may need manual handling)
- Verify credentials are correct
Missing components?
- Some components may load via JavaScript after page load
- Increase wait time: modify the `wait_until='networkidle'` timeout
- Check browser console for errors
Too many/too few pages?
- Adjust the `max_pages` parameter
- Check that navigation links are being detected correctly
- Verify `base_url` matches the domain structure
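
To see why a mismatched `base_url` can shrink or inflate the crawl, it helps to test links against the starting host by hand. The sketch below assumes the crawler only follows links on the same host as `base_url` and stops at `max_pages`; the source does not document the exact rule, and `crawl_candidates` is a hypothetical helper for diagnosis, not the crawler's own code.

```python
from typing import Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_candidates(base_url: str, hrefs: Iterable[str], max_pages: int = 50) -> List[str]:
    """Resolve hrefs against base_url and keep unique same-host http(s) URLs."""
    base_host = urlparse(base_url).netloc.lower()
    kept: List[str] = []
    for href in hrefs:
        # Resolve relative links and drop #fragments so a page is counted once.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, tel:, etc.
        if parsed.netloc.lower() != base_host:
            continue  # a different host, including www. vs app. subdomains
        if absolute not in kept:
            kept.append(absolute)
        if len(kept) >= max_pages:
            break
    return kept


links = ["/users", "settings#profile", "https://www.example.com/pricing",
         "mailto:help@example.com", "/users"]
print(crawl_candidates("https://app.example.com/dashboard", links))
```

Under this assumed rule, starting from `app.example.com` drops every link that points at `www.example.com`, which is one way a crawl can end up with far fewer pages than expected.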
ADK Web UI not working?
- See TROUBLESHOOTING_WEB_UI.md for detailed ADK-specific troubleshooting
- Run `python3 tests/test_agent_tool.py` to verify the tool works
- Make sure the agent has `functions=[extract_ui_tool]` registered
- UIExtractor: Core class handling browser automation and extraction
- ui_extractor_agent: Google ADK Agent with extraction instructions
- Playwright: Headless browser for page interaction
- Output: Structured text files + JSON + screenshots
To add new extraction features:
- Add extraction logic to `UIExtractor.extract_page_components()`
- Update output formatting in `save_specifications()`
- Document the new capability in this README
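
For the formatting step, a new component type can follow the same section layout as the existing text output. The sketch below is illustrative only: `format_section` is a hypothetical helper, the 80-character rule width is inferred from the sample output, and how `save_specifications()` actually builds its sections is not shown in this README.

```python
from typing import Dict, List

RULE = "-" * 80  # assumed width, matching the rules in the sample output


def format_section(title: str, tag: str, items: List[Dict[str, str]]) -> str:
    """Render items as a 'TITLE (count)' section with one [Tag] entry per item."""
    lines = [f"{title} ({len(items)})", RULE]
    for item in items:
        lines.append(f"[{tag}] {item.get('label', '')}".rstrip())
        for key, value in item.items():
            if key != "label":
                lines.append(f"{key}: {value}")
    return "\n".join(lines)


widgets = [
    {"label": "Revenue Chart", "ID": "widget_0", "Type": "chart"},
    {"label": "Active Users", "ID": "widget_1", "Type": "kpi"},
]
print(format_section("CUSTOM WIDGETS", "Widget", widgets))
```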
[Add your license here]
For issues or questions:
- Check the troubleshooting section above
- Review example configurations in `tests/run_extraction.py`
- Run setup verification with `python utilities/verify_setup.py`
- Examine output specifications to verify extraction quality