feat: Support processing images by IronClaw #641
Summary of Changes (posted by Gemini Code Assist)
This pull request adds multimodal image processing to the agent. It introduces tools for image generation, editing, and analysis, the backend infrastructure to handle image data, and a revamped frontend that supports user image attachments and displays generated images. The agent can now work with and create visual content.
Code Review
This pull request introduces significant new functionality for image processing, including image generation, editing, and analysis. The changes are well-structured, touching on the agent core, channels, LLM providers, and UI. The introduction of new tools for image manipulation is a great addition.
I've made a few suggestions to improve code clarity, maintainability, and robustness, mainly around simplifying some logic and making string comparisons case-insensitive. Overall, this is a solid feature implementation.
Note: Security Review did not run due to the size of the PR.
End-to-end image handling: upload, generation, analysis, editing, and rendering across web gateway, HTTP webhook, WASM (Telegram/Slack), and REPL channels.

Builds on the attachment infrastructure from #596 and draws inspiration from PR #641's image pipeline approach. Credit to that PR's author for the sentinel JSON pattern and base64-in-JSON upload design.

Key changes:
- Image upload in web UI (file picker, paste, preview strip)
- Image generation tool (FLUX/DALL-E via /v1/images/generations)
- Image edit tool (multipart /v1/images/edits with fallback)
- Image analysis tool (vision model for workspace images)
- Model detection utilities (image_models.rs, vision_models.rs)
- Sentinel JSON detection in dispatcher for generated image rendering
- StatusUpdate::ImageGenerated → SSE/WS/REPL/WASM broadcast
- HTTP webhook attachment support (base64, 5MB/file, 10MB total)
- WASM channel image download (Telegram via file API, Slack via host HTTP)
- Tool registration wiring in app.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
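The webhook limits listed above (5MB per file, 10MB total) can be enforced before any base64 decoding, because the decoded size follows from the encoded length. The following is a minimal sketch, not the PR's code: the names `decoded_len`, `check_attachments`, and `LimitError` are hypothetical, and it assumes the limits apply to decoded bytes and that "MB" means MiB.

```rust
/// Limits quoted in the commit message, read here as MiB of decoded bytes.
/// (Assumption: the real code may use decimal megabytes or count encoded bytes.)
const MAX_FILE_BYTES: usize = 5 * 1024 * 1024;
const MAX_TOTAL_BYTES: usize = 10 * 1024 * 1024;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum LimitError {
    FileTooLarge { index: usize, bytes: usize },
    TotalTooLarge { bytes: usize },
}

/// Size of the decoded payload for a standard base64 string,
/// computed from its length alone (no decoding, no allocation).
/// Each non-padding character carries 6 bits, so n characters
/// decode to floor(n * 3 / 4) bytes.
fn decoded_len(b64: &str) -> usize {
    let significant = b64.trim_end_matches('=').len();
    significant * 3 / 4
}

/// Checks the per-file limit and the running total, returning the
/// total decoded size when every attachment fits.
fn check_attachments(payloads: &[&str]) -> Result<usize, LimitError> {
    let mut total = 0usize;
    for (index, payload) in payloads.iter().enumerate() {
        let bytes = decoded_len(payload);
        if bytes > MAX_FILE_BYTES {
            return Err(LimitError::FileTooLarge { index, bytes });
        }
        total += bytes;
        if total > MAX_TOTAL_BYTES {
            return Err(LimitError::TotalTooLarge { bytes: total });
        }
    }
    Ok(total)
}
```

Because base64 expands data by a factor of 4/3, 10 MB of decoded images occupies roughly 13.3 MB on the wire, so any request body limit in front of such a check has to leave room for that overhead.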
* feat: full image support across all channels

End-to-end image handling: upload, generation, analysis, editing, and rendering across web gateway, HTTP webhook, WASM (Telegram/Slack), and REPL channels.

Builds on the attachment infrastructure from #596 and draws inspiration from PR #641's image pipeline approach. Credit to that PR's author for the sentinel JSON pattern and base64-in-JSON upload design.

Key changes:
- Image upload in web UI (file picker, paste, preview strip)
- Image generation tool (FLUX/DALL-E via /v1/images/generations)
- Image edit tool (multipart /v1/images/edits with fallback)
- Image analysis tool (vision model for workspace images)
- Model detection utilities (image_models.rs, vision_models.rs)
- Sentinel JSON detection in dispatcher for generated image rendering
- StatusUpdate::ImageGenerated → SSE/WS/REPL/WASM broadcast
- HTTP webhook attachment support (base64, 5MB/file, 10MB total)
- WASM channel image download (Telegram via file API, Slack via host HTTP)
- Tool registration wiring in app.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR #725 review comments (16 issues)

- SecretString for API keys in all image tools (image_gen, image_edit, image_analyze)
- Binary image read via tokio::fs::read instead of DB-backed workspace.read()
- Replace Arc<Workspace> with Option<PathBuf> base_dir (workspace has no filesystem API)
- ApprovalRequirement::UnlessAutoApproved for cost-sensitive image tools
- Scope sentinel detection to image_generate/image_edit tool names only
- Skip ToolResult preview broadcast for image sentinels (avoids multi-MB base64 in SSE)
- Extract shared media_type_from_path() to builtin/mod.rs
- Rename fallback_chat_edit → fallback_generate with tracing::warn
- Increase gateway body limit from 1MB to 10MB for image uploads
- Increase webhook body limit to 15MB (base64 overhead)
- Log warning on invalid base64 in images_to_attachments
- Client-side image size limits (5MB/file, 5 images max) in app.js
- aria-label on attach button for accessibility
- Update body_too_large test for new 10MB limit

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add Slack file size check before download (PR review item #15)

Skip downloading files larger than 20 MB in the Slack WASM channel to avoid excessive memory use and slow downloads in the WASM runtime. Logs a warning when a file is skipped.

Also bumps channel versions for Slack and Telegram (prior branch changes).

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): add path validation and approval requirement to image tools

Add sandbox path validation via validate_path() to both ImageAnalyzeTool and ImageEditTool to prevent path traversal attacks that could exfiltrate arbitrary files through external vision/edit APIs.

Also fix ImageAnalyzeTool::requires_approval to return UnlessAutoApproved, consistent with ImageEditTool and ImageGenerateTool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: post-download size guards and empty data_url sentinel check

- Slack: add post-download size check on actual bytes when metadata size_bytes is absent, preventing bypass of the 20MB limit
- Telegram: add 20MB download size limit (matching Slack) enforced in download_telegram_file() after receiving response bytes
- Dispatcher: skip broadcasting ImageGenerated SSE event when data_url is empty from unwrap_or_default(), log warning instead

Closes correctness issues #3, #4, #5 from PR #725 review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use mime_guess for media type detection, add alt attrs and media_type validation

- Replace hardcoded media type mapping with mime_guess crate (already in deps)
- Add alt attributes to img elements in web UI for accessibility
- Validate media_type starts with "image/" in images_to_attachments()
- Update bmp test assertion to match mime_guess behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zaki <zaki@iqlusion.io>
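The security fix in the commit log relies on validate_path() to keep image tools inside the sandbox. Its implementation is not shown in this conversation, so the following is only an illustrative sketch of the general technique, with a hypothetical function name (`resolve_in_sandbox`) and a purely lexical check. It assumes the tool receives a relative path and a `base_dir`, as the `Option<PathBuf> base_dir` change suggests.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: joins `requested` onto `base_dir` and rejects
/// any path that is absolute or that climbs above `base_dir` via `..`.
/// The check is lexical only: it never touches the filesystem, so it
/// does not detect symlinks that point outside the sandbox.
fn resolve_in_sandbox(base_dir: &Path, requested: &str) -> Result<PathBuf, String> {
    let mut resolved = base_dir.to_path_buf();
    // Number of components pushed below base_dir so far.
    let mut depth = 0usize;

    for component in Path::new(requested).components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            // "." does not change the location.
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return Err(format!("path escapes sandbox: {requested}"));
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute paths (and Windows prefixes) bypass base_dir entirely.
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("absolute paths are not allowed: {requested}"));
            }
        }
    }
    Ok(resolved)
}
```

A production check would typically also canonicalize the result and confirm it still starts with the canonical `base_dir`, since a lexical pass alone cannot rule out symlink escapes.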
Fully implemented via #725 (merged). Thanks @nickpismenkov for the original design — the sentinel JSON pattern and base64-in-JSON upload approach from this issue shaped the final implementation. What shipped in #725:
26 files changed, +1700 lines. All tests passing. 🎉