feat: Support processing images by IronClaw by nickpismenkov · Pull Request #641 · nearai/ironclaw

nickpismenkov · 2026-03-07T01:55:56Z

No description provided.

gemini-code-assist · 2026-03-07T01:56:32Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent's capabilities by integrating multimodal image processing. It introduces new tools for image generation, editing, and analysis, along with the necessary backend infrastructure to handle image data and a revamped frontend to support user image attachments and display generated visuals. This allows the agent to interact with and create visual content, expanding its problem-solving domain.

Highlights

Multimodal Image Support: Introduced comprehensive support for image processing, allowing users to attach images to messages and enabling the agent to generate, edit, and analyze images.
New Image Tools: Added three new built-in tools: image_generate for creating images from text prompts, image_edit for modifying existing images, and image_analyze for leveraging LLM vision capabilities to understand image content. These tools integrate with the NEAR AI cloud-api (FLUX model) for generation and editing.
Frontend Enhancements: The web user interface now supports attaching images via file selection or paste, displays previews of staged images, and renders generated images directly within the chat feed.
Backend Infrastructure for Images: Updated core data structures like ChatMessage and IncomingMessage to include ImageAttachments. The dispatcher now detects image_generated sentinel events from tool outputs and broadcasts them via SSE, and the rig adapter handles multi-modal message conversion for LLMs.
Dynamic Tool Registration: The application now dynamically registers image generation, editing, and vision analysis tools based on the availability of compatible LLM models, ensuring that only relevant tools are exposed.
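
As a rough illustration of the multimodal message path described in the highlights, the sketch below shows how a user message with attachments might be flattened into a text part followed by one image part per attachment. The field names on `ImageAttachment` and `ChatMessage`, the `ContentPart` enum, the `to_content_parts` helper, and the `data:` URL encoding are assumptions made for this example; the summary only states that attachments carry a media type and base64 content and are formatted as image_url objects.

```rust
/// Assumed shape of the attachment type: a MIME type plus a base64 payload.
#[derive(Debug, Clone, PartialEq)]
pub struct ImageAttachment {
    pub media_type: String,
    pub data_base64: String,
}

/// Simplified stand-in for ChatMessage (the real struct likely has more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
    pub images: Vec<ImageAttachment>,
}

impl ChatMessage {
    /// Builds a user message that carries image attachments.
    pub fn user_with_images(content: impl Into<String>, images: Vec<ImageAttachment>) -> Self {
        Self {
            role: "user".to_string(),
            content: content.into(),
            images,
        }
    }
}

/// Hypothetical multi-part content: one text part, then one part per image.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text(String),
    ImageUrl(String),
}

/// Converts a message into parts, encoding each image as a data URL
/// (the data URL form is an assumption of this sketch).
pub fn to_content_parts(msg: &ChatMessage) -> Vec<ContentPart> {
    let mut parts = vec![ContentPart::Text(msg.content.clone())];
    parts.extend(msg.images.iter().map(|img| {
        ContentPart::ImageUrl(format!("data:{};base64,{}", img.media_type, img.data_base64))
    }));
    parts
}
```

Keeping the text part first and the images after it means a message without attachments degrades to a single text part, so text-only callers need no special case.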

Changelog

.claude/settings.local.json
- Added permissions for Bash commands (cargo check, clippy, test, fmt, grep, env) and Skill(ship).
src/agent/dispatcher.rs
- Added logic to detect image_generated sentinel in tool output and emit a StatusUpdate::ImageGenerated SSE event.
src/agent/session.rs
- Imported ImageAttachment from llm module.
- Added an images field to the Turn struct to store attached images.
- Implemented start_turn_with_images method to create a new turn with user input and image attachments.
- Modified the messages method to construct ChatMessage::user_with_images when images are present in a turn.
src/agent/thread_ops.rs
- Updated the start_turn logic to use thread.start_turn_with_images if the incoming message contains image attachments.
src/app.rs
- Implemented dynamic registration of image generation and editing tools if compatible LLM models are detected.
- Implemented dynamic registration of vision analysis tools if vision-capable LLM models are detected.
src/channels/channel.rs
- Imported ImageAttachment from llm module.
- Added an images field to the IncomingMessage struct to hold attached images.
- Added a with_images builder method to IncomingMessage for attaching images.
src/channels/repl.rs
- Added handling for StatusUpdate::ImageGenerated to print a console message indicating an image was generated.
src/channels/wasm/wrapper.rs
- Added conversion logic for StatusUpdate::ImageGenerated to the WIT channel status update.
src/channels/web/mod.rs
- Added SseEvent::ImageGenerated to broadcast image generation events to web clients.
src/channels/web/openai_compat.rs
- Initialized the images field to an empty vector during OpenAiMessage to ChatMessage conversion.
src/channels/web/server.rs
- Imported ImageAttachment for use in web server logic.
- Converted ImageData from incoming requests into ImageAttachment and attached them to IncomingMessage.
src/channels/web/sse.rs
- Added image_generated as a recognized event type for Server-Sent Events (SSE).
src/channels/web/static/app.js
- Added stagedImages array to manage image attachments in the frontend.
- Implemented event listeners for file input and paste events to handle image uploads.
- Added renderImagePreviews and removeImage functions to manage and display image previews.
- Modified sendMessage to include staged images in the API request body.
- Added an event listener for image_generated SSE events to display generated images in the chat.
src/channels/web/static/index.html
- Added a hidden file input and an 'Attach image' button to the chat input area.
- Included an image-preview-strip div to display previews of attached images.
src/channels/web/static/style.css
- Added CSS styles for .generated-image-card, .generated-image, and .generated-image-path to display generated images.
- Added CSS styles for the .attach-btn, .image-preview-strip, .image-preview, and .image-preview-remove for image attachment functionality.
src/channels/web/types.rs
- Defined a new ImageData struct for base64-encoded image data.
- Added an images field to SendMessageRequest and WsClientMessage::Message to support image attachments.
- Added SseEvent::ImageGenerated enum variant for broadcasting image generation events.
src/channels/web/ws.rs
- Imported ImageAttachment for WebSocket message processing.
- Updated handle_client_message to convert ImageData from WebSocket messages into ImageAttachment and include them in IncomingMessage.
src/llm/image_models.rs
- Added a new module image_models to detect and suggest image generation models (e.g., FLUX, DALL-E, Stable Diffusion).
src/llm/mod.rs
- Added image_models and vision_models modules to the LLM module exports.
- Exported ImageAttachment from the provider module.
src/llm/nearai_chat.rs
- Changed ChatCompletionMessage's content field type to Option<serde_json::Value> to support multi-modal content.
- Updated flatten_tool_messages to correctly handle serde_json::Value content when processing assistant and tool messages.
- Modified the From<ChatMessage> implementation to construct multi-part content for user messages with image attachments, formatting them as image_url objects.
src/llm/provider.rs
- Defined a new ImageAttachment struct to represent image data with media type and base64 content.
- Added an images field to the ChatMessage struct to store image attachments.
- Added a user_with_images constructor for ChatMessage to easily create user messages with attached images.
src/llm/rig_adapter.rs
- Imported rig's Image, ImageMediaType, and DocumentSourceKind for multi-modal message handling.
- Modified convert_messages to handle ChatMessages with image attachments by creating multi-part RigMessage::User content, converting ImageAttachments to rig::message::Image.
src/llm/vision_models.rs
- Added a new module vision_models to detect and suggest vision-capable LLM models (e.g., Claude, GPT-4, Gemini).
src/tools/builtin/image_analyze.rs
- Added a new tool ImageAnalyzeTool for analyzing images using vision-capable LLMs. It reads images from the workspace and prepares them for analysis.
src/tools/builtin/image_edit.rs
- Added a new tool ImageEditTool for editing existing images via the NEAR AI cloud-api (FLUX model). It handles reading source images, sending edit requests, and saving the result.
src/tools/builtin/image_gen.rs
- Added a new tool ImageGenerateTool for generating images from text prompts using the NEAR AI cloud-api (FLUX model). It handles prompt submission and saving generated images.
src/tools/builtin/mod.rs
- Exported the newly added ImageAnalyzeTool, ImageEditTool, and ImageGenerateTool.
src/tools/registry.rs
- Imported the new image-related tools (ImageEditTool, ImageGenerateTool).
- Added image_generate, image_edit, and image_analyze to the list of PROTECTED_TOOL_NAMES.
- Implemented register_image_tools to register image generation and editing tools with the provided NEAR AI configuration and workspace.
- Implemented register_vision_tools to register the image analysis tool with the workspace.
tests/e2e_routine_heartbeat.rs
- Updated IncomingMessage instantiations in tests to include the new images field, initialized as an empty vector.
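
The dispatcher entry above can be pictured with a minimal sketch of sentinel detection. Everything here is an assumption for illustration: the sentinel field names (`type`, `data_url`, `path`), the shape of `StatusUpdate::ImageGenerated`, and the tiny `json_str_field` helper, which stands in for a real JSON parser and does not handle escaped quotes. Restricting detection to the two image tools and skipping an empty `data_url` follow the later fixes listed in the #725 commit message rather than this PR's original diff.

```rust
/// Extracts the string value of `key` from a flat JSON object.
/// Simplified stand-in for a real JSON parser: no escape handling.
fn json_str_field<'a>(json: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\"", key);
    let start = json.find(needle.as_str())? + needle.len();
    let after_colon = json[start..].trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}

/// Hypothetical status update carrying the generated image.
#[derive(Debug, PartialEq)]
pub enum StatusUpdate {
    ImageGenerated { data_url: String, path: Option<String> },
}

/// Returns an ImageGenerated update when a tool output is an image sentinel.
pub fn detect_image_sentinel(tool_name: &str, output: &str) -> Option<StatusUpdate> {
    // Only image tools may emit the sentinel.
    if !matches!(tool_name, "image_generate" | "image_edit") {
        return None;
    }
    if json_str_field(output, "type")? != "image_generated" {
        return None;
    }
    // An empty data URL would render a broken image, so skip it.
    let data_url = json_str_field(output, "data_url").unwrap_or_default();
    if data_url.is_empty() {
        return None;
    }
    Some(StatusUpdate::ImageGenerated {
        data_url: data_url.to_string(),
        path: json_str_field(output, "path").map(str::to_string),
    })
}
```

A caller would run this on each tool result and, on `Some`, broadcast the update to the channel instead of the raw output, which keeps multi-megabyte base64 payloads out of ordinary tool-result previews.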

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces significant new functionality for image processing, including image generation, editing, and analysis. The changes are well-structured, touching on the agent core, channels, LLM providers, and UI. The introduction of new tools for image manipulation is a great addition.

I've made a few suggestions to improve code clarity, maintainability, and robustness, mainly around simplifying some logic and making string comparisons case-insensitive. Overall, this is a solid feature implementation.

Note: Security Review did not run due to the size of the PR.

src/agent/dispatcher.rs

src/app.rs

src/llm/rig_adapter.rs

src/tools/builtin/image_analyze.rs

* feat: full image support across all channels

End-to-end image handling: upload, generation, analysis, editing, and rendering across web gateway, HTTP webhook, WASM (Telegram/Slack), and REPL channels. Builds on the attachment infrastructure from #596 and draws inspiration from PR #641's image pipeline approach; credit to that PR's author for the sentinel JSON pattern and base64-in-JSON upload design.

Key changes:
- Image upload in web UI (file picker, paste, preview strip)
- Image generation tool (FLUX/DALL-E via /v1/images/generations)
- Image edit tool (multipart /v1/images/edits with fallback)
- Image analysis tool (vision model for workspace images)
- Model detection utilities (image_models.rs, vision_models.rs)
- Sentinel JSON detection in dispatcher for generated image rendering
- StatusUpdate::ImageGenerated → SSE/WS/REPL/WASM broadcast
- HTTP webhook attachment support (base64, 5MB/file, 10MB total)
- WASM channel image download (Telegram via file API, Slack via host HTTP)
- Tool registration wiring in app.rs

[skip-regression-check]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR #725 review comments (16 issues)

- SecretString for API keys in all image tools (image_gen, image_edit, image_analyze)
- Binary image read via tokio::fs::read instead of DB-backed workspace.read()
- Replace Arc<Workspace> with Option<PathBuf> base_dir (workspace has no filesystem API)
- ApprovalRequirement::UnlessAutoApproved for cost-sensitive image tools
- Scope sentinel detection to image_generate/image_edit tool names only
- Skip ToolResult preview broadcast for image sentinels (avoids multi-MB base64 in SSE)
- Extract shared media_type_from_path() to builtin/mod.rs
- Rename fallback_chat_edit → fallback_generate with tracing::warn
- Increase gateway body limit from 1MB to 10MB for image uploads
- Increase webhook body limit to 15MB (base64 overhead)
- Log warning on invalid base64 in images_to_attachments
- Client-side image size limits (5MB/file, 5 images max) in app.js
- aria-label on attach button for accessibility
- Update body_too_large test for new 10MB limit

[skip-regression-check]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add Slack file size check before download (PR review item #15)

Skip downloading files larger than 20 MB in the Slack WASM channel to avoid excessive memory use and slow downloads in the WASM runtime. Logs a warning when a file is skipped. Also bumps channel versions for Slack and Telegram (prior branch changes).

[skip-regression-check]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): add path validation and approval requirement to image tools

Add sandbox path validation via validate_path() to both ImageAnalyzeTool and ImageEditTool to prevent path traversal attacks that could exfiltrate arbitrary files through external vision/edit APIs. Also fix ImageAnalyzeTool::requires_approval to return UnlessAutoApproved, consistent with ImageEditTool and ImageGenerateTool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: post-download size guards and empty data_url sentinel check

- Slack: add post-download size check on actual bytes when metadata size_bytes is absent, preventing bypass of the 20MB limit
- Telegram: add 20MB download size limit (matching Slack) enforced in download_telegram_file() after receiving response bytes
- Dispatcher: skip broadcasting ImageGenerated SSE event when data_url is empty from unwrap_or_default(), log warning instead

Closes correctness issues #3, #4, #5 from PR #725 review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use mime_guess for media type detection, add alt attrs and media_type validation

- Replace hardcoded media type mapping with mime_guess crate (already in deps)
- Add alt attributes to img elements in web UI for accessibility
- Validate media_type starts with "image/" in images_to_attachments()
- Update bmp test assertion to match mime_guess behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zaki <zaki@iqlusion.io>
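
The attachment limits mentioned in the commit message (5MB per file, 10MB total, media type starting with "image/") could be enforced along the lines of the sketch below. It is an illustration only: the `ImageData` fields, the `AttachmentError` enum, the `validate_attachments` function, and the choice of binary megabytes (1024 * 1024 bytes) are all assumptions, and the decoded size is estimated from the base64 length rather than by decoding.

```rust
// Assumed limits; the commit message gives "5MB/file, 10MB total"
// without stating whether MB means 10^6 or 2^20 bytes.
const MAX_FILE_BYTES: usize = 5 * 1024 * 1024;
const MAX_TOTAL_BYTES: usize = 10 * 1024 * 1024;

/// Hypothetical request payload: a media type plus base64-encoded bytes.
pub struct ImageData {
    pub media_type: String,
    pub data: String,
}

#[derive(Debug, PartialEq)]
pub enum AttachmentError {
    NotAnImage(String),
    FileTooLarge { index: usize, bytes: usize },
    TotalTooLarge { bytes: usize },
}

/// Estimates the decoded size of a padded base64 string without decoding it.
fn decoded_len(b64: &str) -> usize {
    let padding = b64.bytes().rev().take_while(|&b| b == b'=').count();
    (b64.len() / 4 * 3).saturating_sub(padding)
}

/// Checks media types and size limits; returns the total decoded size on success.
pub fn validate_attachments(images: &[ImageData]) -> Result<usize, AttachmentError> {
    let mut total = 0usize;
    for (index, image) in images.iter().enumerate() {
        if !image.media_type.starts_with("image/") {
            return Err(AttachmentError::NotAnImage(image.media_type.clone()));
        }
        let bytes = decoded_len(&image.data);
        if bytes > MAX_FILE_BYTES {
            return Err(AttachmentError::FileTooLarge { index, bytes });
        }
        total += bytes;
        if total > MAX_TOTAL_BYTES {
            return Err(AttachmentError::TotalTooLarge { bytes: total });
        }
    }
    Ok(total)
}
```

Because base64 inflates data by roughly a third, a 10MB decoded budget needs a request body limit above 10MB, which is consistent with the commit raising the webhook body limit to 15MB.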

ilblackdragon · 2026-03-09T07:22:20Z

Fully implemented via #725 (merged). Thanks @nickpismenkov for the original design — the sentinel JSON pattern and base64-in-JSON upload approach from this issue shaped the final implementation.

What shipped in #725:

Image generation tool (image_generate) — FLUX/DALL-E via /v1/images/generations
Image edit tool (image_edit) — multipart edit with fallback
Image analysis tool (image_analyze) — vision model integration
Web gateway — file picker, clipboard paste, thumbnail preview, inline rendering
HTTP webhook — base64 attachments with size validation
WASM channels — Telegram file download, Slack credential-injected download
All channels handle ImageGenerated status updates
Model detection — auto-registers tools based on available model capabilities

26 files changed, +1700 lines. All tests passing. 🎉
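
The model-detection item in the list above can be sketched as a case-insensitive substring match over the available model identifiers. The hint lists, function names, and matching rule are assumptions for illustration; the thread only says that image tools are registered when compatible models (for example FLUX, DALL-E, Stable Diffusion) are available, that the vision tool depends on vision-capable models (for example Claude, GPT-4, Gemini), and that the review asked for case-insensitive comparisons.

```rust
// Hypothetical hint lists; the real modules may match differently.
const IMAGE_MODEL_HINTS: &[&str] = &["flux", "dall-e", "stable-diffusion"];
const VISION_MODEL_HINTS: &[&str] = &["claude", "gpt-4", "gemini"];

/// Case-insensitive check: does the model id contain any of the hints?
fn matches_any(model_id: &str, hints: &[&str]) -> bool {
    let lower = model_id.to_lowercase();
    hints.iter().any(|hint| lower.contains(*hint))
}

pub fn is_image_generation_model(model_id: &str) -> bool {
    matches_any(model_id, IMAGE_MODEL_HINTS)
}

pub fn is_vision_model(model_id: &str) -> bool {
    matches_any(model_id, VISION_MODEL_HINTS)
}

/// Decides which image tools to expose for the given set of available models.
pub fn tools_to_register(available_models: &[&str]) -> Vec<&'static str> {
    let mut tools = Vec::new();
    if available_models.iter().any(|m| is_image_generation_model(m)) {
        tools.extend(["image_generate", "image_edit"]);
    }
    if available_models.iter().any(|m| is_vision_model(m)) {
        tools.push("image_analyze");
    }
    tools
}
```

Lowercasing the identifier once per check keeps the comparison robust to provider-specific casing such as "FLUX.1" versus "flux.1".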


feat: Support processing images by IronClaw

f4d290f

nickpismenkov linked an issue Mar 7, 2026 that may be closed by this pull request

Support processing images by IronClaw #549

Closed

gemini-code-assist bot reviewed Mar 7, 2026

src/agent/dispatcher.rs (outdated, resolved)

src/app.rs (outdated, resolved)

src/llm/rig_adapter.rs (outdated, resolved)

src/tools/builtin/image_analyze.rs (resolved)

github-actions bot mentioned this pull request Mar 7, 2026

🦞 OpenClaw Ecosystem Daily Report 2026-03-07 rollysys/agents-radar#48

Open

nickpismenkov added 3 commits March 6, 2026 18:41

review fixes

833738b

Merge branch 'main' into feat/images

13813cb

fix linter

a320a64

nickpismenkov requested a review from henrypark133 March 7, 2026 02:50

ilblackdragon mentioned this pull request Mar 8, 2026

feat: full image support across all channels #725

Merged

7 tasks

ilblackdragon closed this Mar 9, 2026

nickpismenkov wants to merge 4 commits into main from feat/images
