
feat: Support processing images by IronClaw#641

Closed
nickpismenkov wants to merge 4 commits into main from feat/images

Conversation

@nickpismenkov
Contributor

No description provided.

@github-actions bot added the following labels on Mar 7, 2026:
  • scope: agent - Agent core (agent loop, router, scheduler)
  • scope: channel - Channel infrastructure
  • scope: channel/web - Web gateway channel
  • scope: channel/wasm - WASM channel runtime
  • scope: tool - Tool infrastructure
  • scope: tool/builtin - Built-in tools
  • scope: llm - LLM integration
  • size: XL - 500+ changed lines
  • risk: medium - Business logic, config, or moderate-risk modules
  • contributor: regular - 2-5 merged PRs
@nickpismenkov nickpismenkov linked an issue Mar 7, 2026 that may be closed by this pull request
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent's capabilities by integrating multimodal image processing. It introduces new tools for image generation, editing, and analysis, along with the necessary backend infrastructure to handle image data and a revamped frontend to support user image attachments and display generated visuals. This allows the agent to interact with and create visual content, expanding its problem-solving domain.

Highlights

  • Multimodal Image Support: Introduced comprehensive support for image processing, allowing users to attach images to messages and enabling the agent to generate, edit, and analyze images.
  • New Image Tools: Added three new built-in tools: image_generate for creating images from text prompts, image_edit for modifying existing images, and image_analyze for leveraging LLM vision capabilities to understand image content. These tools integrate with the NEAR AI cloud-api (FLUX model) for generation and editing.
  • Frontend Enhancements: The web user interface now supports attaching images via file selection or paste, displays previews of staged images, and renders generated images directly within the chat feed.
  • Backend Infrastructure for Images: Updated core data structures like ChatMessage and IncomingMessage to include ImageAttachments. The dispatcher now detects image_generated sentinel events from tool outputs and broadcasts them via SSE, and the rig adapter handles multi-modal message conversion for LLMs.
  • Dynamic Tool Registration: The application now dynamically registers image generation, editing, and vision analysis tools based on the availability of compatible LLM models, ensuring that only relevant tools are exposed.
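
The dynamic registration described in the last highlight can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the hint lists, the function names, and the substring-matching strategy are assumptions based only on the example model families named in the changelog (FLUX, DALL-E, Stable Diffusion; Claude, GPT-4, Gemini). Matching is case-insensitive, in line with the review's suggestion.

```rust
/// Substrings that hint at an image-generation model (illustrative list, assumed).
const IMAGE_MODEL_HINTS: &[&str] = &["flux", "dall-e", "stable-diffusion"];
/// Substrings that hint at a vision-capable model (illustrative list, assumed).
const VISION_MODEL_HINTS: &[&str] = &["claude", "gpt-4", "gemini"];

/// Case-insensitive substring match of a model name against a hint list.
fn matches_any(model: &str, hints: &[&str]) -> bool {
    let lower = model.to_lowercase();
    hints.iter().any(|hint| lower.contains(hint))
}

/// Hypothetical helper: does this model name look like an image generator?
pub fn is_image_generation_model(model: &str) -> bool {
    matches_any(model, IMAGE_MODEL_HINTS)
}

/// Hypothetical helper: does this model name look vision-capable?
pub fn is_vision_model(model: &str) -> bool {
    matches_any(model, VISION_MODEL_HINTS)
}

/// Returns the names of the tools that would be registered for the
/// given set of available models. Tool names are taken from the PR summary.
pub fn tools_to_register(models: &[&str]) -> Vec<&'static str> {
    let mut tools = Vec::new();
    if models.iter().any(|m| is_image_generation_model(m)) {
        tools.push("image_generate");
        tools.push("image_edit");
    }
    if models.iter().any(|m| is_vision_model(m)) {
        tools.push("image_analyze");
    }
    tools
}
```

With only a text-only model available, `tools_to_register` returns an empty list, so none of the image tools would be exposed to the agent.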
Changelog
  • .claude/settings.local.json
    • Added permissions for Bash commands (cargo check, clippy, test, fmt, grep, env) and Skill(ship).
  • src/agent/dispatcher.rs
    • Added logic to detect image_generated sentinel in tool output and emit a StatusUpdate::ImageGenerated SSE event.
  • src/agent/session.rs
    • Imported ImageAttachment from llm module.
    • Added an images field to the Turn struct to store attached images.
    • Implemented start_turn_with_images method to create a new turn with user input and image attachments.
    • Modified the messages method to construct ChatMessage::user_with_images when images are present in a turn.
  • src/agent/thread_ops.rs
    • Updated the start_turn logic to use thread.start_turn_with_images if the incoming message contains image attachments.
  • src/app.rs
    • Implemented dynamic registration of image generation and editing tools if compatible LLM models are detected.
    • Implemented dynamic registration of vision analysis tools if vision-capable LLM models are detected.
  • src/channels/channel.rs
    • Imported ImageAttachment from llm module.
    • Added an images field to the IncomingMessage struct to hold attached images.
    • Added a with_images builder method to IncomingMessage for attaching images.
  • src/channels/repl.rs
    • Added handling for StatusUpdate::ImageGenerated to print a console message indicating an image was generated.
  • src/channels/wasm/wrapper.rs
    • Added conversion logic for StatusUpdate::ImageGenerated to the WIT channel status update.
  • src/channels/web/mod.rs
    • Added SseEvent::ImageGenerated to broadcast image generation events to web clients.
  • src/channels/web/openai_compat.rs
    • Initialized the images field to an empty vector during OpenAiMessage to ChatMessage conversion.
  • src/channels/web/server.rs
    • Imported ImageAttachment for use in web server logic.
    • Converted ImageData from incoming requests into ImageAttachment and attached them to IncomingMessage.
  • src/channels/web/sse.rs
    • Added image_generated as a recognized event type for Server-Sent Events (SSE).
  • src/channels/web/static/app.js
    • Added stagedImages array to manage image attachments in the frontend.
    • Implemented event listeners for file input and paste events to handle image uploads.
    • Added renderImagePreviews and removeImage functions to manage and display image previews.
    • Modified sendMessage to include staged images in the API request body.
    • Added an event listener for image_generated SSE events to display generated images in the chat.
  • src/channels/web/static/index.html
    • Added a hidden file input and an 'Attach image' button to the chat input area.
    • Included an image-preview-strip div to display previews of attached images.
  • src/channels/web/static/style.css
    • Added CSS styles for .generated-image-card, .generated-image, and .generated-image-path to display generated images.
    • Added CSS styles for the .attach-btn, .image-preview-strip, .image-preview, and .image-preview-remove for image attachment functionality.
  • src/channels/web/types.rs
    • Defined a new ImageData struct for base64-encoded image data.
    • Added an images field to SendMessageRequest and WsClientMessage::Message to support image attachments.
    • Added SseEvent::ImageGenerated enum variant for broadcasting image generation events.
  • src/channels/web/ws.rs
    • Imported ImageAttachment for WebSocket message processing.
    • Updated handle_client_message to convert ImageData from WebSocket messages into ImageAttachment and include them in IncomingMessage.
  • src/llm/image_models.rs
    • Added a new module image_models to detect and suggest image generation models (e.g., FLUX, DALL-E, Stable Diffusion).
  • src/llm/mod.rs
    • Added image_models and vision_models modules to the LLM module exports.
    • Exported ImageAttachment from the provider module.
  • src/llm/nearai_chat.rs
    • Changed ChatCompletionMessage's content field type to Option<serde_json::Value> to support multi-modal content.
    • Updated flatten_tool_messages to correctly handle serde_json::Value content when processing assistant and tool messages.
    • Modified the From<ChatMessage> implementation to construct multi-part content for user messages with image attachments, formatting them as image_url objects.
  • src/llm/provider.rs
    • Defined a new ImageAttachment struct to represent image data with media type and base64 content.
    • Added an images field to the ChatMessage struct to store image attachments.
    • Added a user_with_images constructor for ChatMessage to easily create user messages with attached images.
  • src/llm/rig_adapter.rs
    • Imported rig's Image, ImageMediaType, and DocumentSourceKind for multi-modal message handling.
    • Modified convert_messages to handle ChatMessages with image attachments by creating multi-part RigMessage::User content, converting ImageAttachments to rig::message::Image.
  • src/llm/vision_models.rs
    • Added a new module vision_models to detect and suggest vision-capable LLM models (e.g., Claude, GPT-4, Gemini).
  • src/tools/builtin/image_analyze.rs
    • Added a new tool ImageAnalyzeTool for analyzing images using vision-capable LLMs. It reads images from the workspace and prepares them for analysis.
  • src/tools/builtin/image_edit.rs
    • Added a new tool ImageEditTool for editing existing images via the NEAR AI cloud-api (FLUX model). It handles reading source images, sending edit requests, and saving the result.
  • src/tools/builtin/image_gen.rs
    • Added a new tool ImageGenerateTool for generating images from text prompts using the NEAR AI cloud-api (FLUX model). It handles prompt submission and saving generated images.
  • src/tools/builtin/mod.rs
    • Exported the newly added ImageAnalyzeTool, ImageEditTool, and ImageGenerateTool.
  • src/tools/registry.rs
    • Imported the new image-related tools (ImageEditTool, ImageGenerateTool).
    • Added image_generate, image_edit, and image_analyze to the list of PROTECTED_TOOL_NAMES.
    • Implemented register_image_tools to register image generation and editing tools with the provided NEAR AI configuration and workspace.
    • Implemented register_vision_tools to register the image analysis tool with the workspace.
  • tests/e2e_routine_heartbeat.rs
    • Updated IncomingMessage instantiations in tests to include the new images field, initialized as an empty vector.
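
To make the provider-side changes above more concrete, here is a small sketch of how an image attachment might travel from a ChatMessage to multi-part content. The type names ImageAttachment, ChatMessage, and user_with_images come from the changelog; the field names, the ContentPart enum, and to_content_parts are assumptions for illustration only. The real code builds serde_json values and image_url objects rather than this simplified enum.

```rust
/// Image payload: a media type plus base64 content (field names assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct ImageAttachment {
    pub media_type: String, // e.g. "image/png"
    pub data: String,       // base64-encoded bytes
}

impl ImageAttachment {
    /// Formats the attachment as a data URL, the usual payload of an
    /// image_url content part.
    pub fn data_url(&self) -> String {
        format!("data:{};base64,{}", self.media_type, self.data)
    }
}

/// Simplified chat message carrying optional image attachments.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
    pub images: Vec<ImageAttachment>,
}

impl ChatMessage {
    /// Builds a user message with attached images.
    pub fn user_with_images(content: impl Into<String>, images: Vec<ImageAttachment>) -> Self {
        Self { role: "user".to_string(), content: content.into(), images }
    }
}

/// Hypothetical stand-in for the multi-part content sent to the LLM.
#[derive(Debug, PartialEq)]
pub enum ContentPart {
    Text(String),
    ImageUrl(String),
}

/// Converts a message into one text part followed by one part per image.
pub fn to_content_parts(msg: &ChatMessage) -> Vec<ContentPart> {
    let mut parts = vec![ContentPart::Text(msg.content.clone())];
    parts.extend(msg.images.iter().map(|img| ContentPart::ImageUrl(img.data_url())));
    parts
}
```

A message without images yields a single text part, which matches the idea that existing text-only flows keep working with an empty images vector.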

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant new functionality for image processing, including image generation, editing, and analysis. The changes are well-structured, touching on the agent core, channels, LLM providers, and UI. The introduction of new tools for image manipulation is a great addition.

I've made a few suggestions to improve code clarity, maintainability, and robustness, mainly around simplifying some logic and making string comparisons case-insensitive. Overall, this is a solid feature implementation.

Note: Security Review did not run due to the size of the PR.

ilblackdragon added a commit that referenced this pull request Mar 8, 2026
End-to-end image handling: upload, generation, analysis, editing, and
rendering across web gateway, HTTP webhook, WASM (Telegram/Slack), and
REPL channels. Builds on the attachment infrastructure from #596 and
draws inspiration from PR #641's image pipeline approach — credit to
that PR's author for the sentinel JSON pattern and base64-in-JSON
upload design.

Key changes:
- Image upload in web UI (file picker, paste, preview strip)
- Image generation tool (FLUX/DALL-E via /v1/images/generations)
- Image edit tool (multipart /v1/images/edits with fallback)
- Image analysis tool (vision model for workspace images)
- Model detection utilities (image_models.rs, vision_models.rs)
- Sentinel JSON detection in dispatcher for generated image rendering
- StatusUpdate::ImageGenerated → SSE/WS/REPL/WASM broadcast
- HTTP webhook attachment support (base64, 5MB/file, 10MB total)
- WASM channel image download (Telegram via file API, Slack via host HTTP)
- Tool registration wiring in app.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
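
The webhook attachment limits mentioned in the commit message (5MB per file, 10MB total) could be enforced along these lines. This is a sketch under stated assumptions: "MB" is read as MiB, sizes are the decoded byte counts, and the constant, error, and function names are hypothetical rather than taken from the repository.

```rust
/// Per-file limit from the commit message, read here as 5 MiB (assumption).
pub const MAX_FILE_BYTES: usize = 5 * 1024 * 1024;
/// Total limit from the commit message, read here as 10 MiB (assumption).
pub const MAX_TOTAL_BYTES: usize = 10 * 1024 * 1024;

/// Hypothetical error type describing which limit was exceeded.
#[derive(Debug, PartialEq)]
pub enum LimitError {
    FileTooLarge { index: usize, size: usize },
    TotalTooLarge { total: usize },
}

/// Checks decoded attachment sizes against the per-file and total limits.
/// Stops at the first violation it finds.
pub fn check_attachment_sizes(sizes: &[usize]) -> Result<(), LimitError> {
    let mut total = 0usize;
    for (index, &size) in sizes.iter().enumerate() {
        if size > MAX_FILE_BYTES {
            return Err(LimitError::FileTooLarge { index, size });
        }
        total = total.saturating_add(size);
        if total > MAX_TOTAL_BYTES {
            return Err(LimitError::TotalTooLarge { total });
        }
    }
    Ok(())
}
```

Two attachments of exactly 5 MiB pass, while three attachments of 4 MiB each are rejected on the total even though every file is within the per-file limit.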
ilblackdragon added a commit that referenced this pull request Mar 9, 2026
ilblackdragon added a commit that referenced this pull request Mar 9, 2026
* feat: full image support across all channels

* fix: address PR #725 review comments (16 issues)

- SecretString for API keys in all image tools (image_gen, image_edit, image_analyze)
- Binary image read via tokio::fs::read instead of DB-backed workspace.read()
- Replace Arc<Workspace> with Option<PathBuf> base_dir (workspace has no filesystem API)
- ApprovalRequirement::UnlessAutoApproved for cost-sensitive image tools
- Scope sentinel detection to image_generate/image_edit tool names only
- Skip ToolResult preview broadcast for image sentinels (avoids multi-MB base64 in SSE)
- Extract shared media_type_from_path() to builtin/mod.rs
- Rename fallback_chat_edit → fallback_generate with tracing::warn
- Increase gateway body limit from 1MB to 10MB for image uploads
- Increase webhook body limit to 15MB (base64 overhead)
- Log warning on invalid base64 in images_to_attachments
- Client-side image size limits (5MB/file, 5 images max) in app.js
- aria-label on attach button for accessibility
- Update body_too_large test for new 10MB limit

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
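
The "base64 overhead" behind the 15MB webhook body limit can be checked with a little arithmetic. The helper below is illustrative only (its name is hypothetical) and assumes standard padded base64, where every 3 raw bytes become 4 encoded characters, and MB read as MiB.

```rust
/// Length of the padded base64 encoding of `raw_len` bytes:
/// each group of up to 3 input bytes becomes 4 output characters.
/// Overflow for inputs near usize::MAX is not handled in this sketch.
pub const fn base64_encoded_len(raw_len: usize) -> usize {
    (raw_len + 2) / 3 * 4
}
```

Under those assumptions, 10 MiB of raw image data (10,485,760 bytes) encodes to 13,981,016 characters, roughly 13.3 MiB, which leaves some room for the surrounding JSON inside a 15 MiB body limit.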

* fix: add Slack file size check before download (PR review item #15)

Skip downloading files larger than 20 MB in the Slack WASM channel to
avoid excessive memory use and slow downloads in the WASM runtime.
Logs a warning when a file is skipped. Also bumps channel versions
for Slack and Telegram (prior branch changes).

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
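
A pre-download size check like the one described for the Slack channel might look like the sketch below. The 20 MB threshold comes from the commit message (read here as MiB); the type and function names are hypothetical, and treating a missing size as "download, then verify the actual bytes" is an assumption about how unknown sizes are handled.

```rust
/// Download limit from the commit message, read here as 20 MiB (assumption).
pub const MAX_DOWNLOAD_BYTES: u64 = 20 * 1024 * 1024;

/// Hypothetical outcome of the pre-download check.
#[derive(Debug, PartialEq)]
pub enum DownloadDecision {
    Download,
    Skip { size: u64 },
}

/// Decides whether to fetch a file based on the size reported in metadata.
/// A missing size cannot be judged up front, so the file is downloaded and
/// should be re-checked with `fits_after_download`.
pub fn decide_download(size_bytes: Option<u64>) -> DownloadDecision {
    match size_bytes {
        Some(size) if size > MAX_DOWNLOAD_BYTES => DownloadDecision::Skip { size },
        _ => DownloadDecision::Download,
    }
}

/// Post-download guard on the number of bytes actually received.
pub fn fits_after_download(received_len: u64) -> bool {
    received_len <= MAX_DOWNLOAD_BYTES
}
```

The caller would log a warning when it gets `Skip`, as the commit message describes; logging is left out here to keep the sketch dependency-free.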

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): add path validation and approval requirement to image tools

Add sandbox path validation via validate_path() to both ImageAnalyzeTool
and ImageEditTool to prevent path traversal attacks that could exfiltrate
arbitrary files through external vision/edit APIs. Also fix
ImageAnalyzeTool::requires_approval to return UnlessAutoApproved,
consistent with ImageEditTool and ImageGenerateTool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
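
For readers unfamiliar with the path traversal risk, a sandbox check can be sketched as below. This is not the repository's validate_path(), whose signature is not shown in this thread; validate_sandbox_path is a hypothetical, purely lexical version that rejects absolute paths and any `..` that would climb above the base directory. It does not resolve symlinks, which a production check would also need to consider.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolves `requested` against `base_dir` and rejects anything
/// that would escape it. Hypothetical sketch; symlinks are not resolved.
pub fn validate_sandbox_path(base_dir: &Path, requested: &str) -> Result<PathBuf, String> {
    let requested_path = Path::new(requested);
    if requested_path.is_absolute() {
        return Err(format!("absolute paths are not allowed: {requested}"));
    }

    let mut resolved = base_dir.to_path_buf();
    // Number of components pushed below base_dir so far.
    let mut depth = 0usize;

    for component in requested_path.components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return Err(format!("path escapes the sandbox: {requested}"));
                }
                resolved.pop();
                depth -= 1;
            }
            // Root or drive prefixes mean the path is not sandbox-relative.
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("absolute paths are not allowed: {requested}"));
            }
        }
    }
    Ok(resolved)
}
```

A request such as `images/../../etc/passwd` is rejected because the second `..` would step above the base directory, which is the kind of input that could otherwise send arbitrary files to an external vision or edit API.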

* fix: post-download size guards and empty data_url sentinel check

- Slack: add post-download size check on actual bytes when metadata
  size_bytes is absent, preventing bypass of the 20MB limit
- Telegram: add 20MB download size limit (matching Slack) enforced
  in download_telegram_file() after receiving response bytes
- Dispatcher: skip broadcasting ImageGenerated SSE event when
  data_url is empty from unwrap_or_default(), log warning instead

Closes correctness issues #3, #4, #5 from PR #725 review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
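
The dispatcher behaviour described in these fixes (sentinel detection scoped to the image tools, and no broadcast when data_url is empty) can be illustrated with the sketch below. It assumes the tool output has already been parsed into a small struct; the real dispatcher works on sentinel JSON, and the ToolOutput type, its field names, and image_status are hypothetical. Only the tool names, the image_generated sentinel, and StatusUpdate::ImageGenerated are taken from the thread.

```rust
/// Status update emitted to channels (only the image variant is modelled).
#[derive(Debug, PartialEq)]
pub enum StatusUpdate {
    ImageGenerated { data_url: String, path: Option<String> },
}

/// Hypothetical pre-parsed view of a tool result.
pub struct ToolOutput<'a> {
    pub tool_name: &'a str,
    /// Value of the sentinel's type field, if the output was sentinel JSON.
    pub sentinel_type: Option<&'a str>,
    pub data_url: Option<&'a str>,
    pub path: Option<&'a str>,
}

/// Only these tools are allowed to produce image sentinels.
const IMAGE_TOOLS: &[&str] = &["image_generate", "image_edit"];

/// Returns the status update to broadcast, or None when the output is not
/// an image sentinel from an image tool or carries no usable data URL.
pub fn image_status(output: &ToolOutput<'_>) -> Option<StatusUpdate> {
    if !IMAGE_TOOLS.contains(&output.tool_name) {
        return None;
    }
    if output.sentinel_type != Some("image_generated") {
        return None;
    }
    let data_url = output.data_url.unwrap_or_default();
    if data_url.is_empty() {
        // Mirrors the "log warning instead" behaviour described in the fix.
        eprintln!("warning: image sentinel from {} has no data_url", output.tool_name);
        return None;
    }
    Some(StatusUpdate::ImageGenerated {
        data_url: data_url.to_string(),
        path: output.path.map(str::to_string),
    })
}
```

Scoping by tool name means that an unrelated tool that happens to print sentinel-shaped output does not trigger an image event.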

* fix: use mime_guess for media type detection, add alt attrs and media_type validation

- Replace hardcoded media type mapping with mime_guess crate (already in deps)
- Add alt attributes to img elements in web UI for accessibility
- Validate media_type starts with "image/" in images_to_attachments()
- Update bmp test assertion to match mime_guess behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
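
The media_type validation mentioned above amounts to a prefix check before an upload is accepted as an attachment. The sketch below is an assumption-laden illustration: ImageData is named in the PR, but its fields, the keep_valid_images function, and the structural looks_like_base64 check are hypothetical. The real images_to_attachments() decodes the base64 and logs a warning on failure; the mime_guess crate is not used here so the example stays dependency-free.

```rust
/// Incoming image payload from a web request (field names assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct ImageData {
    pub media_type: String,
    pub data: String, // base64 text
}

/// Accepts only media types in the image/ family.
pub fn is_image_media_type(media_type: &str) -> bool {
    media_type.starts_with("image/")
}

/// Cheap structural check for padded standard base64 (not a full decode).
pub fn looks_like_base64(data: &str) -> bool {
    if data.is_empty() || data.len() % 4 != 0 {
        return false;
    }
    let body = data.trim_end_matches('=');
    if data.len() - body.len() > 2 {
        return false; // at most two padding characters are valid
    }
    body.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}

/// Keeps only entries that pass both checks; rejected entries are dropped.
pub fn keep_valid_images(items: Vec<ImageData>) -> Vec<ImageData> {
    items
        .into_iter()
        .filter(|item| is_image_media_type(&item.media_type) && looks_like_base64(&item.data))
        .collect()
}
```

An upload declared as `text/html` is dropped even if its payload is valid base64, so non-image content cannot be passed to the LLM as an image attachment.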

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zaki <zaki@iqlusion.io>
@ilblackdragon
Member

Fully implemented via #725 (merged). Thanks @nickpismenkov for the original design — the sentinel JSON pattern and base64-in-JSON upload approach from this issue shaped the final implementation.

What shipped in #725:

  • Image generation tool (image_generate) — FLUX/DALL-E via /v1/images/generations
  • Image edit tool (image_edit) — multipart edit with fallback
  • Image analysis tool (image_analyze) — vision model integration
  • Web gateway — file picker, clipboard paste, thumbnail preview, inline rendering
  • HTTP webhook — base64 attachments with size validation
  • WASM channels — Telegram file download, Slack credential-injected download
  • All channels handle ImageGenerated status updates
  • Model detection — auto-registers tools based on available model capabilities

26 files changed, +1700 lines. All tests passing. 🎉

bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026