Problem
The image-vision skill currently resizes every incoming image to fit within 1024×1024, re-encodes it as JPEG, and discards the original, so the agent only ever has access to the resized version.
This limits downstream workflows that need full-resolution images, for example:
- Passing to Google Gemini or other vision models that handle higher resolution
- Saving originals to cloud storage (GDrive, S3)
- OCR on detailed documents or receipts
Proposed Solution
Save both the original and resized versions:
attachments/
  img-1711234567890-abc1.jpg             # resized (max 1024px, JPEG 85%), sent to Claude
  img-1711234567890-abc1-original.png    # original resolution, original format
- The resized image continues to be sent as the multimodal content block to Claude (keeps token usage reasonable)
- The original is preserved on disk and the agent is informed of its path via a text message
- The agent can then use the original for any downstream action that benefits from full resolution
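The naming scheme above could be sketched roughly as follows. This is only an illustration, not the code in src/image.ts: the helper `buildAttachmentPaths`, the `EXTENSIONS` table, and the `bin` fallback for unrecognised formats are hypothetical names and choices. The format strings are assumed to be the kind of value sharp reports in its metadata (`jpeg`, `png`, `webp`, and so on).

```typescript
// Hypothetical sketch of the two-file naming convention; not the actual src/image.ts code.
interface AttachmentPaths {
  relativePath: string;         // resized JPEG that is sent to Claude
  originalRelativePath: string; // untouched original kept on disk
}

// Assumed mapping from a detected format name to a file extension.
const EXTENSIONS: Record<string, string> = {
  jpeg: 'jpg',
  png: 'png',
  webp: 'webp',
  gif: 'gif',
  tiff: 'tiff',
};

function buildAttachmentPaths(
  id: string,
  detectedFormat: string | undefined,
): AttachmentPaths {
  // Fall back to a generic extension when the format is missing or unknown (assumption).
  const ext = (detectedFormat ? EXTENSIONS[detectedFormat] : undefined) ?? 'bin';
  return {
    relativePath: `attachments/img-${id}.jpg`,
    originalRelativePath: `attachments/img-${id}-original.${ext}`,
  };
}
```

Calling `buildAttachmentPaths('1711234567890-abc1', 'png')` yields the two paths shown in the listing above, so the resized and original files always share a common prefix.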
Changes
- src/image.ts: detect original format via sharp metadata, save both versions, update interfaces and parser
- container/agent-runner/src/index.ts: inform agent of original file paths after multimodal blocks
- src/container-runner.ts, src/index.ts: pass originalRelativePath through the pipeline
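As a rough illustration of the agent-runner change, the sketch below appends a text block listing the original paths after the image blocks. It is a minimal sketch under assumptions, not the actual code in container/agent-runner/src/index.ts: `ImageAttachment`, `buildContent`, and the wording of the notice are hypothetical, and only `originalRelativePath` comes from this PR. The block shapes follow the base64 image and text content blocks of the Anthropic Messages API.

```typescript
// Hypothetical sketch; types and function names are illustrative assumptions.
type ContentBlock =
  | { type: 'image'; source: { type: 'base64'; media_type: 'image/jpeg'; data: string } }
  | { type: 'text'; text: string };

interface ImageAttachment {
  base64: string;                // resized JPEG, base64-encoded
  originalRelativePath?: string; // present when the original was saved to disk
}

function buildContent(prompt: string, images: ImageAttachment[]): ContentBlock[] {
  const blocks: ContentBlock[] = [{ type: 'text', text: prompt }];

  // The resized images go to the model as multimodal blocks.
  for (const img of images) {
    blocks.push({
      type: 'image',
      source: { type: 'base64', media_type: 'image/jpeg', data: img.base64 },
    });
  }

  // After the image blocks, tell the agent where the originals live.
  const originals = images
    .map((img) => img.originalRelativePath)
    .filter((p): p is string => p !== undefined);
  if (originals.length > 0) {
    blocks.push({
      type: 'text',
      text: `Full-resolution originals saved at: ${originals.join(', ')}`,
    });
  }
  return blocks;
}
```

Keeping the notice as a separate trailing text block means images without a saved original simply produce no extra message.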
Notes
- Twilio (and possibly other channels) re-encodes images on delivery, so sending the original back via WhatsApp still loses some quality. The real value is in agent-side processing (API calls to other models, file storage, etc.).
- Original format is detected and preserved (PNG stays PNG, WebP stays WebP, etc.)
🤖 Generated with Claude Code