[Feature]: Gemini provider missing image/multimodal input support #2376

@rajnaveen344

Summary

The Gemini provider's Part struct is text-only, so image inputs cannot be sent through the native inlineData API, even though Gemini models support multimodal image input natively.

Problem statement

Other providers (Anthropic, OpenAI-compatible) already parse [IMAGE:...] markers and convert them to their native image formats. The Gemini provider ignores these markers and sends them as raw text, which breaks multimodal workflows for Gemini users.

Proposed solution

Replace the flat Part struct with a serde untagged enum that has Part::Text and Part::InlineData variants. Add a build_user_parts() helper that reuses the existing multimodal::parse_image_markers() to extract image markers and produce Gemini-native inlineData entries. The helper supports multiple images per message.
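
To make the proposal concrete, here is a rough sketch of the part model and helper. It is illustrative only and makes several assumptions: the field names, the hand-written to_json() (standing in for a serde untagged enum so the snippet builds with the standard library alone), and the local parse_data_uri() helper (standing in for multimodal::parse_image_markers(), whose signature is not shown in this issue) are all hypothetical. Only the target JSON shapes, {"text": ...} and {"inlineData": {"mimeType": ..., "data": ...}}, come from the Gemini REST API.

```rust
// Sketch only. The real enum would presumably use #[derive(Serialize)] with
// #[serde(untagged)]; JSON is written by hand here to avoid dependencies.
#[derive(Debug, PartialEq)]
enum Part {
    Text { text: String },
    InlineData { mime_type: String, data: String },
}

impl Part {
    /// Renders the part in the shape the Gemini REST API expects.
    fn to_json(&self) -> String {
        match self {
            Part::Text { text } => format!("{{\"text\":\"{}\"}}", escape(text)),
            Part::InlineData { mime_type, data } => format!(
                "{{\"inlineData\":{{\"mimeType\":\"{}\",\"data\":\"{}\"}}}}",
                escape(mime_type),
                escape(data)
            ),
        }
    }
}

/// Minimal JSON string escaping (quotes, backslashes, common control chars).
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical parser for "data:<mime>;base64,<payload>" marker bodies.
fn parse_data_uri(s: &str) -> Option<(String, String)> {
    let body = s.strip_prefix("data:")?;
    let (meta, data) = body.split_once(',')?;
    let mime = meta.strip_suffix(";base64")?;
    if mime.is_empty() || data.is_empty() {
        return None;
    }
    Some((mime.to_string(), data.to_string()))
}

/// Pushes the pending text as a Text part unless it is only whitespace.
fn flush_text(text: &mut String, parts: &mut Vec<Part>) {
    if !text.trim().is_empty() {
        parts.push(Part::Text { text: text.clone() });
    }
    text.clear();
}

/// Splits a user message into text and inline-image parts, in order.
/// Markers that are not base64 data URIs fall back to plain text.
fn build_user_parts(content: &str) -> Vec<Part> {
    const OPEN: &str = "[IMAGE:";
    let mut parts = Vec::new();
    let mut text = String::new();
    let mut rest = content;
    while let Some(start) = rest.find(OPEN) {
        let after = &rest[start + OPEN.len()..];
        // An unterminated marker is left in place as ordinary text.
        let Some(end) = after.find(']') else { break };
        text.push_str(&rest[..start]);
        match parse_data_uri(&after[..end]) {
            Some((mime_type, data)) => {
                flush_text(&mut text, &mut parts);
                parts.push(Part::InlineData { mime_type, data });
            }
            // Fallback: keep the unrecognized marker verbatim in the text.
            None => text.push_str(&rest[start..start + OPEN.len() + end + 1]),
        }
        rest = &after[end + 1..];
    }
    text.push_str(rest);
    flush_text(&mut text, &mut parts);
    // A message with no images always yields exactly one Text part.
    if parts.is_empty() {
        parts.push(Part::Text { text: content.to_string() });
    }
    parts
}
```

In this sketch, a message with no valid image markers comes back as a single Text part containing the original content unchanged, which is what keeps text-only serialization backward compatible. Whether whitespace-only text between images should be dropped, as it is here, is a judgment call for the actual implementation.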

Non-goals / out of scope

  • Files API support for large images (>20MB) — can be a follow-up
  • Video/audio multimodal inputs
  • Changes to multimodal.rs or provider traits

Alternatives considered

  • Adding image support via the Files API: more complex, requires an upload step, and is not needed for inline use cases under 20MB

Acceptance criteria

  • User messages with [IMAGE:data:...] markers produce correct inlineData parts in the Gemini API request
  • Multiple images per message are supported
  • Text-only messages serialize identically to current behavior (backward compatible)
  • Tests cover text-only, single image, multiple images, image-only, and fallback paths

Architecture impact

None. Change is scoped to src/providers/gemini.rs. No trait changes, no new dependencies, no config changes.

Risk and rollback

Low risk. Single-file, additive change. Roll back with git revert <commit>.

Breaking change?

No

Data hygiene checks

  • No personal or sensitive data in tests — synthetic base64 stubs used
  • Neutral project-scoped wording confirmed
