[Feat] Ollama Image API Support #11

Open
@reyna-abhyankar

Description

We currently support the OpenAI Vision API, in which messages look like this:

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What’s in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],
    }
]

However, Ollama only supports local image paths or Base64-encoded images, and it seems to break unless the content is a plain string with the images passed under a separate key, like so:

messages = [
    {
        "role": "user",
        "content": "What's in this image?",
        "images": [path],  # local image path or Base64-encoded image
    }
]
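
For the Base64 route, a minimal sketch of producing the payload from a local file could look like this. The helper name `encode_image` is hypothetical, and the sketch assumes Ollama accepts a bare Base64 string (no `data:` prefix) as an entry in `images`; that would need checking against the Ollama version in use.

```python
import base64
from pathlib import Path


def encode_image(path: str) -> str:
    """Read a local image file and return its contents as a Base64 string.

    Hypothetical helper: assumes the consumer wants the raw Base64 payload
    with no "data:image/...;base64," prefix.
    """
    raw = Path(path).read_bytes()
    return base64.b64encode(raw).decode("ascii")
```

The result would then go into the message as `"images": [encode_image(path)]` in place of the path itself.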

PR 5208, already merged in Ollama, should resolve the issue of content being an array instead of a string. However, PR 6680 is currently open in LiteLLM to fix the exact unmarshalling error referenced in #10, so the problem could lie in how LiteLLM queries Ollama. If that PR gets merged, we might not need to do anything. Otherwise, we could implement essentially the same fix on our end: flatten the content array into a string, add an images key, and potentially raise a more explicit error for web images.
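
If we do end up handling this ourselves, the fix could be sketched roughly as below. This is only an illustration under assumptions, not the LiteLLM patch: the function name `flatten_message` is hypothetical, text parts are joined with newlines (an arbitrary choice), `data:` URLs are assumed to carry a Base64 payload after the first comma, and any other non-HTTP value is passed through as a local path.

```python
from typing import Any


def flatten_message(message: dict[str, Any]) -> dict[str, Any]:
    """Convert one OpenAI-vision-style message into the Ollama-style shape.

    Hypothetical helper: flattens a list-valued "content" into a string and
    collects images under an "images" key.
    """
    content = message.get("content")
    # Plain-string (or missing) content needs no conversion.
    if content is None or isinstance(content, str):
        return dict(message)

    texts: list[str] = []
    images: list[str] = []
    for part in content:
        kind = part.get("type")
        if kind == "text":
            texts.append(part["text"])
        elif kind == "image_url":
            url = part["image_url"]["url"]
            if url.startswith(("http://", "https://")):
                # Explicit error instead of an opaque unmarshalling failure.
                raise ValueError(
                    "Web image URLs are not supported for Ollama; "
                    "pass a local path or a Base64-encoded image instead."
                )
            if url.startswith("data:"):
                # Assumed form: data:<mime>;base64,<payload>
                _, _, payload = url.partition(",")
                images.append(payload)
            else:
                # Assumed to be a local file path.
                images.append(url)
        else:
            raise ValueError(f"Unsupported content part type: {kind!r}")

    flat = {key: value for key, value in message.items() if key != "content"}
    flat["content"] = "\n".join(texts)
    if images:
        flat["images"] = images
    return flat
```

Applied to every entry in `messages` before the request is sent, this would turn the first payload shape in this issue into the second, and fail early with a readable message for the Wikimedia URL case.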
