Description
We currently support the OpenAI Vision API, in which messages look like this:
```python
messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],
    }
],
```
However, ollama only supports local image paths or Base64-encoded images, and it seems to break unless queried like so:
```python
messages=[
    {
        'role': 'user',
        'content': "What's in this image?",
        'images': [path],
    }
],
```
There's PR 5208, merged in ollama, which should resolve the issue of `content` being an array instead of a string. However, PR 6680 is currently open for LiteLLM to fix the exact unmarshalling error referenced in #10, so it could be the way they are querying ollama. If that gets merged, we might not need to do anything. Otherwise, we could implement basically the same fix on our end (i.e. flattening the `content` array, adding an `images` key, and potentially throwing a more explicit error for web images).