feat: Add vision provider support for multi-modal image recognition #1397

Closed
yufan001 wants to merge 1 commit into HKUDS:main from yufan001:feat/vision-provider-support

yufan001 commented Mar 2, 2026

Summary

This PR adds vision provider configuration support for multi-modal image recognition in nanobot.

Changes

  • Added a vision provider configuration field in config/schema.py
  • Updated the gateway() function in cli/commands.py to initialize vision_provider when vision_model is configured
  • Allows vision models to use a different API endpoint from text-only models
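
The shape of these changes can be sketched roughly as follows. This is a minimal illustration rather than the code in the PR: it uses plain dataclasses in place of whatever schema.py actually uses, and `ProviderConfig`, `ProvidersConfig`, `VisionProvider`, and `init_vision_provider` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProviderConfig:
    """Credentials and endpoint for one provider (hypothetical shape)."""
    api_key: str = ""
    api_base: Optional[str] = None


@dataclass
class ProvidersConfig:
    """Assumed container for provider entries; only two are shown."""
    custom: ProviderConfig = field(default_factory=ProviderConfig)
    # The new slot: a dedicated entry for the vision endpoint.
    vision: ProviderConfig = field(default_factory=ProviderConfig)


@dataclass
class VisionProvider:
    """Hypothetical stand-in for the client object gateway() would build."""
    model: str
    api_key: str
    api_base: Optional[str]


def init_vision_provider(
    providers: ProvidersConfig, vision_model: Optional[str]
) -> Optional[VisionProvider]:
    """Build a vision provider only when a vision model is configured."""
    if not vision_model:
        return None
    cfg = providers.vision
    return VisionProvider(model=vision_model, api_key=cfg.api_key, api_base=cfg.api_base)


providers = ProvidersConfig(
    vision=ProviderConfig(api_key="sk-demo", api_base="https://apis.iflow.cn/v1")
)
vision_provider = init_vision_provider(providers, "qwen3-vl-plus")
no_provider = init_vision_provider(providers, None)
```

With no vision model set, the function returns `None`, so a text-only setup would be unaffected under this assumption.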

Motivation

Users may want to use different API providers/endpoints for:

  • Text conversations (e.g., custom provider with dashscope API)
  • Image recognition (e.g., dedicated vision provider with iflow API)

This implementation allows configuring a separate vision provider in config.json:

{
  "providers": {
    "custom": { "apiKey": "...", "apiBase": "https://dashscope.aliyuncs.com/v1" },
    "vision": { "apiKey": "...", "apiBase": "https://apis.iflow.cn/v1" }
  },
  "agents": {
    "defaults": { "visionModel": "qwen3-vl-plus" }
  }
}
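
To illustrate how such a config could drive endpoint selection, here is a small sketch that parses the JSON above and picks an API base per request. The routing rule (use the vision endpoint only when a message carries an image and `visionModel` is set) and the `select_api_base` helper are assumptions for illustration; the PR description does not specify how requests are routed.

```python
import json

# The example config from this PR description, embedded as a string.
CONFIG_TEXT = """
{
  "providers": {
    "custom": { "apiKey": "...", "apiBase": "https://dashscope.aliyuncs.com/v1" },
    "vision": { "apiKey": "...", "apiBase": "https://apis.iflow.cn/v1" }
  },
  "agents": {
    "defaults": { "visionModel": "qwen3-vl-plus" }
  }
}
"""


def select_api_base(config: dict, has_image: bool) -> str:
    """Pick the endpoint for one request (hypothetical routing rule)."""
    providers = config["providers"]
    vision_model = config.get("agents", {}).get("defaults", {}).get("visionModel")
    vision = providers.get("vision")
    # Fall back to the text provider unless vision is fully configured.
    if has_image and vision_model and vision:
        return vision["apiBase"]
    return providers["custom"]["apiBase"]


config = json.loads(CONFIG_TEXT)
text_base = select_api_base(config, has_image=False)
image_base = select_api_base(config, has_image=True)
```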

Related Issue

Related to #223 (Multi-Modal Support: Images, Voice, and Video)

This implements Phase 1 (Vision) configuration support for the gateway mode, complementing the existing Feishu image download support (commit 49cc0c5).

- Add vision provider configuration in schema.py for dedicated vision model API
- Update gateway() to initialize vision_provider when vision_model is configured
- Enables using different API endpoints for vision vs text-only requests
- Supports multi-modal LLMs like qwen3-vl-plus, claude-3, gpt-4-vision
