feat: Add vision provider support for multi-modal image recognition by yufan001 · Pull Request #1397 · HKUDS/nanobot

yufan001 · 2026-03-02T03:09:00Z

Summary

This PR adds vision provider configuration support for multi-modal image recognition in nanobot.

Changes

Added a vision provider configuration field in config/schema.py
Updated the gateway() function in cli/commands.py to initialize vision_provider when vision_model is configured
Enabled different API endpoints for vision models and text-only models
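
The changes above could look roughly like the following. This is a minimal sketch, not the code in this PR: the class names (`ProviderConfig`, `ProvidersConfig`, `AgentDefaults`), the helper `init_vision_provider`, and the fallback to the `custom` provider are all assumptions made for illustration, and plain dataclasses stand in for whatever config/schema.py actually uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProviderConfig:
    """Hypothetical provider entry: an API key plus a base URL."""
    api_key: str = ""
    api_base: Optional[str] = None


@dataclass
class ProvidersConfig:
    """Hypothetical providers section, including the new `vision` field."""
    custom: ProviderConfig = field(default_factory=ProviderConfig)
    vision: ProviderConfig = field(default_factory=ProviderConfig)


@dataclass
class AgentDefaults:
    """Hypothetical agent defaults; `vision_model` is optional."""
    model: str = ""
    vision_model: Optional[str] = None


def init_vision_provider(
    providers: ProvidersConfig, defaults: AgentDefaults
) -> Optional[dict]:
    """Return settings for a vision provider, or None when no vision model is set."""
    if not defaults.vision_model:
        return None
    # Assumption: reuse the text provider when no dedicated vision key exists.
    source = providers.vision if providers.vision.api_key else providers.custom
    return {
        "model": defaults.vision_model,
        "api_key": source.api_key,
        "api_base": source.api_base,
    }
```

With no `vision_model` set, the helper returns `None`, so a text-only setup would behave as before.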

Motivation

Users may want to use different API providers/endpoints for:

Text conversations (e.g., custom provider with dashscope API)
Image recognition (e.g., dedicated vision provider with iflow API)

This implementation allows configuring a separate vision provider in config.json:

{
  "providers": {
    "custom": { "apiKey": "...", "apiBase": "https://dashscope.aliyuncs.com/v1" },
    "vision": { "apiKey": "...", "apiBase": "https://apis.iflow.cn/v1" }
  },
  "agents": {
    "defaults": { "visionModel": "qwen3-vl-plus" }
  }
}
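
To illustrate how a config like this might drive endpoint selection, here is a hedged sketch that loads the JSON above and picks an API base per message. The helpers `has_image` and `select_api_base` are hypothetical and not part of nanobot, and the sketch assumes messages use OpenAI-style content parts where an image appears as a part with `"type": "image_url"`.

```python
import json

# The config from the PR description, embedded verbatim for the example.
CONFIG_TEXT = """
{
  "providers": {
    "custom": { "apiKey": "...", "apiBase": "https://dashscope.aliyuncs.com/v1" },
    "vision": { "apiKey": "...", "apiBase": "https://apis.iflow.cn/v1" }
  },
  "agents": {
    "defaults": { "visionModel": "qwen3-vl-plus" }
  }
}
"""


def has_image(message: dict) -> bool:
    """True when an OpenAI-style message carries at least one image part."""
    content = message.get("content")
    if not isinstance(content, list):
        return False  # plain string content cannot hold an image part
    return any(
        isinstance(part, dict) and part.get("type") == "image_url"
        for part in content
    )


def select_api_base(config: dict, message: dict) -> str:
    """Pick the vision endpoint for image messages, else the text endpoint."""
    providers = config["providers"]
    vision_model = config.get("agents", {}).get("defaults", {}).get("visionModel")
    if vision_model and has_image(message) and "vision" in providers:
        return providers["vision"]["apiBase"]
    return providers["custom"]["apiBase"]


config = json.loads(CONFIG_TEXT)
```

A text-only message would resolve to the `custom` endpoint, while a message containing an image part would resolve to the `vision` endpoint.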

Related Issue

Related to #223 (Multi-Modal Support: Images, Voice, and Video)

This implements Phase 1 (Vision) configuration support for the gateway mode, complementing the existing Feishu image download support (commit 49cc0c5).

- Add vision provider configuration in schema.py for dedicated vision model API
- Update gateway() to initialize vision_provider when vision_model is configured
- Enables using different API endpoints for vision vs text-only requests
- Supports multi-modal LLMs like qwen3-vl-plus, claude-3, gpt-4-vision

Related to issue #223 (Multi-Modal Support: Images, Voice, and Video)

This implements Phase 1 (Vision) configuration support for the gateway mode.

yufan001 closed this Mar 2, 2026

github-actions Bot mentioned this pull request Mar 3, 2026

🦞 OpenClaw Ecosystem Daily 2026-03-03 duanyytop/agents-radar#46

yufan001 wants to merge 1 commit into HKUDS:main from yufan001:feat/vision-provider-support