feat: add vLLM/local LLM support by ZhihaoZhang97 · Pull Request #4 · HKUDS/nanobot

ZhihaoZhang97 · 2026-02-02T00:26:12Z

Summary

This PR adds support for vLLM and other OpenAI-compatible local LLM endpoints.

Changes

config/schema.py: Added vllm provider configuration
providers/litellm_provider.py: Auto-detect vLLM endpoints, use hosted_vllm/ prefix for LiteLLM
cli/commands.py: Display vLLM status in nanobot status command
README.md: Added vLLM setup documentation
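
The provider change can be pictured with a small sketch. This is not the code from providers/litellm_provider.py, which this page does not show; it assumes that a configured `apiBase` is what marks an endpoint as vLLM, and `build_completion_kwargs` is a hypothetical helper name. The `hosted_vllm/` prefix and the `model`, `api_key`, and `api_base` keyword arguments of `litellm.acompletion` are LiteLLM's own.

```python
from typing import Optional

# LiteLLM routes model names carrying this prefix to a self-hosted vLLM server.
VLLM_PREFIX = "hosted_vllm/"


def build_completion_kwargs(model: str, api_key: str, api_base: Optional[str]) -> dict:
    """Build keyword arguments for litellm.acompletion (hypothetical helper).

    Assumption: a non-empty api_base means the target is a vLLM or other
    OpenAI-compatible endpoint.
    """
    kwargs = {"model": model, "api_key": api_key}
    if api_base:
        # Add the prefix only once, so an already-prefixed name is left alone.
        if not model.startswith(VLLM_PREFIX):
            kwargs["model"] = VLLM_PREFIX + model
        # Pass the endpoint per call rather than through global state.
        kwargs["api_base"] = api_base
    return kwargs
```

The resulting dictionary would be passed as `await litellm.acompletion(messages=..., **kwargs)`; the helper itself never imports LiteLLM, so it can be checked in isolation.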

Usage

{
  "providers": {
    "vllm": {
      "apiKey": "dummy",
      "apiBase": "http://your-vllm-server:8000/v1"
    }
  },
  "agents": {
    "defaults": {
      "model": "your-model-name"
    }
  }
}
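
As a rough illustration of how the configuration above might be consumed, the sketch below parses it into a small settings object. `VLLMSettings` and `load_vllm_settings` are hypothetical names, not nanobot's actual schema classes in config/schema.py, and falling back to `"dummy"` when `apiKey` is missing is an assumption made for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class VLLMSettings:
    api_key: str
    api_base: str
    model: str


def load_vllm_settings(raw: str) -> VLLMSettings:
    """Parse a JSON config with the keys shown in the Usage example."""
    data = json.loads(raw)
    vllm = data["providers"]["vllm"]
    # Normalise the URL so later path joins do not produce a double slash.
    api_base = vllm["apiBase"].rstrip("/")
    if not api_base.startswith(("http://", "https://")):
        raise ValueError(f"apiBase must be an http(s) URL, got {api_base!r}")
    return VLLMSettings(
        # Assumption: a placeholder key is acceptable when none is configured.
        api_key=vllm.get("apiKey", "dummy"),
        api_base=api_base,
        model=data["agents"]["defaults"]["model"],
    )
```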

Testing

Tested against a vLLM server running the gpt-oss-120b model.

- Add vllm provider configuration in config schema
- Auto-detect vLLM endpoints and use hosted_vllm/ prefix for LiteLLM
- Pass api_base directly to acompletion for custom endpoints
- Add vLLM status display in CLI status command
- Add vLLM setup documentation in README

Re-bin · 2026-02-02T03:15:42Z

Hi ZhihaoZhang97,

This is great! Thanks for the PR :)

Best regards,
Xubin

umb255-cloud · 2026-02-05T10:58:08Z

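This should not be tied to vLLM specifically. The setting should be based on the API protocol, i.e. OpenAI-compatible, so that any service exposing that API can be used, such as ollama and LM Studio. The model-name under agents could be *, so the configuration would not need to change no matter which model the backend switches to. With OpenAI-compatible communication the frontend also does not need to care which model is running, and OpenAI-compatible APIs also support commands to load a specific model or switch models.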

…support

This commit fixes 6 categories of issues identified during code review:

**Security Fixes (Task HKUDS#1, HKUDS#2):**
- Fix LiteLLMProvider API key race condition by passing api_key directly to litellm instead of modifying os.environ (prevents credential leakage)
- Fix RateLimiter defaultdict thread-safety issue by using explicit dict.get()
- Add TTL-based cleanup to RateLimiter (max_age_seconds, max_entries) to prevent memory exhaustion DoS from unbounded user ID growth

**Resource Management (Task HKUDS#3, HKUDS#4):**
- Implement ProcessRegistry for tracking spawned ffmpeg/ffprobe processes
- Add signal handlers (SIGTERM, SIGINT) for graceful process cleanup
- Make video processing timeouts configurable (frame, audio, info)
- Add timeout-safe process.wait() after process.kill()
- Implement periodic background cleanup for media files
- Add signal handlers to MediaCleanupRegistry for reliable cleanup
- Add thread-safe file registration with get_stats() monitoring

**Configuration Improvements (Task HKUDS#5):**
- Make TTS model, max_text_length, and timeout configurable
- Add validation for TTS config parameters (model, max_length, timeout)
- Improve error messages with text preview for debugging
- Return tuple[bool, str | None] from synthesize() to warn about truncation
- Update TelegramChannel to handle TTS truncation warnings

**Code Quality (Task HKUDS#6):**
- Refactor rate limiters to use factory functions (cleaner API)
- Keep backwards-compatible class aliases marked as deprecated
- Update imports to use factory functions (tts_rate_limiter, etc.)

**Documentation:**
- Add CODE_REVIEW_ISSUES.md with detailed analysis and fix summary
- Update CLAUDE.md with new multi-modal patterns and utilities

**Testing:**
- Add tests for rate limiter cleanup functionality
- All 35 tests passing

Core line count: 4,833 lines (security/reliability improvements added ~478 lines)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
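
The RateLimiter items in the commit message above (explicit `dict.get()` lookups, TTL cleanup bounded by `max_age_seconds` and `max_entries`) can be sketched roughly as follows. This is an illustrative sliding-window limiter written for this page, not the referenced fork's class; `max_calls`, `window_seconds`, the injectable `clock`, and the eviction order are all assumptions.

```python
import threading
import time
from typing import Callable, Dict, List


class RateLimiter:
    """Sliding-window limiter with bounded memory (illustrative sketch)."""

    def __init__(self, max_calls: int, window_seconds: float,
                 max_age_seconds: float = 3600.0, max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.max_age_seconds = max_age_seconds
        self.max_entries = max_entries
        self._clock = clock
        self._calls: Dict[str, List[float]] = {}  # user id -> call timestamps
        self._lock = threading.Lock()

    def __len__(self) -> int:
        with self._lock:
            return len(self._calls)

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        with self._lock:
            # dict.get() never inserts a key on lookup, unlike defaultdict.
            recent = [t for t in self._calls.get(user_id, [])
                      if now - t < self.window_seconds]
            allowed = len(recent) < self.max_calls
            if allowed:
                recent.append(now)
            if recent:
                self._calls[user_id] = recent
            else:
                self._calls.pop(user_id, None)
            self._cleanup(now)
            return allowed

    def _cleanup(self, now: float) -> None:
        # Drop users whose latest recorded call is older than max_age_seconds.
        stale = [u for u, ts in self._calls.items()
                 if now - ts[-1] > self.max_age_seconds]
        for user in stale:
            del self._calls[user]
        # If still over the cap, evict the least recently active users.
        overflow = len(self._calls) - self.max_entries
        if overflow > 0:
            by_age = sorted(self._calls, key=lambda u: self._calls[u][-1])
            for user in by_age[:overflow]:
                del self._calls[user]
```

Running the cleanup on every call keeps the sketch short; a production limiter would more likely clean up periodically to avoid scanning all users each time.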

feat(router): add CODING tier, per-tier secondary models, greeting patterns

feat: add vLLM/local LLM support

## Fixes (from subagent code review)

### P0: Subagent tool context isolation (HKUDS#1)
- Subagent now calls set_tool_context() before executing tools
- Prevents message routing to wrong session in concurrent scenarios

### P0: Atomic session writes (HKUDS#2)
- Session save() now writes to temp file then os.replace() (atomic on POSIX)
- Crash mid-write no longer corrupts/empties session file

### P1: Consolidation saves last_consolidated (HKUDS#3)
- After successful consolidation, session is saved to persist the pointer
- On failure, still advances pointer to avoid retrying same messages

### P1: admin.json mtime cache (HKUDS#4)
- _load_admin_config() now caches by file mtime
- Avoids re-reading disk on every tool call in high-frequency scenarios

### P1: Subagent registry atomic write (HKUDS#5)
- _save_registry() uses temp file + os.replace() like session save

### P2: LLM error responses not saved to session (HKUDS#17)
- When finish_reason == "error", the error is returned to user but NOT added to session history, preventing context pollution

### P2: Subagent timeout protection (HKUDS#15)
- Max iterations reduced from 500 to 200
- Added 30-minute wall-clock timeout to prevent runaway tasks

### P2: SSRF protection for web_fetch (HKUDS#10)
- _validate_url() now blocks private/loopback/link-local/reserved IPs

### P3: ReadFileTool binary file handling (HKUDS#13)
- Catches UnicodeDecodeError and returns friendly message instead of crash
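
The "temp file then os.replace()" pattern named in the atomic-write items above looks roughly like this. It is a minimal sketch, not the fork's save() method: `atomic_write_json` is a hypothetical name, and the `os.fsync` call is an extra durability step that the commit message does not mention.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write payload to path so readers see the old or the new file, never a partial one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so os.replace stays on
    # one filesystem; a cross-device rename would not be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(payload, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # assumption: added here for durability
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure, remove the temp file and leave the original untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If serialization fails halfway, only the temp file is affected, which is why a crash mid-write can no longer empty the session file.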

HKUDS#1 - Move MODEL_CONTEXT_WINDOWS/MODEL_PRICING to registry.py
  Add get_context_window() and get_pricing() module-level helpers
  Remove duplicate dicts from loop.py

HKUDS#2 - Extract _build_status() into module-level _build_status_report()
  Pure function with no AgentLoop dependency; 80 lines → 12 lines in class

HKUDS#3 - Extract on_cron_job closure into _make_cron_job_handler()
  gateway() is now ~50 lines shorter and easier to read

HKUDS#4 - Extract _build_tools() factory function shared by AgentLoop + SubagentManager
  Eliminates ~15 lines of duplicated tool registration in subagent.py

HKUDS#8 - Replace 6-tuple return from _run_agent_loop() with LoopResult dataclass
  Named fields instead of positional unpacking

Also:
- Move _strip_think/_tool_hint/_format_tool_detail to module level
- Remove verbose=False branch from _format_tool_detail (unused)
- Remove ToolRegistry import from subagent.py (no longer needed directly)

README fix

fix: cannot import name '_apply_patches'; update v0.1.6

…actions/setup-go-6 Bump actions/setup-go from 5 to 6

Add a user-facing CLI entrypoint for creating persisted coding tasks and reuse the shared coding-task runtime loader across gateway and CLI flows. Update harness state to mark feature HKUDS#4 complete and record the verification checkpoint. Co-authored-by: Codex <noreply@openai.com>

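Support updating the mutable fields of existing scheduled tasks, including name, schedule, message content, and delivery configuration. System tasks (system_event) are protected and cannot be edited. Includes full unit test coverage. Made-with: Cursor Co-authored-by: weitongtong <tongtong.wei@nodeskai.com>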

feat: add vLLM/local LLM support

Remove the two blank-line-after-import additions in transcribe_audio that slipped in with the bus-bounding change. Same class of unrelated formatting churn the reviewer flagged in point HKUDS#4; now the PR-vs-base diff for base.py contains only the functional drop-feedback branch.

Resolves conflicts in:
- session/manager.py: kept upstream's media-breadcrumb + timestamp logic, added thinking_blocks to history allowlist (upstream had reasoning_content, we extended). Patch HKUDS#6 effectively absorbed.
- agent/loop.py: dropped consolidation_model param (upstream removed), took upstream's expanded constructor signature, ExecTool sandbox/allowed_env_keys, conditional WebSearch/WebFetch via web_config.enable. Patch HKUDS#1 (disable web_search) now achieved via config not code.
- agent/tools/message.py: combined upstream's path resolution + metadata with our patch HKUDS#3 (zero-byte/missing media validation), running validation after path resolution.
- cli/commands.py: dropped consolidation_model arg from agent ctor, took upstream's expanded args. Patch HKUDS#4 (cron prefix) merged with upstream's wording; kept the 'do not create new cron reminders' note.
- config/schema.py: removed consolidation_model field (upstream dropped in favor of provider_retry_mode + max_tool_result_chars + others).

All 5 fork patches still apply or have been absorbed by upstream:
1. web_search → now config-driven (web_config.enable=false)
2. HTML unescape → still in shell.py (auto-merged)
3. Media validation → integrated into message.py
4. Cron prefix → wording merged
5. ${VAR} interpolation → loader.py auto-merged (upstream added own resolve_config_env_vars; ours runs at load time, theirs is opt-in)
6. reasoning_content/thinking_blocks → upstream landed reasoning_content; we kept thinking_blocks extension

Phase 2 of the bootstrap plan: extends nanobot's free-form Dream memory with the seven structured slots from CLAUDE.md (shop_profile, equipment, customers, materials, routing_memory, pricing_corrections, audit_log).

Files added under foreman/:
- memory/models.py: Pydantic models for the seven slots, with reversal fields on PricingCorrection (corrections are marked reversed, never edited).
- memory/store.py: ForemanMemoryStore subclass of nanobot.MemoryStore. Per-slot CRUD with audit-log-on-write at the data layer. The audit entry cannot be skipped because it's emitted from inside every mutating method, not from the calling tool. Atomic writes via tmp+os.replace.
- memory/resolver.py: customer-id resolver with confidence-gated escalation per CLAUDE.md non-negotiable HKUDS#4. Resolution chain: exact email_address match → unique-domain match → fuzzy display_name. Below 0.9 → never returns a match; always returns escalate=True with candidate list for owner pick. Reply-To-different-from-From triggers escalation even when both individually match (forwarded-RFQ scenario).
- hooks/personality.py: PersonalityWriteHook for telemetry on personality-mutating tool calls. Observability only; the actual audit log lives at the data layer.
- tests/test_store.py + tests/test_resolver.py: 29 tests covering CRUD round-trips, audit-log integrity (the non-negotiable), atomic-write guarantees, reversal flow, and every documented resolver scenario including the property-test that confidence < 0.9 NEVER returns a match.

pyproject.toml updated:
- packages = ["nanobot", "foreman"]
- testpaths includes foreman/tests
- coverage source includes foreman

All 29 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
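
The resolution chain described for memory/resolver.py can be illustrated with a simplified sketch. It is not the fork's resolver: the `Customer` and `Resolution` types, the fixed 0.95 score for a unique-domain match, and the use of `difflib.SequenceMatcher` for fuzzy name matching are assumptions made here. Only the 0.9 threshold, the chain order, and the Reply-To rule come from the commit message.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional

THRESHOLD = 0.9  # below this, never return a match


@dataclass
class Customer:
    customer_id: str
    display_name: str
    email_address: str


@dataclass
class Resolution:
    customer_id: Optional[str]
    confidence: float
    escalate: bool
    candidates: List[str] = field(default_factory=list)


def _domain(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()


def resolve(customers: List[Customer], from_addr: str,
            display_name: str = "", reply_to: Optional[str] = None) -> Resolution:
    addr = from_addr.lower()
    exact = [c for c in customers if c.email_address.lower() == addr]
    same_domain = [c for c in customers if _domain(c.email_address) == _domain(addr)]
    if len(exact) == 1:
        best, confidence = exact[0], 1.0
    elif len(same_domain) == 1:
        best, confidence = same_domain[0], 0.95  # assumed score for a unique domain
    else:
        # Fall back to fuzzy matching on the display name.
        scored = [(SequenceMatcher(None, display_name.lower(),
                                   c.display_name.lower()).ratio(), c)
                  for c in customers]
        confidence, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    candidates = [c.customer_id for c in (exact or same_domain or customers)]
    # A Reply-To that differs from From suggests a forwarded message.
    forwarded = reply_to is not None and reply_to.lower() != addr
    if best is None or confidence < THRESHOLD or forwarded:
        return Resolution(None, confidence, True, candidates)
    return Resolution(best.customer_id, confidence, False, [best.customer_id])
```

The escalation branch is the single place that can return `customer_id=None`, which makes the "below 0.9 never returns a match" property straightforward to test.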

Re-bin self-assigned this Feb 2, 2026

Re-bin merged commit 2049e1a into HKUDS:main Feb 2, 2026

ZhihaoZhang97 deleted the feature/vllm-support branch February 2, 2026 22:38

rankaiyx mentioned this pull request Feb 10, 2026

fix: correct API key environment variable for vLLM mode #42

Merged

orgoj pushed a commit to orgoj/nanobot that referenced this pull request Feb 13, 2026

Merge pull request HKUDS#4 from jhonny-cinco-ai/feature/routing-pr-v2

fa6e28f

feat(router): add CODING tier, per-tier secondary models, greeting patterns

StreamAzure pushed a commit to StreamAzure/nanobot_Theo that referenced this pull request Feb 18, 2026

Merge pull request HKUDS#4 from ZhihaoZhang97/feature/vllm-support

e4a002f

feat: add vLLM/local LLM support

nikolasdehor mentioned this pull request Feb 20, 2026

feat: Speech System #819

Open

ollie-dev-ops pushed a commit to mics8128/nanobot that referenced this pull request Feb 27, 2026

Merge pull request HKUDS#4 from wallyqs/patch-1

75e6cbf

README fix

JiajunBernoulli pushed a commit to JiajunBernoulli/nanobot that referenced this pull request Mar 15, 2026

Merge pull request HKUDS#4 from Good0007/dev

ca617ac

fix: cannot import name '_apply_patches'; update v0.1.6

WTHDonghai pushed a commit to WTHDonghai/nanobot that referenced this pull request Mar 22, 2026

Merge pull request HKUDS#4 from volcengine/dependabot/github_actions/…

da4c94b

…actions/setup-go-6 Bump actions/setup-go from 5 to 6

woodslinger mentioned this pull request Apr 7, 2026

exec /usr/local/bin/entrypoint.sh: no such file or directory #2878

Closed

dragosroua pushed a commit to dragosroua/aigernon that referenced this pull request Apr 13, 2026

Merge pull request HKUDS#4 from ZhihaoZhang97/feature/vllm-support

7e80930

feat: add vLLM/local LLM support

This was referenced Apr 20, 2026

fix(bus): bound inbound queue to prevent unbounded memory growth #3202

Closed

feat(agent): add ProfilingHook for opt-in iteration timing #3204

Closed

hussein1362 mentioned this pull request Apr 22, 2026

feat(heartbeat): add model override for heartbeat phases #3368

Open

Ydz0616 mentioned this pull request May 7, 2026

feat: per-identity MCPClientPool + email channel acting-as + LLM usage SSE SeekMi-Technologies/Ola_bot#1

Merged

6 tasks

Re-bin merged 1 commit into HKUDS:main from ZhihaoZhang97:feature/vllm-support
