
feat: add vLLM/local LLM support #4

Merged
Re-bin merged 1 commit into HKUDS:main from ZhihaoZhang97:feature/vllm-support
Feb 2, 2026

Conversation

Contributor

@ZhihaoZhang97 ZhihaoZhang97 commented Feb 2, 2026

Summary

This PR adds support for vLLM and other OpenAI-compatible local LLM endpoints.

Changes

  • config/schema.py: Added vllm provider configuration
  • providers/litellm_provider.py: Auto-detect vLLM endpoints, use hosted_vllm/ prefix for LiteLLM
  • cli/commands.py: Display vLLM status in nanobot status command
  • README.md: Added vLLM setup documentation

Usage

{
  "providers": {
    "vllm": {
      "apiKey": "dummy",
      "apiBase": "http://your-vllm-server:8000/v1"
    }
  },
  "agents": {
    "defaults": {
      "model": "your-model-name"
    }
  }
}

Testing

Tested with a vLLM server running the gpt-oss-120b model.

- Add vllm provider configuration in config schema
- Auto-detect vLLM endpoints and use hosted_vllm/ prefix for LiteLLM
- Pass api_base directly to acompletion for custom endpoints
- Add vLLM status display in CLI status command
- Add vLLM setup documentation in README
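
A minimal sketch of the prefixing and `api_base` pass-through described in the commit notes above. The helper name `build_completion_kwargs` and the explicit `is_vllm` flag are hypothetical (the PR does not show how endpoints are auto-detected); `hosted_vllm/` is the LiteLLM model prefix named in the PR, and the resulting dictionary is meant to be unpacked into `litellm.acompletion`.

```python
from typing import Any, Optional

# Provider prefix named in the PR for OpenAI-compatible vLLM servers.
VLLM_PREFIX = "hosted_vllm/"


def build_completion_kwargs(
    model: str,
    messages: list,
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    is_vllm: bool = False,
) -> dict:
    """Build keyword arguments for litellm.acompletion (hypothetical helper).

    `is_vllm` stands in for whatever detection logic the provider uses;
    that logic is not shown in the PR, so it is an explicit flag here.
    """
    if is_vllm and not model.startswith(VLLM_PREFIX):
        model = VLLM_PREFIX + model

    kwargs: dict = {"model": model, "messages": messages}
    # Pass the endpoint and key per call instead of relying on global state.
    if api_base:
        kwargs["api_base"] = api_base
    if api_key:
        kwargs["api_key"] = api_key
    return kwargs
```

With the config from the Usage section, a call would then look roughly like `await litellm.acompletion(**build_completion_kwargs("your-model-name", messages, api_key="dummy", api_base="http://your-vllm-server:8000/v1", is_vllm=True))`.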
@Re-bin Re-bin self-assigned this Feb 2, 2026
Collaborator

Re-bin commented Feb 2, 2026

Hi ZhihaoZhang97,

This is great! Thanks for the PR :)

Best regards,
Xubin

@Re-bin Re-bin merged commit 2049e1a into HKUDS:main Feb 2, 2026
@ZhihaoZhang97 ZhihaoZhang97 deleted the feature/vllm-support branch February 2, 2026 22:38
@umb255-cloud

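This should not be tied specifically to vLLM. It should be built around the API protocol instead: configure the provider as OpenAI-compatible so that any service exposing that API can be called, such as Ollama and LM Studio. The model-name set under agents could be *, so the config never needs to change no matter which model the backend switches to. With OpenAI-compatible communication the frontend does not need to care which model is running, and OpenAI-compatible servers also support commands to load a specific model or switch models.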

anchapin added a commit to anchapin/nanobot that referenced this pull request Feb 6, 2026
…support

This commit fixes 6 categories of issues identified during code review:

**Security Fixes (Task HKUDS#1, HKUDS#2):**
- Fix LiteLLMProvider API key race condition by passing api_key directly
  to litellm instead of modifying os.environ (prevents credential leakage)
- Fix RateLimiter defaultdict thread-safety issue by using explicit dict.get()
- Add TTL-based cleanup to RateLimiter (max_age_seconds, max_entries)
  to prevent memory exhaustion DoS from unbounded user ID growth

**Resource Management (Task HKUDS#3, HKUDS#4):**
- Implement ProcessRegistry for tracking spawned ffmpeg/ffprobe processes
- Add signal handlers (SIGTERM, SIGINT) for graceful process cleanup
- Make video processing timeouts configurable (frame, audio, info)
- Add timeout-safe process.wait() after process.kill()
- Implement periodic background cleanup for media files
- Add signal handlers to MediaCleanupRegistry for reliable cleanup
- Add thread-safe file registration with get_stats() monitoring

**Configuration Improvements (Task HKUDS#5):**
- Make TTS model, max_text_length, and timeout configurable
- Add validation for TTS config parameters (model, max_length, timeout)
- Improve error messages with text preview for debugging
- Return tuple[bool, str | None] from synthesize() to warn about truncation
- Update TelegramChannel to handle TTS truncation warnings

**Code Quality (Task HKUDS#6):**
- Refactor rate limiters to use factory functions (cleaner API)
- Keep backwards-compatible class aliases marked as deprecated
- Update imports to use factory functions (tts_rate_limiter, etc.)

**Documentation:**
- Add CODE_REVIEW_ISSUES.md with detailed analysis and fix summary
- Update CLAUDE.md with new multi-modal patterns and utilities

**Testing:**
- Add tests for rate limiter cleanup functionality
- All 35 tests passing

Core line count: 4,833 lines (security/reliability improvements added ~478 lines)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
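
The TTL-based rate limiter cleanup mentioned in the commit message above could look roughly like the following sketch. The parameter names `max_age_seconds` and `max_entries` come from the commit message; the class layout, the sliding-window policy, and the injectable `clock` are assumptions made for illustration, not the fork's actual code.

```python
import threading
import time


class RateLimiter:
    """Sliding-window limiter with bounded memory (illustrative sketch)."""

    def __init__(self, max_calls, window_seconds,
                 max_age_seconds=3600.0, max_entries=10000,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.max_age_seconds = max_age_seconds
        self.max_entries = max_entries
        self._clock = clock
        # Plain dict plus explicit .get(): a lookup never creates an entry.
        self._calls = {}
        self._lock = threading.Lock()

    def allow(self, user_id):
        now = self._clock()
        with self._lock:
            recent = [t for t in self._calls.get(user_id, [])
                      if now - t < self.window_seconds]
            allowed = len(recent) < self.max_calls
            if allowed:
                recent.append(now)
            self._calls[user_id] = recent
            self._cleanup(now)
            return allowed

    def tracked_users(self):
        with self._lock:
            return len(self._calls)

    def _cleanup(self, now):
        # Drop users with no timestamps or whose last call is too old.
        stale = [u for u, ts in self._calls.items()
                 if not ts or now - ts[-1] > self.max_age_seconds]
        for u in stale:
            del self._calls[u]
        # If still over the cap, evict the least recently active users.
        overflow = len(self._calls) - self.max_entries
        if overflow > 0:
            oldest = sorted(self._calls, key=lambda u: self._calls[u][-1])
            for u in oldest[:overflow]:
                del self._calls[u]
```

Cleanup here runs on every call, which is linear in the number of tracked users; a production version would more likely run it periodically.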
orgoj pushed a commit to orgoj/nanobot that referenced this pull request Feb 13, 2026
feat(router): add CODING tier, per-tier secondary models, greeting patterns
StreamAzure pushed a commit to StreamAzure/nanobot_Theo that referenced this pull request Feb 18, 2026
KinglittleQ pushed a commit to KinglittleQ/nanobot that referenced this pull request Feb 19, 2026
## Fixes (from subagent code review)

### P0: Subagent tool context isolation (HKUDS#1)
- Subagent now calls set_tool_context() before executing tools
- Prevents message routing to wrong session in concurrent scenarios

### P0: Atomic session writes (HKUDS#2)
- Session save() now writes to temp file then os.replace() (atomic on POSIX)
- Crash mid-write no longer corrupts/empties session file
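
A minimal sketch of the temp-file-then-`os.replace()` pattern described above. The function name `atomic_write_json` and the JSON payload are assumptions for illustration; the fork's `save()` may differ in detail.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path, data):
    """Write JSON so readers see either the old file or the new one."""
    path = Path(path)
    # Create the temp file in the same directory so os.replace() stays
    # on one filesystem (a cross-device rename would not be atomic).
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, str(path))
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the process crashes before `os.replace()`, the original session file is untouched and only a stray `.tmp` file remains.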

### P1: Consolidation saves last_consolidated (HKUDS#3)
- After successful consolidation, session is saved to persist the pointer
- On failure, still advances pointer to avoid retrying same messages

### P1: admin.json mtime cache (HKUDS#4)
- _load_admin_config() now caches by file mtime
- Avoids re-reading disk on every tool call in high-frequency scenarios
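
The mtime-keyed cache could be sketched as below. `load_json_cached` is a hypothetical stand-in for `_load_admin_config()`, whose real signature is not shown here; the idea is only that a cheap `os.stat()` replaces a full read while the file is unchanged.

```python
import json
import os

# Maps file path -> (mtime in nanoseconds, parsed content).
_cache = {}


def load_json_cached(path):
    """Return parsed JSON, re-reading only when the file's mtime changes."""
    key = str(path)
    mtime = os.stat(key).st_mtime_ns
    cached = _cache.get(key)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    with open(key, encoding="utf-8") as f:
        data = json.load(f)
    _cache[key] = (mtime, data)
    return data
```

One caveat of this approach: two writes within the filesystem's timestamp resolution can leave a stale entry, so it suits config files that change rarely.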

### P1: Subagent registry atomic write (HKUDS#5)
- _save_registry() uses temp file + os.replace() like session save

### P2: LLM error responses not saved to session (HKUDS#17)
- When finish_reason == "error", the error is returned to user but NOT
  added to session history, preventing context pollution

### P2: Subagent timeout protection (HKUDS#15)
- Max iterations reduced from 500 to 200
- Added 30-minute wall-clock timeout to prevent runaway tasks

### P2: SSRF protection for web_fetch (HKUDS#10)
- _validate_url() now blocks private/loopback/link-local/reserved IPs
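
A hedged sketch of the kind of check described above, using only the standard library. The function names, return shape, and injectable `resolve` parameter are assumptions; the fork's `_validate_url()` is not shown on this page.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_ip(ip_text):
    """True for private, loopback, link-local, or reserved addresses."""
    # Strip an IPv6 zone id such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])
    return (ip.is_private or ip.is_loopback
            or ip.is_link_local or ip.is_reserved)


def validate_url(url, resolve=socket.getaddrinfo):
    """Return (ok, reason). `resolve` is injectable for testing."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False, "only http and https are allowed"
    host = parsed.hostname
    if not host:
        return False, "missing host"
    try:
        infos = resolve(host, parsed.port or 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False, "host does not resolve"
    # Reject if ANY resolved address is internal.
    for info in infos:
        addr = info[4][0]
        if is_blocked_ip(addr):
            return False, "blocked address: " + addr
    return True, None
```

A check like this validates the address at lookup time only; a fetcher that resolves the name again when connecting can still be exposed to DNS rebinding, so connecting to the already-validated IP is the safer design.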

### P3: ReadFileTool binary file handling (HKUDS#13)
- Catches UnicodeDecodeError and returns friendly message instead of crash
KinglittleQ pushed a commit to KinglittleQ/nanobot that referenced this pull request Feb 26, 2026
HKUDS#1 - Move MODEL_CONTEXT_WINDOWS/MODEL_PRICING to registry.py
     Add get_context_window() and get_pricing() module-level helpers
     Remove duplicate dicts from loop.py

HKUDS#2 - Extract _build_status() into module-level _build_status_report()
     Pure function with no AgentLoop dependency; 80 lines → 12 lines in class

HKUDS#3 - Extract on_cron_job closure into _make_cron_job_handler()
     gateway() is now ~50 lines shorter and easier to read

HKUDS#4 - Extract _build_tools() factory function shared by AgentLoop + SubagentManager
     Eliminates ~15 lines of duplicated tool registration in subagent.py

HKUDS#8 - Replace 6-tuple return from _run_agent_loop() with LoopResult dataclass
     Named fields instead of positional unpacking
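
To illustrate the tuple-to-dataclass change, here is a small sketch. The six field names below are hypothetical (the commit message does not list the actual fields of `LoopResult`); only the pattern of replacing positional unpacking with named access is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopResult:
    """Named replacement for a 6-tuple return (field names are illustrative)."""
    final_content: Optional[str]
    tools_used: list = field(default_factory=list)
    iterations: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    finish_reason: str = "stop"


def run_agent_loop_stub():
    """Stand-in for _run_agent_loop(); returns fixed demo values."""
    return LoopResult(
        final_content="done",
        tools_used=["read_file"],
        iterations=2,
        prompt_tokens=120,
        completion_tokens=30,
    )


result = run_agent_loop_stub()
# Callers read fields by name instead of unpacking six positions.
summary = f"{result.iterations} iterations, tools: {', '.join(result.tools_used)}"
```

Adding a seventh field later then needs a default value rather than a change at every call site.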

Also:
- Move _strip_think/_tool_hint/_format_tool_detail to module level
- Remove verbose=False branch from _format_tool_detail (unused)
- Remove ToolRegistry import from subagent.py (no longer needed directly)
ollie-dev-ops pushed a commit to mics8128/nanobot that referenced this pull request Feb 27, 2026
JiajunBernoulli pushed a commit to JiajunBernoulli/nanobot that referenced this pull request Mar 15, 2026
fix: cannot import name '_apply_patches'; update v0.1.6
WTHDonghai pushed a commit to WTHDonghai/nanobot that referenced this pull request Mar 22, 2026
…actions/setup-go-6

Bump actions/setup-go from 5 to 6
LeslieMiau added a commit to LeslieMiau/nanobot that referenced this pull request Mar 29, 2026
Add a user-facing CLI entrypoint for creating persisted coding tasks and reuse the shared coding-task runtime loader across gateway and CLI flows.

Update harness state to mark feature HKUDS#4 complete and record the verification checkpoint.

Co-authored-by: Codex <noreply@openai.com>
weitongtong added a commit to weitongtong/nanobot that referenced this pull request Apr 11, 2026
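Support updating the mutable fields of existing scheduled tasks: name, schedule, message content, delivery configuration, and so on.
System tasks (system_event) are protected and cannot be edited. Includes full unit test coverage.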

Made-with: Cursor

Co-authored-by: weitongtong <tongtong.wei@nodeskai.com>
dragosroua pushed a commit to dragosroua/aigernon that referenced this pull request Apr 13, 2026
mohamed-elkholy95 added a commit to mohamed-elkholy95/nanobot that referenced this pull request Apr 20, 2026
Remove the two blank-line-after-import additions in transcribe_audio
that slipped in with the bus-bounding change. Same class of unrelated
formatting churn the reviewer flagged in point HKUDS#4; now the PR-vs-base
diff for base.py contains only the functional drop-feedback branch.
liflovs added a commit to liflovs/nanobot that referenced this pull request Apr 29, 2026
Resolves conflicts in:
- session/manager.py: kept upstream's media-breadcrumb + timestamp logic,
  added thinking_blocks to history allowlist (upstream had reasoning_content,
  we extended). Patch HKUDS#6 effectively absorbed.
- agent/loop.py: dropped consolidation_model param (upstream removed),
  took upstream's expanded constructor signature, ExecTool sandbox/
  allowed_env_keys, conditional WebSearch/WebFetch via web_config.enable.
  Patch HKUDS#1 (disable web_search) now achieved via config not code.
- agent/tools/message.py: combined upstream's path resolution + metadata
  with our patch HKUDS#3 (zero-byte/missing media validation), running
  validation after path resolution.
- cli/commands.py: dropped consolidation_model arg from agent ctor,
  took upstream's expanded args. Patch HKUDS#4 (cron prefix) merged with
  upstream's wording — kept the 'do not create new cron reminders' note.
- config/schema.py: removed consolidation_model field (upstream dropped
  in favor of provider_retry_mode + max_tool_result_chars + others).

All 6 fork patches still apply or have been absorbed by upstream:
1. web_search → now config-driven (web_config.enable=false)
2. HTML unescape → still in shell.py (auto-merged)
3. Media validation → integrated into message.py
4. Cron prefix → wording merged
5. ${VAR} interpolation → loader.py auto-merged (upstream added own
   resolve_config_env_vars; ours runs at load time, theirs is opt-in)
6. reasoning_content/thinking_blocks → upstream landed reasoning_content;
   we kept thinking_blocks extension
ctmmit pushed a commit to ctmmit/foreman that referenced this pull request Apr 29, 2026
Phase 2 of the bootstrap plan — extends nanobot's free-form Dream memory with
the seven structured slots from CLAUDE.md (shop_profile, equipment, customers,
materials, routing_memory, pricing_corrections, audit_log).

Files added under foreman/:
- memory/models.py: Pydantic models for the seven slots, with reversal fields
  on PricingCorrection (corrections are marked reversed, never edited).
- memory/store.py: ForemanMemoryStore subclass of nanobot.MemoryStore.
  Per-slot CRUD with audit-log-on-write at the data layer — the audit entry
  cannot be skipped because it's emitted from inside every mutating method,
  not from the calling tool. Atomic writes via tmp+os.replace.
- memory/resolver.py: customer-id resolver with confidence-gated escalation
  per CLAUDE.md non-negotiable HKUDS#4. Resolution chain: exact email_address
  match → unique-domain match → fuzzy display_name. Below 0.9 → never returns
  a match; always returns escalate=True with candidate list for owner pick.
  Reply-To-different-from-From triggers escalation even when both individually
  match (forwarded-RFQ scenario).
- hooks/personality.py: PersonalityWriteHook for telemetry on personality-
  mutating tool calls. Observability only — the actual audit log lives at
  the data layer.
- tests/test_store.py + tests/test_resolver.py: 29 tests covering CRUD round-
  trips, audit-log integrity (the non-negotiable), atomic-write guarantees,
  reversal flow, and every documented resolver scenario including the
  property-test that confidence < 0.9 NEVER returns a match.
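
The resolution chain and the 0.9 confidence gate described for `memory/resolver.py` might be sketched as follows. Everything except the chain order, the threshold, `escalate=True`, and the `email_address` / `display_name` field names is an assumption: the class names, the fixed confidences of 1.0 and 0.95 for the first two steps, and the use of `difflib` for fuzzy matching are illustrative only.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional

THRESHOLD = 0.9  # below this, never return a match


@dataclass
class Customer:
    customer_id: str
    email_address: str
    display_name: str


@dataclass
class Resolution:
    customer_id: Optional[str]
    confidence: float
    escalate: bool
    candidates: list = field(default_factory=list)


def _domain(email):
    return email.rsplit("@", 1)[-1].lower()


def resolve(customers, from_email, display_name="", reply_to=None):
    sender = from_email.lower()
    # Forwarded-RFQ case: differing Reply-To always escalates.
    if reply_to and reply_to.lower() != sender:
        seen = {sender, reply_to.lower()}
        ids = [c.customer_id for c in customers
               if c.email_address.lower() in seen]
        return Resolution(None, 0.0, True, ids)
    # 1. Exact email match.
    exact = [c for c in customers if c.email_address.lower() == sender]
    if len(exact) == 1:
        return Resolution(exact[0].customer_id, 1.0, False)
    # 2. Domain match, only if exactly one customer uses that domain.
    same = [c for c in customers
            if _domain(c.email_address) == _domain(sender)]
    if len(same) == 1:
        return Resolution(same[0].customer_id, 0.95, False)
    # 3. Fuzzy display-name match, gated by THRESHOLD.
    scored = sorted(
        ((SequenceMatcher(None, display_name.lower(),
                          c.display_name.lower()).ratio(), c)
         for c in customers),
        key=lambda pair: pair[0], reverse=True)
    if scored and scored[0][0] >= THRESHOLD:
        return Resolution(scored[0][1].customer_id, scored[0][0], False)
    best = scored[0][0] if scored else 0.0
    return Resolution(None, best, True,
                      [c.customer_id for _, c in scored[:3]])
```

In this sketch every path that returns a `customer_id` carries a confidence of at least 0.9, which is the invariant the property test in the commit message is described as checking.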

pyproject.toml updated:
- packages = ["nanobot", "foreman"]
- testpaths includes foreman/tests
- coverage source includes foreman

All 29 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 participants