feat(heartbeat): add model override for heartbeat phases#3368

Open
hussein1362 wants to merge 1 commit into HKUDS:main from hussein1362:feat/heartbeat-model-override

Conversation


@hussein1362 hussein1362 commented Apr 21, 2026

Summary

Add gateway.heartbeat.model config option that lets operators run heartbeat on a different (typically cheaper) model than the agent's primary model. First-class implementation — no shared state mutation.

Motivation

Heartbeat runs are periodic background checks (email, calendar, signals) that don't need the full reasoning power of a flagship model. Currently, heartbeat always uses the agent's primary model — so if you run gpt-5.4 for chat, heartbeat also burns those tokens on routine checks.

This PR lets you decouple the two:

{
  "gateway": {
    "heartbeat": {
      "model": "anthropic/claude-haiku-3.5"
    }
  }
}

Chat stays on the primary model, heartbeat runs on a cheaper one.

Architecture

The model override flows cleanly through the call chain without mutating shared agent state:

process_direct(model_override=...)
  → _process_message(model_override=...)
    → _run_agent_loop(model_override=...)
      → AgentRunSpec(model=override or self.model)

This is request-scoped: agent.model is never touched, so concurrent message processing remains safe. Compare to the naive approach of temporarily swapping agent.model in a try/finally — that mutates shared state and creates a race window.

Phase 1 (decision): HeartbeatService stores the override as self.model and passes it to provider.chat_with_retry() directly.

Phase 2 (execution): The override passes through process_direct → _process_message → _run_agent_loop → AgentRunSpec without touching the agent's model field.
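
The flow above can be sketched in a few lines. This is a minimal, synchronous toy (the real agent-loop methods live in nanobot/agent/loop.py and their bodies here are hypothetical), but it shows the request-scoped fallback the PR describes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentRunSpec:
    model: str

class Agent:
    def __init__(self, model: str):
        self.model = model  # primary model; never reassigned by the override

    def process_direct(self, text: str, model_override: Optional[str] = None) -> AgentRunSpec:
        return self._process_message(text, model_override=model_override)

    def _process_message(self, text: str, model_override: Optional[str] = None) -> AgentRunSpec:
        return self._run_agent_loop(text, model_override=model_override)

    def _run_agent_loop(self, text: str, model_override: Optional[str] = None) -> AgentRunSpec:
        # Request-scoped resolution: self.model is only a fallback, never mutated.
        return AgentRunSpec(model=model_override or self.model)
```

Because the override is an ordinary function argument, two concurrent calls with different overrides cannot observe each other's model choice.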

Changes

  • nanobot/config/schema.py — Add optional model: str | None to HeartbeatConfig (default: None)
  • nanobot/agent/loop.py — Add model_override: str | None = None parameter to process_direct, _process_message, and _run_agent_loop. AgentRunSpec receives model_override or self.model.
  • nanobot/heartbeat/service.py — Accept heartbeat_model param; use it for Phase 1 LLM calls
  • nanobot/cli/commands.py — Pass hb_cfg.model as model_override to process_direct (clean, no try/finally mutation)
  • tests/heartbeat/test_heartbeat_model_override.py — 7 tests: config defaults, service model selection, Phase 1 routing, process_direct signature validation

Behavior

| Config | Phase 1 (decide) | Phase 2 (execute) | `agent.model` |
| --- | --- | --- | --- |
| `model` not set | Agent model | Agent model | Unchanged |
| `model: "haiku"` | Haiku | Haiku | Unchanged ✅ |

Tests

$ python3.12 -m pytest tests/heartbeat/test_heartbeat_model_override.py -q
.......                                                                  [100%]
7 passed

$ python3.12 -m pytest tests/cli/test_commands.py tests/agent/test_unified_session.py -q
60 passed

$ ruff check nanobot/config/schema.py nanobot/heartbeat/service.py tests/heartbeat/
All checks passed!

@hussein1362 force-pushed the feat/heartbeat-model-override branch from 186642e to 826a463 on April 22, 2026 at 05:08
Add gateway.heartbeat.model config option that lets operators run
heartbeat on a different (typically cheaper) model than the agent's
primary model.

The override flows cleanly through the call chain without mutating
shared agent state:

  process_direct(model_override=...)
    → _process_message(model_override=...)
      → _run_agent_loop(model_override=...)
        → AgentRunSpec(model=override or self.model)

This is a first-class implementation: the model override is scoped to
the request and never touches agent.model, so concurrent message
processing (if ever enabled) remains safe.

Phase 1 (heartbeat decision) uses the override via HeartbeatService
which stores it as self.model for the provider.chat_with_retry call.
Phase 2 (agent execution) passes model_override through process_direct
to AgentRunSpec.

Config example:
  {
    "gateway": {
      "heartbeat": {
        "model": "anthropic/claude-haiku-3.5"
      }
    }
  }
@hussein1362 force-pushed the feat/heartbeat-model-override branch from 826a463 to 4ffaf47 on April 22, 2026 at 05:21
@chengyongru
Collaborator

Thanks for the PR 💖 I'm curious about the use case — is there user feedback or a real scenario where heartbeat model cost is a concern? Also, the current approach threads model_override through _process_message and _run_agent_loop, which are core internal methods. This feels like a fairly heavy change for a single feature.

@hussein1362
Contributor Author

Thanks for the thoughtful feedback! The numbers might help explain the motivation.

I run a gateway with multiple agents where heartbeat fires every 30 minutes. A single heartbeat run processes 94k tokens across 4 LLM calls — routine stuff like scanning inbox, checking calendar, running signal filters. On Sonnet that's $0.12/run → ~$177/month. On Opus it'd be **$886/month**. On Haiku, ~$44/month.
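
A back-of-envelope check of the Sonnet figure (per-run cost taken from the numbers above; the exact month length is an assumption):

```python
# Heartbeat fires every 30 minutes; $0.12/run is the quoted Sonnet cost.
runs_per_day = (24 * 60) // 30        # -> 48 runs/day
cost_per_run = 0.12
monthly = runs_per_day * 30 * cost_per_run
print(f"${monthly:.2f}/month")        # $172.80, in the ballpark of the ~$177 quoted
```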

For most users, heartbeat is background housekeeping that doesn't need flagship reasoning. But right now there's no way to decouple it from the primary model — so anyone who wants a capable chat model is paying that same rate for "check if anything happened in the last 30 minutes." That's a real barrier to enabling heartbeat at all, especially for self-hosters watching their API bill.

On the implementation — I explored a few approaches before landing on this one:

  1. Direct LLM call for Phase 1 only — Heartbeat service makes its own provider call for the "should I act?" decision, bypassing the agent loop entirely. Lightest touch, but Phase 1 is just a short decision prompt; the real tokens are burned in Phase 2 (the full tool loop with email/calendar/etc.), so overriding Phase 1 alone would trim only a small slice of the total cost.

  2. Separate agent instance for heartbeat — Clone the agent with a different model at startup. Zero method signature changes. But it doubles agent memory footprint, and any mutable state (sessions, context) could diverge between the two instances.

  3. Context variable instead of explicit parameter — Fewer signature changes, but implicit thread-local state felt harder to reason about and debug than an explicit parameter.

  4. Explicit model_override parameter (current approach) — Request-scoped, no shared state mutation, no race window with concurrent messages. More lines touched, but the override flows cleanly through process_direct → _process_message → _run_agent_loop → AgentRunSpec without ever touching agent.model.
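
To make the trade-off in option 3 concrete, here is a small illustration of the contextvar pattern that was rejected (this is not the PR's implementation; names are invented for the example):

```python
import contextvars
from typing import Optional

# The override would live in ambient context rather than in a parameter.
_model_override: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "model_override", default=None
)

def run_agent_loop(default_model: str) -> str:
    # Implicit input: any caller up the stack may have set the override,
    # which is exactly what makes this harder to trace than a parameter.
    return _model_override.get() or default_model

token = _model_override.set("anthropic/claude-haiku-3.5")
try:
    overridden = run_agent_loop("gpt-5.4")
finally:
    _model_override.reset(token)
plain = run_agent_loop("gpt-5.4")
```

Both approaches are race-free; the difference is that the contextvar hides the data flow, while the explicit parameter makes it visible in every signature it passes through.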

I went with #4, but if any of the others feels more aligned with how you'd want nanobot to handle per-request overrides, I'm happy to rework it. Or if there's a pattern I missed entirely, even better.

