[Feature]: Local-First Mode for Small Models — Compact No-Tools Prompting, Strict Parser Option, and No Prompt-Leakage #5287

@ThirDecade2020

Description

@ThirDecade2020

Summary

ZeroClaw would benefit from a compact local-model mode that reduces prompt bloat, disables permissive fallback parsing, and prevents internal tool/system instructions from leaking into user-visible output.

Problem statement

This addresses a real local-first pain point. When someone runs ZeroClaw against a smaller Ollama-hosted model, they want the agent to respond cleanly and cheaply on simple supervised tasks. Today, however, the runtime adds substantial prompt overhead and relies on permissive fallback behaviors, which can cause slow responses, hangs, spurious tool invocations, and even internal tool/system text leaking into the final answer.

Current behavior is insufficient because:
• trivial no-tools prompts still carry a large tool/system preamble
• that wastes context and inference budget on local models
• fallback parsing is permissive enough that plain prose can be misread as tool calls, creating instability
• internal runtime text can appear in user-visible output
• this makes local models feel less reliable than they should in the exact local-first workflows ZeroClaw could otherwise serve well

Proposed solution

Preferred behavior:
• when ZeroClaw is running against a local/smaller model, it should offer a compact local-model mode that minimizes prompt overhead and keeps no-tools turns simple, fast, and deterministic
• plain-response turns should not include large tool-policy blocks unless tool use is actually enabled for that turn
• internal truncation markers and tool/system scaffolding should never be visible to the model in a way that can leak back to the user
• fallback parsing should be optionally strict, so only native/explicit tool calls are honored and permissive text-to-tool inference is disabled
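To make the strict-parsing point concrete, here is a minimal sketch of the distinction between honoring only explicit tool calls and permissively inferring them from prose. The function name, the JSON tool-call shape, and the fallback heuristic are all illustrative assumptions for this issue, not ZeroClaw's actual parser:

```python
import json
import re

def parse_tool_calls(text: str, strict: bool = True):
    """Hypothetical sketch: collect explicit tool-call JSON objects from
    model output. In strict mode, prose is never promoted to a tool call."""
    calls = []
    # Explicit calls: well-formed JSON objects carrying a "tool" key.
    for candidate in re.findall(r"\{[^{}]*\}", text):
        try:
            obj = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "tool" in obj:
            calls.append(obj)
    if calls or strict:
        # Strict mode: return only explicit calls, possibly none at all.
        return calls
    # Permissive fallback (what strict_tool_parsing would disable):
    # guess a tool name from loose natural-language phrasing.
    guess = re.search(r"run the (\w+) tool", text, re.IGNORECASE)
    return [{"tool": guess.group(1)}] if guess else []
```

With `strict=True`, the phrase "please run the search tool" yields no tool calls; with `strict=False`, the permissive fallback would invent one, which is exactly the instability described above.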

Preferred interfaces:
• a config flag such as local_compact_mode = true
• a parser setting such as strict_tool_parsing = true
• an option like suppress_tool_instructions_when_no_tools = true
• a guarantee that truncation markers are handled internally, not inserted into model-visible prompt text
• optionally, a built-in preset like runtime_profile = "ollama_local" for smaller local models
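Taken together, the proposed knobs might look like this in a config file. All names and the section layout are proposals from this issue, not existing ZeroClaw settings:

```toml
# Proposed settings; none of these exist yet.
[runtime]
runtime_profile = "ollama_local"   # preset tuned for smaller local models

[local]
local_compact_mode = true          # minimize prompt overhead on simple turns

[parser]
strict_tool_parsing = true         # honor only native/explicit tool calls

[prompt]
suppress_tool_instructions_when_no_tools = true  # no tool preamble when tools are off
```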

Non-goals / out of scope

No response

Alternatives considered

No response

Acceptance criteria

No response

Architecture impact

No response

Risk and rollback

No response

Breaking change?

No

Data hygiene checks

  • I removed personal/sensitive data from examples, payloads, and logs.
  • I used neutral, project-focused wording and placeholders.

Metadata

Assignees

Labels

• agent (Auto scope: src/agent/** changed.)
• enhancement (New feature or request)
• priority:p2 (Medium priority)
• provider (Auto scope: src/providers/** changed.)
• provider: ollama (Auto module: provider/ollama changed.)
• risk: high (Auto risk: security/runtime/gateway/tools/workflows.)
• runtime (Auto scope: src/runtime/** changed.)
• security (Auto scope: src/security/** changed.)
• status:accepted (RFC or work item accepted and ratified by the team.)
• status:no-stale (Exempt from the 60-day stale auto-close policy.)
• tool (Auto scope: src/tools/** changed.)

Type

No type

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests