### Bug Description

Setting `max_tokens` under the `model` key in `config.yaml` does not increase the output token limit. Responses are silently truncated mid-generation for moderately long tasks. The config value is read and merged but never extracted and forwarded to the `AIAgent` constructor, making the setting completely ineffective.
```yaml
model:
  default: MiniMax-M2.7
  provider: custom
  base_url: https://api.minimax.io/v1
  max_tokens: 8192  # ← this does nothing
```
### Steps to Reproduce

- Set `model.max_tokens: 8192` in `config.yaml`
- Run `hermes chat` and engage in a conversation requiring a long response
- Observe the response gets truncated with no error message
- Check the API request: `max_tokens` is not included
### Expected Behavior

`model.max_tokens` in `config.yaml` should be passed to the `AIAgent` constructor, which then sends it to the API.
### Actual Behavior

The `AIAgent` constructor accepts a `max_tokens` parameter (run_agent.py#L660), but callers never provide it:

- cli.py#L2100: `self.agent = AIAgent(...)` is called without `max_tokens`
- gateway/run.py#L781-L789: `_resolve_turn_agent_config()` builds a `primary` dict without `max_tokens`, so it never flows through to `AIAgent`
The `_build_api_kwargs` method (run_agent.py#L4864) only adds `max_tokens` to the API request when `self.max_tokens is not None`:

```python
if self.max_tokens is not None:
    api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    # Band-aid: use hardcoded per-model limits
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit
```
Since callers never pass `max_tokens`, it is always `None` and the parameter is never sent — except for a hardcoded band-aid for OpenRouter + Claude (see below).
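A minimal stand-alone sketch of that conditional (simplified names and a stand-in limit; not the actual hermes-agent code) shows why the default path sends nothing:

```python
# Simplified stand-in for _build_api_kwargs; not the actual hermes-agent code.
def build_api_kwargs(max_tokens, model="MiniMax-M2.7", is_openrouter=False):
    api_kwargs = {"model": model}
    if max_tokens is not None:
        # Config value present: forward it to the API request.
        api_kwargs["max_tokens"] = max_tokens
    elif is_openrouter and "claude" in model.lower():
        # Band-aid path: hardcoded stand-in for _get_anthropic_max_output().
        api_kwargs["max_tokens"] = 8192
    return api_kwargs

# Callers never pass max_tokens, so it stays None and nothing is sent:
print(build_api_kwargs(None))  # {'model': 'MiniMax-M2.7'}
print(build_api_kwargs(8192))  # {'model': 'MiniMax-M2.7', 'max_tokens': 8192}
```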
### Root Cause Analysis (confirmed with live API intercept)

I patched the installed hermes-agent to log what `_build_api_kwargs()` sends to the API. Two tests were run.

Test 1 — current behavior (BUG):

```
[DEBUG max_tokens] self.max_tokens=None | api_kwargs max_token keys={} | model=MiniMax-M2.7
🚨 BUG CONFIRMED: self.max_tokens is None AND no max_token in API request!
```

Test 2 — with `max_tokens=50` passed to `AIAgent`:

```
[DEBUG max_tokens] self.max_tokens=50 | api_kwargs max_token keys={'max_tokens': 50} | model=MiniMax-M2.7
✅ FIX CONFIRMED: max_tokens IS being sent to the API!
```
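The intercept itself can be reproduced with a small method wrapper. This is a sketch against a dummy agent class (the real patch was applied to the installed package's `_build_api_kwargs`):

```python
import functools

class DummyAgent:
    """Stand-in for AIAgent, just enough to demonstrate the intercept."""
    def __init__(self, max_tokens=None):
        self.max_tokens = max_tokens

    def _build_api_kwargs(self):
        kwargs = {"model": "MiniMax-M2.7"}
        if self.max_tokens is not None:
            kwargs["max_tokens"] = self.max_tokens
        return kwargs

def log_api_kwargs(build):
    """Wrap _build_api_kwargs to print what would be sent to the API."""
    @functools.wraps(build)
    def wrapper(self):
        kwargs = build(self)
        mt = {k: v for k, v in kwargs.items() if "max_token" in k}
        print(f"[DEBUG max_tokens] self.max_tokens={self.max_tokens} | "
              f"api_kwargs max_token keys={mt}")
        return kwargs
    return wrapper

DummyAgent._build_api_kwargs = log_api_kwargs(DummyAgent._build_api_kwargs)
DummyAgent(max_tokens=None)._build_api_kwargs()  # max_token keys={}
DummyAgent(max_tokens=50)._build_api_kwargs()    # max_token keys={'max_tokens': 50}
```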
The MiniMax API respects `max_tokens` (verified with a live API call):

| Setting | completion_tokens | finish_reason |
| --- | --- | --- |
| `max_tokens=50` | 50 | `length` (truncated) |
| No `max_tokens` (default) | 787 | `stop` (complete) |

With `max_tokens=50`, the API returned exactly 50 tokens and set `finish_reason=length`, confirming the parameter is respected.
### Note on the OpenRouter + Claude Band-Aid

run_agent.py#L4865 has a hardcoded band-aid for OpenRouter + Claude because Anthropic's API requires `max_tokens`:

```python
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit
```

This branch uses a hardcoded lookup table (`_get_anthropic_max_output`) and ignores the user's `model.max_tokens` config entirely. It was added as a workaround for the missing config passthrough, not as a proper fix. Our fix makes the config setting work for all providers, including OpenRouter + Claude.
### Provider Compatibility

This fix is safe for all providers:

- If `max_tokens` is NOT in config.yaml → the fix extracts `None` → behavior is identical to before (no change)
- If `max_tokens` IS in config.yaml → the user explicitly configured it for their provider/model
- Most OpenAI-compatible APIs ignore unsupported parameters rather than erroring
- The existing band-aid for OpenRouter + Claude confirms that passing `max_tokens` to providers that support it is the intended behavior
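The no-change guarantee in the first bullet rests on `dict.get` returning `None` for absent keys. A hypothetical helper (the real config object may be shaped differently) makes that concrete:

```python
# Hypothetical extraction helper; the real merged config may differ in shape.
def extract_max_tokens(config: dict):
    """Pull model.max_tokens from a merged config dict, or None if absent."""
    return config.get("model", {}).get("max_tokens")

# max_tokens absent: extraction yields None, so nothing changes downstream.
print(extract_max_tokens({"model": {"default": "MiniMax-M2.7"}}))  # None
# max_tokens present: the user's explicit value is forwarded.
print(extract_max_tokens({"model": {"max_tokens": 8192}}))         # 8192
```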
### Proposed Fix

Proof-of-concept on the fix/model-max-tokens-config branch: https://github.com/shokollm/hermes-agent/tree/fix/model-max-tokens-config

Changes:

- cli.py: Extract `max_tokens = CLI_CONFIG["model"].get("max_tokens")` and pass it to `AIAgent`
- gateway/run.py: Add a `user_config` parameter to `_resolve_turn_agent_config()`, extract `max_tokens` from `user_config.get("model", {}).get("max_tokens")`, and include it in the `primary` dict so it flows through to `AIAgent`
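The gateway half of the change can be sketched as follows. Signatures are simplified (the real `_resolve_turn_agent_config()` takes more parameters and builds a richer `primary` dict); this only illustrates the passthrough:

```python
# Simplified sketch of the proposed gateway/run.py change;
# not the actual hermes-agent code.
def resolve_turn_agent_config(user_config: dict) -> dict:
    """Build the primary dict, now carrying max_tokens through to AIAgent."""
    model_cfg = user_config.get("model", {})
    primary = {
        "model": model_cfg.get("default"),
        "max_tokens": model_cfg.get("max_tokens"),  # new: flows to AIAgent
    }
    return primary

config = {"model": {"default": "MiniMax-M2.7", "max_tokens": 8192}}
print(resolve_turn_agent_config(config))
# {'model': 'MiniMax-M2.7', 'max_tokens': 8192}
```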
### Affected Component

### Are you willing to submit a PR for this?