### Bug Description

Setting `max_tokens` under the `model` key in `config.yaml` does not increase the output token limit. Responses are silently truncated mid-generation for moderately long tasks. The config value is read and merged but never extracted and forwarded to the `AIAgent` constructor, making the setting completely ineffective.
```yaml
model:
  default: MiniMax-M2.7
  provider: custom
  base_url: https://api.minimax.io/v1
  max_tokens: 8192  # ← this does nothing
```
### Steps to Reproduce

- Set `model.max_tokens: 8192` in `config.yaml`
- Run `hermes chat` and engage in a conversation requiring a long response
- Observe the response gets truncated with no error message
- Check the API request: `max_tokens` is not included
### Expected Behavior

`model.max_tokens` in `config.yaml` should be passed to the `AIAgent` constructor, which then sends it to the API.
### Actual Behavior

The `AIAgent` constructor accepts a `max_tokens` parameter (run_agent.py#L660), but callers never provide it:

- cli.py#L2100: `self.agent = AIAgent(...)` is called without `max_tokens`
- gateway/run.py#L781-L789: `_resolve_turn_agent_config()` builds a `primary` dict without `max_tokens`, so it never flows through to `AIAgent`
The `_build_api_kwargs` method (run_agent.py#L4864) only adds `max_tokens` to the API request when `self.max_tokens is not None`:

```python
if self.max_tokens is not None:
    api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    # Band-aid: use hardcoded per-model limits
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit
```
Since callers never pass `max_tokens`, it is always `None` and the parameter is never sent — except for a hardcoded band-aid for OpenRouter + Claude (see below).
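A minimal stand-alone sketch of that conditional (simplified names and a stand-in limit; not the actual hermes-agent code) shows why the default path sends nothing:

```python
# Simplified stand-in for _build_api_kwargs; not the actual hermes-agent code.
def build_api_kwargs(max_tokens, model="MiniMax-M2.7", is_openrouter=False):
    api_kwargs = {"model": model}
    if max_tokens is not None:
        # Config value present: forward it to the API request.
        api_kwargs["max_tokens"] = max_tokens
    elif is_openrouter and "claude" in model.lower():
        # Band-aid path: hardcoded stand-in for _get_anthropic_max_output().
        api_kwargs["max_tokens"] = 8192
    return api_kwargs

# Callers never pass max_tokens, so it stays None and nothing is sent:
print(build_api_kwargs(None))  # {'model': 'MiniMax-M2.7'}
print(build_api_kwargs(8192))  # {'model': 'MiniMax-M2.7', 'max_tokens': 8192}
```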
### Root Cause Analysis (confirmed with live API intercept)

I patched the installed hermes-agent to log what `_build_api_kwargs()` sends to the API. Two tests were run.

Test 1 — current behavior (BUG):

```
[DEBUG max_tokens] self.max_tokens=None | api_kwargs max_token keys={} | model=MiniMax-M2.7
🚨 BUG CONFIRMED: self.max_tokens is None AND no max_token in API request!
```

Test 2 — with `max_tokens=50` passed to `AIAgent`:

```
[DEBUG max_tokens] self.max_tokens=50 | api_kwargs max_token keys={'max_tokens': 50} | model=MiniMax-M2.7
✅ FIX CONFIRMED: max_tokens IS being sent to the API!
```
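The intercept itself can be reproduced with a small method wrapper. This is a sketch against a dummy agent class (the real patch was applied to the installed package's `_build_api_kwargs`):

```python
import functools

class DummyAgent:
    """Stand-in for AIAgent, just enough to demonstrate the intercept."""
    def __init__(self, max_tokens=None):
        self.max_tokens = max_tokens

    def _build_api_kwargs(self):
        kwargs = {"model": "MiniMax-M2.7"}
        if self.max_tokens is not None:
            kwargs["max_tokens"] = self.max_tokens
        return kwargs

def log_api_kwargs(build):
    """Wrap _build_api_kwargs to print what would be sent to the API."""
    @functools.wraps(build)
    def wrapper(self):
        kwargs = build(self)
        mt = {k: v for k, v in kwargs.items() if "max_token" in k}
        print(f"[DEBUG max_tokens] self.max_tokens={self.max_tokens} | "
              f"api_kwargs max_token keys={mt}")
        return kwargs
    return wrapper

DummyAgent._build_api_kwargs = log_api_kwargs(DummyAgent._build_api_kwargs)
DummyAgent(max_tokens=None)._build_api_kwargs()  # max_token keys={}
DummyAgent(max_tokens=50)._build_api_kwargs()    # max_token keys={'max_tokens': 50}
```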
The MiniMax API respects `max_tokens` (verified with a live API call):

| Setting | completion_tokens | finish_reason |
| --- | --- | --- |
| `max_tokens=50` | 50 | `length` (truncated) |
| No `max_tokens` (default) | 787 | `stop` (complete) |

With `max_tokens=50`, the API returned exactly 50 tokens and set `finish_reason=length`, confirming the parameter is respected.
### Note on the OpenRouter + Claude Band-Aid

run_agent.py#L4865 has a hardcoded band-aid for OpenRouter + Claude because Anthropic's API requires `max_tokens`:

```python
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit
```

This branch uses a hardcoded lookup table (`_get_anthropic_max_output`) and ignores the user's `model.max_tokens` config entirely. It was added as a workaround for the missing config passthrough, not as a proper fix. Our fix makes the config setting work for all providers, including OpenRouter + Claude.
### Provider Compatibility

This fix is safe for all providers:

- If `max_tokens` is NOT in config.yaml → the fix extracts `None` → behavior is identical to before (no change)
- If `max_tokens` IS in config.yaml → the user explicitly configured it for their provider/model
- Most OpenAI-compatible APIs ignore unsupported parameters rather than erroring
- The existing band-aid for OpenRouter + Claude confirms that passing `max_tokens` to providers that support it is the intended behavior
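The no-change guarantee in the first bullet rests on `dict.get` returning `None` for absent keys. A hypothetical helper (the real config object may be shaped differently) makes that concrete:

```python
# Hypothetical extraction helper; the real merged config may differ in shape.
def extract_max_tokens(config: dict):
    """Pull model.max_tokens from a merged config dict, or None if absent."""
    return config.get("model", {}).get("max_tokens")

# max_tokens absent: extraction yields None, so nothing changes downstream.
print(extract_max_tokens({"model": {"default": "MiniMax-M2.7"}}))  # None
# max_tokens present: the user's explicit value is forwarded.
print(extract_max_tokens({"model": {"max_tokens": 8192}}))         # 8192
```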
### Proposed Fix

Proof-of-concept on the fix/model-max-tokens-config branch: https://github.com/shokollm/hermes-agent/tree/fix/model-max-tokens-config

Changes:

- cli.py: Extract `max_tokens = CLI_CONFIG["model"].get("max_tokens")` and pass it to `AIAgent`
- gateway/run.py: Add a `user_config` parameter to `_resolve_turn_agent_config()`, extract `max_tokens` from `user_config.get("model", {}).get("max_tokens")`, and include it in the `primary` dict so it flows through to `AIAgent`
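The gateway half of the change can be sketched as follows. Signatures are simplified (the real `_resolve_turn_agent_config()` takes more parameters and builds a richer `primary` dict); this only illustrates the passthrough:

```python
# Simplified sketch of the proposed gateway/run.py change;
# not the actual hermes-agent code.
def resolve_turn_agent_config(user_config: dict) -> dict:
    """Build the primary dict, now carrying max_tokens through to AIAgent."""
    model_cfg = user_config.get("model", {})
    primary = {
        "model": model_cfg.get("default"),
        "max_tokens": model_cfg.get("max_tokens"),  # new: flows to AIAgent
    }
    return primary

config = {"model": {"default": "MiniMax-M2.7", "max_tokens": 8192}}
print(resolve_turn_agent_config(config))
# {'model': 'MiniMax-M2.7', 'max_tokens': 8192}
```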
### Affected Component

### Are you willing to submit a PR for this?