mlx backend: degenerate looping output with gemma-4 E4B (chat template apparently not applied)

**LocalAI version:** 4.4.1 (Homebrew bottle)
**Environment:** macOS (Apple Silicon, M1 Max), darwin/arm64
**Backend:** `mlx` (darwin image installs fine)

### What happened

Serving `mlx-community/gemma-4-E4B-it-qat-4bit` through the `mlx` backend produces degenerate, looping output that echoes the prompt. The chat template does not appear to be applied.

Model config:

```yaml
name: gemma-qat-mlx
backend: mlx
parameters:
  model: mlx-community/gemma-4-E4B-it-qat-4bit
```

Request:

```bash
curl http://127.0.0.1:1240/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"gemma-qat-mlx","messages":[{"role":"user","content":"Reply with exactly: MLX inside LocalAI works"}],"max_tokens":30}'
```

Response content:

```
 exactly: MLX inside LocalAI works exactly: MLX inside LocalAI works exactly: MLX inside Local
```

The response is a fragment of the prompt, echoed repeatedly until `max_tokens` is reached. This happens on every request.
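In case it helps with a regression check, the looping can be flagged automatically by searching the response for a phrase that repeats back to back. This is only a minimal sketch: `find_repeated_phrase` and its thresholds are hypothetical names chosen for illustration, not part of LocalAI or mlx.

```python
from typing import Optional


def find_repeated_phrase(text: str, min_words: int = 3, min_repeats: int = 2) -> Optional[str]:
    """Return the shortest phrase of at least `min_words` words that occurs
    `min_repeats` times in a row, or None if no such phrase exists.

    Hypothetical helper for spotting degenerate looping output.
    """
    words = text.split()
    n = len(words)
    # Try phrase lengths from shortest to longest.
    for size in range(min_words, n // min_repeats + 1):
        # Slide over every start position that leaves room for all repeats.
        for start in range(n - size * min_repeats + 1):
            phrase = words[start:start + size]
            if all(
                words[start + k * size:start + (k + 1) * size] == phrase
                for k in range(1, min_repeats)
            ):
                return " ".join(phrase)
    return None


if __name__ == "__main__":
    looping = (
        " exactly: MLX inside LocalAI works exactly: MLX inside LocalAI works"
        " exactly: MLX inside Local"
    )
    print(find_repeated_phrase(looping))
    print(find_repeated_phrase("MLX inside LocalAI works"))
```

Run against the response content above, the helper reports the five-word phrase that loops; run against the reply expected from a working setup, it returns `None`.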

### Expected

The same checkpoint served by `mlx_vlm.server` (mlx-vlm 0.6.2) on the same machine replies correctly ("MLX inside LocalAI works"), so the weights and the MLX runtime are fine. The difference points at prompt/chat-template handling in the LocalAI mlx backend. Since gemma-4 E4B is a vision-language architecture, the backend may be applying a plain-LM template, or none at all.

### Secondary observation

Cold load through the backend took ~82 s for this model; `mlx_vlm.server` loads the same checkpoint from the same HF cache in ~10 s. Worth a look once the output issue is addressed.

Happy to provide debug logs or test patches on this machine.

mlx backend: degenerate looping output with gemma-4 E4B (chat template apparently not applied) #10269

What happened

Expected

Secondary observation

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

mlx backend: degenerate looping output with gemma-4 E4B (chat template apparently not applied) #10269

Description

What happened

Expected

Secondary observation

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions