Bug: model hallucination on exploration test #20 — bypasses tools entirely

## Problem

Test #20 (decoy: certification reimbursement) triggers a hallucination: the model fabricates a policy description and a document ID without ever calling the search or fetch tools.

## Observed behavior

```
Turn 1: No valid code produced (parse failure)
Turn 2: Model returns fabricated text:
  "The policy for professional certification reimbursement, as detailed 
   in document pol-883, states that employees are eligible for full 
   reimbursement of certification fees..."
```

The document ID "pol-883" does not exist. The model never called `tool/search` or `tool/fetch`. It hallucinated the entire answer.

## Expected behavior

The model should search for certification/reimbursement documents, fetch candidates, compare content, and return "DOC-021" (Certification Reimbursement document).
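
The expected flow can be sketched roughly as follows. This is a minimal Python illustration, not the project's code: `search`, `fetch`, the in-memory `CORPUS`, the scoring rule, and the decoy entry `DOC-900` are all hypothetical stand-ins. Only `DOC-021` and its title come from this report.

```python
# Hypothetical stand-in corpus; only DOC-021 and its title are named in this report.
CORPUS = {
    "DOC-021": "Certification Reimbursement: placeholder body text",
    "DOC-900": "Travel Reimbursement: placeholder body mentioning certification",  # made-up decoy
}

calls = []  # records every tool call so the flow can be audited afterwards


def search(query):
    """Hypothetical search tool: IDs of documents containing any query word."""
    calls.append(("search", query))
    words = query.lower().split()
    return sorted(
        doc_id for doc_id, text in CORPUS.items()
        if any(word in text.lower() for word in words)
    )


def fetch(doc_id):
    """Hypothetical fetch tool: full text of one document."""
    calls.append(("fetch", doc_id))
    return CORPUS[doc_id]


def answer(query):
    """Search, fetch every candidate, compare titles, return the best ID."""
    words = query.lower().split()

    def score(doc_id):
        title = fetch(doc_id).lower().split(":")[0]
        return sum(word in title for word in words)

    return max(search(query), key=score)


result = answer("certification reimbursement")
print(result, calls)
```

The point of the sketch is that the decoy also matches the search, so the answer is only reliable once each candidate has been fetched and compared.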

## Root cause analysis

The system prompt says "You have NO direct data access" (coordinator mode) or shows the data inventory (direct mode), but the model sometimes ignores the tools and generates a plausible-sounding text answer instead. This is worse in auto-return mode: the fabricated text contains no `println`, so it is auto-returned as the answer.

In explicit multi-turn mode, the model would need to wrap the fabrication in `(return ...)`, which provides a slight friction barrier. But the core issue is model behavior, not prompt mechanics.

## Reproduction

```bash
cd demo && source .env
mix run -e 'PtcDemo.LispTestRunner.run_one(20, model: "openrouter:google/gemini-3.1-flash-lite-preview", prompt: :auto_return, verbose: true, debug: true)'
```

Occurs intermittently — approximately 1 in 3 runs.
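
Because the failure is intermittent, a single clean run proves little. Assuming runs are independent and the failure rate really is about 1 in 3 (a rough estimate from this report, not a measured constant), the chance of seeing at least one failure in n runs is 1 - (2/3)^n. A small Python sketch of that arithmetic, with hypothetical helper names:

```python
import math


def p_at_least_one_failure(n, p=1 / 3):
    """Probability of at least one failure in n independent runs."""
    return 1 - (1 - p) ** n


def runs_needed(confidence, p=1 / 3):
    """Smallest n for which p_at_least_one_failure(n, p) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(round(p_at_least_one_failure(3), 3))  # 0.704
print(runs_needed(0.95))  # 8
```

Under those assumptions, 8 runs give at least a 95% chance of reproducing the failure at least once.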

## Potential mitigations

1. **Prompt addition:** "You MUST use tools to access data. Never fabricate or guess answers."
2. **Validation:** Detect when no tool calls were made but a non-trivial answer was returned
3. **Signature constraint:** Enforce a `DOC-XXX` format on the returned answer
4. **Model capability:** May only affect the lite model — test on stronger models (#834)
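
Mitigations 2 and 3 could be combined into one post-run check, sketched below in Python. This is an illustration under assumptions, not the test runner's API: `validate_answer` and its `tool_calls` argument are hypothetical, and reading `DOC-XXX` as "`DOC-` followed by three digits" is inferred from `DOC-021`.

```python
import re

# Assumed ID shape: "DOC-" plus exactly three digits, as in DOC-021.
DOC_ID = re.compile(r"DOC-\d{3}")


def validate_answer(answer, tool_calls):
    """Return a list of problems; an empty list means the answer passes."""
    problems = []
    text = answer.strip()
    # Mitigation 2: a non-empty answer with no tool calls cannot be grounded.
    if text and not tool_calls:
        problems.append("answer returned without any tool calls")
    # Mitigation 3: the whole answer must be a document ID, not free text.
    if not DOC_ID.fullmatch(text):
        problems.append("answer is not a DOC-XXX identifier")
    return problems


fabricated = "The policy ... as detailed in document pol-883, states that ..."
print(validate_answer(fabricated, tool_calls=[]))
print(validate_answer("DOC-021", tool_calls=["search", "fetch"]))
```

The fabricated answer from this report fails both checks, while a bare `DOC-021` returned after tool use passes. Neither check proves the answer is correct; together they only reject the failure pattern seen here.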

## Affects

- Test #20 in `demo/lib/ptc_demo/test_runner/test_case.ex`
- `auto_return` mode (observed); `multi_turn` mode is presumed affected but has not been tested for this specific test
