Problem
Test #20 (decoy: certification reimbursement) triggers hallucination where the model fabricates a policy description and document ID without ever calling the search or fetch tools.
Observed behavior
Turn 1: No valid code produced (parse failure)
Turn 2: Model returns fabricated text:
"The policy for professional certification reimbursement, as detailed
in document pol-883, states that employees are eligible for full
reimbursement of certification fees..."
The document ID "pol-883" does not exist. The model never called tool/search or tool/fetch. It hallucinated the entire answer.
Expected behavior
The model should search for certification/reimbursement documents, fetch candidates, compare content, and return "DOC-021" (Certification Reimbursement document).
Root cause analysis
The system prompt says "You have NO direct data access" (coordinator mode) or shows a data inventory (direct mode), but the model sometimes ignores the tools and generates a plausible-sounding text answer. This is worse in auto-return mode: the fabricated text contains no println, so it is auto-returned as the answer.
In explicit multi-turn mode, the model would need to wrap the fabrication in (return ...), which provides a slight friction barrier. But the core issue is model behavior, not prompt mechanics.
Reproduction
cd demo && source .env
mix run -e 'PtcDemo.LispTestRunner.run_one(20, model: "openrouter:google/gemini-3.1-flash-lite-preview", prompt: :auto_return, verbose: true, debug: true)'
Occurs intermittently — approximately 1 in 3 runs.
Potential mitigations
Prompt addition: "You MUST use tools to access data. Never fabricate or guess answers."
Validation: Detect when no tool calls were made but a non-trivial answer was returned
Signature constraint: Force DOC-XXX format pattern in the expected output
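The validation and signature-constraint ideas above could be combined in a small post-run check. The following is a minimal sketch, not part of PtcDemo: the module name `HallucinationGuard`, the function `check/2`, and the shape of `tool_calls` (a list of the tool names invoked during the run) are all hypothetical, and the three-digit `DOC-NNN` pattern is an assumption inferred from the "DOC-021" example.

```elixir
defmodule HallucinationGuard do
  @moduledoc """
  Hypothetical post-run check (not part of PtcDemo).
  Flags answers that were produced without any tool call, or that do
  not look like a document ID.
  """

  # Assumed ID format: "DOC-" followed by exactly three digits, e.g. "DOC-021".
  defp doc_id_pattern, do: ~r/\ADOC-\d{3}\z/

  @doc """
  `answer` is the final value returned by the run.
  `tool_calls` is assumed to be a list of tool names invoked during the run,
  for example ["search", "fetch"].
  """
  def check(answer, tool_calls) when is_list(tool_calls) do
    cond do
      # A non-empty answer with zero tool calls is treated as fabricated.
      tool_calls == [] and non_trivial?(answer) ->
        {:error, :no_tool_calls}

      # Tools were used (or the answer is empty), but the answer is not a DOC-NNN ID.
      not (is_binary(answer) and Regex.match?(doc_id_pattern(), answer)) ->
        {:error, :bad_format}

      true ->
        :ok
    end
  end

  defp non_trivial?(answer) when is_binary(answer), do: String.trim(answer) != ""
  defp non_trivial?(nil), do: false
  defp non_trivial?(_other), do: true
end
```

With this sketch, the fabricated "pol-883" text from the observed run would be rejected as `{:error, :no_tool_calls}`, while `HallucinationGuard.check("DOC-021", ["search", "fetch"])` passes. The format check alone would not catch a fabricated but well-formed ID, which is why the tool-call check comes first.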
Affects
demo/lib/ptc_demo/test_runner/test_case.ex
auto_return and multi_turn modes (untested on multi_turn for this specific test)