Skip to content

Local hallucinating 😓 #1068

@Claudioappassionato

Description

@Claudioappassionato

I’m asking for advice because there’s an issue with local models that I just can’t solve. Since it happens with all the models I’ve tested, I’m fairly sure it’s a configuration or code problem rather than something model-specific.

At first, the model works correctly. For a while, it executes tools properly, writes to files, and reads them without any issue. But after some conversation, instead of actually writing to the files, it only writes in the chat while behaving as if it had used write_file or read_file. The hardware side worked fine. What I had completely underestimated was context management.

The problem isn’t that local models are bad at long contexts. Qwen, on paper, supports 128,000 tokens. The issue is what happens to quality as that window fills up. Around 60–70% of capacity, the model starts to ignore information that was read earlier. It doesn’t fail dramatically; it simply and silently forgets the constraints set at the beginning of the prompt. You end up with output that looks plausible but does not satisfy requirements specified 10,000 tokens earlier.

I realized this because the pipeline was producing technically correct outputs, but they violated a formatting rule I had defined in the system prompt. It took me two days to understand that it wasn’t a logical error — the model simply could no longer “see” the beginning of its own context.

Is there a way to solve this issue so the local model doesn’t break down or start hallucinating?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions