Hi everyone!
Gemma 4 models support a variable number of tokens per image (70, 140, 280, 560, or 1120).
The default is 280, which is not enough for detail-oriented tasks such as OCR.
In HF transformers this can be adjusted with:
AutoProcessor.from_pretrained(MODEL_ID, max_soft_tokens=1120)
In llama-server, there is a flag for it:
--image-max-tokens 1120
Ollama has a somewhat hacky workaround for it (ollama/ollama#15626 (comment)), so solving the problem in a wrapper should be fine.
I personally don't need a GUI element for this setting. It would be enough to have a key in model.yaml that holds an override value and is passed to the inference backend when present.
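To make the request concrete, here is a minimal Python sketch of how a wrapper might read such a key and forward it to either backend. It assumes model.yaml has already been parsed into a dict. The key name `image_max_tokens` and the helper functions are hypothetical; only the `--image-max-tokens` flag and the `max_soft_tokens` argument come from this post. Rejecting values outside the listed budgets is also an assumption about how the wrapper should behave.

```python
from typing import Any, Mapping, Optional

# Per-image token budgets Gemma 4 supports, as listed above.
ALLOWED_IMAGE_TOKENS = (70, 140, 280, 560, 1120)

# Hypothetical key name; the wrapper would choose the real one.
IMAGE_TOKENS_KEY = "image_max_tokens"


def image_token_override(model_cfg: Mapping[str, Any]) -> Optional[int]:
    """Return the override from a parsed model.yaml, or None if the key is absent."""
    value = model_cfg.get(IMAGE_TOKENS_KEY)
    if value is None:
        return None  # no override: let the backend use its default (280)
    if not isinstance(value, int) or value not in ALLOWED_IMAGE_TOKENS:
        raise ValueError(f"{IMAGE_TOKENS_KEY}={value!r} is not one of {ALLOWED_IMAGE_TOKENS}")
    return value


def llama_server_args(model_cfg: Mapping[str, Any]) -> list[str]:
    """Extra llama-server arguments; empty when no override is set."""
    override = image_token_override(model_cfg)
    return [] if override is None else ["--image-max-tokens", str(override)]


def processor_kwargs(model_cfg: Mapping[str, Any]) -> dict[str, int]:
    """Extra kwargs for AutoProcessor.from_pretrained; empty when no override is set."""
    override = image_token_override(model_cfg)
    return {} if override is None else {"max_soft_tokens": override}


if __name__ == "__main__":
    cfg = {"image_max_tokens": 1120}  # as if loaded from model.yaml
    print(llama_server_args(cfg))  # ['--image-max-tokens', '1120']
    print(processor_kwargs(cfg))   # {'max_soft_tokens': 1120}
    print(llama_server_args({}))   # []
```

When the key is missing, nothing is passed, so existing setups keep the backend's default behaviour.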
Thank you for considering that addition!
Best regards!