Configuration of the Variable Image Resolution in Gemma 4 #591

Hi everyone!

Gemma 4 models support a variable token count per image: 70, 140, 280, 560, or 1120 tokens.

The default is 280, which is not enough for detail-oriented tasks such as OCR.

In Hugging Face Transformers, this can be adjusted with:
`AutoProcessor.from_pretrained(MODEL_ID, max_soft_tokens=1120)`

`llama-server` has a switch for it:
`--image-max-tokens 1120`

Ollama has a somewhat shady hack for it (https://github.com/ollama/ollama/issues/15626#issuecomment-4459260672), so solving the problem in a wrapper should be fine.

Personally, I don't need a GUI element for this setting. A key in model.yaml that holds an override value, passed on to the inference engine when present, would be enough.
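To make the request concrete, here is a minimal sketch of the pass-through I have in mind. It assumes model.yaml has already been parsed into a dict and that the backend is `llama-server`. The key name `image_max_tokens` and the helper `image_token_args` are made up for illustration, and rejecting values outside the list above is my own assumption, not something the backend is known to require.

```python
from typing import Any, Mapping

# Token counts listed above for Gemma 4 image inputs.
ALLOWED_IMAGE_TOKENS = (70, 140, 280, 560, 1120)

# Hypothetical key name; this is not an existing model.yaml field.
OVERRIDE_KEY = "image_max_tokens"


def image_token_args(model_config: Mapping[str, Any]) -> list[str]:
    """Return extra llama-server arguments for the optional override key."""
    value = model_config.get(OVERRIDE_KEY)
    if value is None:
        # Key absent: add nothing, so the backend keeps its default.
        return []
    if not isinstance(value, int) or value not in ALLOWED_IMAGE_TOKENS:
        # Assumed validation: only accept the counts listed in this issue.
        raise ValueError(
            f"{OVERRIDE_KEY} must be one of {ALLOWED_IMAGE_TOKENS}, got {value!r}"
        )
    return ["--image-max-tokens", str(value)]
```

With the key set to 1120 this yields `["--image-max-tokens", "1120"]`, which can be appended to the server command line. Without the key, nothing changes.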

Thank you for considering that addition!

Best regards!
