Python sample code that runs TranslateGemma on Intel devices (CPU, GPU, NPU) using the OpenVINO GenAI pipeline.
Files in this repo:

- `translate.py`: the main pipeline (modified from `visual_language_chat.py`)
- `export-requirements.txt`: required Python packages for model export
- `deployment-requirements.txt`: required Python packages for model deployment
- `chat_template-gemma3.json`: Gemma3 chat_template used to work around the validation error of the OV GenAI VLM pipeline
- `text_en.txt`: an English text file used to test text translation
- `text_zh-TW.txt`: a Traditional Chinese text file used to test text translation
- `image_cs.jpg`: an image containing Czech characters, used to test image translation
- `image_en.png`: an image containing English characters, used to test image translation
Run the following command to install the packages required for model export. The `--upgrade-strategy eager` option is needed to ensure optimum-intel is upgraded to the latest version.
pip install --upgrade-strategy eager -r export-requirements.txt

The script needs to download models from Hugging Face. To get access, visit https://huggingface.co/google/translategemma-4b-it and log in.
Make sure your access token is ready and `huggingface-cli` is installed. Open a Command Prompt and run `huggingface-cli login` with your access token:

pip install "huggingface_hub[cli]<1.0,>=0.34.0"
huggingface-cli login
- Note: Transformers 4.55.4 requires `huggingface-hub<1.0,>=0.34.0`
Then, run the export with Optimum CLI:
optimum-cli export openvino --model google/translategemma-4b-it --trust-remote-code translategemma-4b-it

- Models will be exported under the model directory (`translategemma-4b-it` in this example)
- The `--weight-format` argument can be used to quantize the model; see Quantization for details
Run the following command to install the packages required for model deployment.
pip install -r deployment-requirements.txt
translate.py --model_dir MODEL_DIR
--text TEXT
--image IMAGE
--device {CPU,GPU,NPU}
--source_lang_code SOURCE_LANG_CODE
--target_lang_code TARGET_LANG_CODE
- The arguments `--model_dir`, `--source_lang_code` and `--target_lang_code` are required
- Either `--text TEXT` or `--image IMAGE` should be provided
- `--device` can be `CPU`, `GPU` or `NPU`
- Language code examples: `en`, `en-GB`, `zh` or `zh-TW`. The full list of language codes can be found here, or in `chat_template.jinja` under the model directory
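The argument handling described above can be sketched with `argparse`. This is an illustrative sketch of the command-line interface, not the repo's actual `translate.py`; the real script may differ in defaults and help text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative sketch of translate.py's CLI (an assumption,
    # not the repo's actual implementation).
    parser = argparse.ArgumentParser(
        description="Translate text or an image with TranslateGemma via OpenVINO GenAI")
    parser.add_argument("--model_dir", required=True,
                        help="directory of the exported OpenVINO model")
    parser.add_argument("--device", choices=["CPU", "GPU", "NPU"], default="CPU",
                        help="inference device")
    parser.add_argument("--source_lang_code", required=True,
                        help="e.g. en, en-GB, zh, zh-TW")
    parser.add_argument("--target_lang_code", required=True,
                        help="e.g. en, en-GB, zh, zh-TW")
    # Exactly one of --text / --image must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--text", help="path to a text file to translate")
    group.add_argument("--image", help="path to an image to translate")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--model_dir", "translategemma-4b-it", "--device", "GPU",
         "--source_lang_code", "zh-TW", "--target_lang_code", "en",
         "--text", "text_zh-TW.txt"])
    print(args.device, args.source_lang_code, args.target_lang_code)
```

With this structure, argparse itself enforces the rules listed above: missing required arguments, an invalid `--device`, or supplying both `--text` and `--image` all fail at parse time.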
Command:
python translate.py --model_dir translategemma-4b-it --device GPU --source_lang_code zh-TW --target_lang_code en --text text_zh-TW.txt
Result:
Input:
白日依山盡,黃河入海流;欲窮千里目,更上一層樓。
Output:
As the sun sets behind the mountains, the Yellow River flows into the sea. To gain a broader perspective, one must climb to a higher vantage point.
Command:
python translate.py --model_dir translategemma-4b-it --device GPU --source_lang_code cs --target_lang_code en --image image_cs.jpg
Output:
Pedestrian Zone
Child Supervision
IZS, CBS in Supervision
0 - 24 hours
When exporting a model, the `--weight-format` argument can be used to quantize the model. The supported weight formats are `int8`, `int4` and `nf4`. See the OpenVINO model preparation guide for details.
optimum-cli export openvino --model google/translategemma-4b-it --trust-remote-code --weight-format int8 translategemma-4b-it_int8

The pipeline is verified on an Intel(R) Core(TM) Ultra 5 238V (Lunar Lake) system with 32 GB memory. GPU/NPU driver info:

- GPU: Intel(R) Arc(TM) 130V GPU, driver 32.0.101.8425 (1/16/2026)
- NPU: Intel(R) AI Boost, driver 32.0.100.4514 (12/17/2025)
| Model | Weight | CPU | GPU | NPU |
|---|---|---|---|---|
| translategemma-4b-it | fp16 | OK | OK | OK |
| translategemma-4b-it | int8 | OK | OK | OK |
| translategemma-4b-it | int4 | OK | OK | OK (1) |
| translategemma-4b-it | nf4 | OK | OK | OK (1) |
| translategemma-12b-it | int8 | OK | OK | NG (2) |
| translategemma-12b-it | int4 | OK | OK | NG (3) |
| translategemma-12b-it | nf4 | OK | OK | NG (4) |
- (1) To run `int4` or `nf4` models on NPU, the arguments below are required when exporting the model. See LLM Inference on NPU for details.
  - for `int4`: `--weight-format int4 --sym --ratio 1.0 --group-size 128`
  - for `nf4`: `--weight-format nf4 --sym --ratio 1.0 --group-size -1`
- (2) Failed due to insufficient memory; see `log.txt` for details
- (3) Output is garbage; see `log.txt` for details
- (4) No output; see `log.txt` for details
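The NPU-specific export flags above can be captured in a small helper that builds the extra optimum-cli arguments for a given weight format. This is a hypothetical convenience function for illustration, not part of the repo; the flag values mirror the list above.

```python
def npu_weight_args(weight_format):
    """Return the extra optimum-cli flags needed so int4/nf4 exports run on NPU.

    Hypothetical helper (not part of this repo); values mirror the flags
    listed above. See LLM Inference on NPU for background.
    """
    # int4 needs a group size of 128; nf4 needs per-channel (-1) grouping.
    group_size = {"int4": "128", "nf4": "-1"}
    if weight_format not in group_size:
        raise ValueError("NPU-specific flags only apply to int4 and nf4")
    return ["--weight-format", weight_format,
            "--sym", "--ratio", "1.0",
            "--group-size", group_size[weight_format]]


if __name__ == "__main__":
    # prints: --weight-format int4 --sym --ratio 1.0 --group-size 128
    print(" ".join(npu_weight_args("int4")))
```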
The full log (`log.txt`) is provided for reference.
