Arm backend: Add evaluate_model.py by martinlsm · Pull Request #18199 · pytorch/executorch

martinlsm · 2026-03-16T15:18:46Z

Arm backend: Add evaluate_model.py

This patch reimplements the evaluation feature that used to live in
aot_arm_compiler.py and introduces a few improvements. The new program,
evaluate_model.py, imports functions from aot_arm_compiler.py to compile
a model in a similar manner, but runs its own code focused on evaluating
the model with the evaluator classes in
backends/arm/util/arm_model_evaluator.py.
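The evaluator classes themselves are not shown in this PR description. As a rough illustration of the general pattern (an evaluator wraps a model, runs it over data, and returns named metrics), here is a minimal plain-Python sketch. `Evaluator`, `TopKAccuracyEvaluator`, and the "model is a callable returning class scores" interface are hypothetical names and assumptions, not the actual API of arm_model_evaluator.py.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical model interface: any callable mapping an input to class scores.
Model = Callable[[Sequence[float]], Sequence[float]]


class Evaluator(ABC):
    """Hypothetical base class: runs a model and returns a dict of metrics."""

    def __init__(self, model: Model) -> None:
        self.model = model

    @abstractmethod
    def evaluate(self) -> dict:
        ...


class TopKAccuracyEvaluator(Evaluator):
    """Hypothetical evaluator computing top-k accuracy on labelled samples."""

    def __init__(
        self,
        model: Model,
        dataset: Iterable[Tuple[Sequence[float], int]],
        k: int = 1,
    ) -> None:
        super().__init__(model)
        self.dataset = list(dataset)
        self.k = k

    def evaluate(self) -> dict:
        hits = 0
        for sample, label in self.dataset:
            scores = self.model(sample)
            # Indices of the k highest scores, best first.
            top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.k]
            hits += label in top_k
        total = len(self.dataset)
        return {f"top{self.k}_accuracy": hits / total if total else 0.0}


def identity(x: Sequence[float]) -> Sequence[float]:
    # Toy "model" that treats its input as the class scores.
    return x


data = [([0.1, 0.9, 0.0], 1), ([0.8, 0.1, 0.1], 2)]
print(TopKAccuracyEvaluator(identity, data).evaluate())  # {'top1_accuracy': 0.5}
```

Keeping metric computation behind a common `evaluate()` method lets a driver script report results the same way regardless of which model-specific evaluator it picked.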

The following is supported in evaluate_model.py:

- TOSA reference models (INT and FP).
- Evaluating a model that is quantized and/or lowered, i.e., a model
  that is quantized but not lowered, lowered but not quantized, or both.
- Casting the model with the --dtype flag to evaluate it in, e.g., bf16
  or fp16 format.
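To make the supported combinations concrete, here is a minimal, hypothetical sketch of how command-line flags could map to pipeline stages. Only --dtype is named in this PR; the --quantize and --delegate flag names, the accepted dtype strings, and the stage ordering are assumptions for illustration and may not match evaluate_model.py.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only --dtype comes from the PR description;
    # --quantize and --delegate are assumed flag names.
    parser = argparse.ArgumentParser(description="Sketch of an evaluation CLI")
    parser.add_argument("--quantize", action="store_true", help="quantize before evaluation")
    parser.add_argument("--delegate", action="store_true", help="lower (delegate) to the backend")
    parser.add_argument(
        "--dtype",
        choices=["fp16", "bf16"],  # assumed set of accepted values
        default=None,
        help="cast the model to this dtype before evaluation",
    )
    return parser


def plan_stages(argv: Optional[List[str]] = None) -> List[str]:
    """Return the ordered stages implied by the flags (ordering is assumed)."""
    args = build_parser().parse_args(argv)
    stages = ["export"]
    if args.dtype is not None:
        stages.append(f"cast:{args.dtype}")
    if args.quantize:
        stages.append("quantize")
    if args.delegate:
        stages.append("lower")
    stages.append("evaluate")
    return stages


print(plan_stages(["--quantize"]))                   # ['export', 'quantize', 'evaluate']
print(plan_stages(["--delegate", "--dtype", "bf16"]))  # ['export', 'cast:bf16', 'lower', 'evaluate']
```

Treating quantization and lowering as independent booleans is what makes all four combinations (neither, quantized only, lowered only, both) reachable from one entry point.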

Also add tests that exercise evaluate_model.py with different command
line arguments.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

pytorch-bot · 2026-03-16T15:18:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18199

❌ 6 New Failures

As of commit d7219c7 with merge base c81126e:

NEW FAILURES - The following jobs have failed:

pull / unittest-arm-backend-with-no-deps (test_pytest_ops_no_target) / linux-job (gh)
RuntimeError: Command docker exec -t d0b47fc47f2d0dce02c0a4f76842ae5f46d35a9e6ae3442eef44c46e3f0befa7 /exec failed with exit code 1
pull / unittest-arm-backend-with-no-deps (test_pytest_ops_tosa) / linux-job (gh)
RuntimeError: Command docker exec -t 47cffbdf99d0b4f13b0fdc3fa7600f159d33392f93c96e0b8c98af61900767df /exec failed with exit code 1
trunk / test-arm-backend-ethos-u (test_pytest_ops_ethos_u55) / linux-job (gh)
RuntimeError: Command docker exec -t b57c73ba409bd257c09cc035829a058a66b3c8bce8853fa07e4daa496b65f24c /exec failed with exit code 1
trunk / test-arm-backend-ethos-u (test_pytest_ops_ethos_u85) / linux-job (gh)
RuntimeError: Command docker exec -t fad5613ef93e19ae985b694eee9d77bb13e59ecd1afda8bcb077399e8a674463 /exec failed with exit code 1
trunk / test-arm-backend-vkml (test_pytest_ops_vkml) / linux-job (gh)
RuntimeError: Command docker exec -t e63f76d9eae5b32205450f394a54ce0ed1e11531f1671d6b50a83c92a5810360 /exec failed with exit code 1
trunk / test-mcu-cortex-m-backend / linux-job (gh)
RuntimeError: Command docker exec -t 7ffe83bf9934f0d752060183ae064334ba59a3cfafb2b148b5599297531d5caf /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

martinlsm · 2026-03-16T15:19:14Z

@pytorchbot label ciflow/trunk

martinlsm · 2026-03-16T15:19:23Z

@pytorchbot label "partner: arm"

martinlsm · 2026-03-16T15:19:32Z

@pytorchbot label "release notes: arm"

Copilot

Pull request overview

This PR reintroduces Arm backend model evaluation as a dedicated CLI (evaluate_model.py), replacing the previously embedded evaluation flow from aot_arm_compiler.py, and adds tests to exercise common invocation modes.

Changes:

- Add backends/arm/scripts/evaluate_model.py, which compiles a model, optionally quantizes and/or delegates it, and then evaluates it with the Arm evaluator utilities.
- Add pytest coverage that runs evaluate_model.py against TOSA INT/FP targets and validates the emitted metrics JSON.
- Update the messaging in examples/arm/aot_arm_compiler.py to point users to the new evaluation script.
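The tests are described as validating the emitted metrics JSON. A check of that kind could look roughly like the sketch below; the function name `validate_metrics` and the metric key names are hypothetical, since the actual JSON schema written by evaluate_model.py is not described here, and the example values are made up.

```python
import json
import math
from typing import Iterable


def validate_metrics(text: str, required: Iterable[str]) -> dict:
    """Parse a metrics JSON document and check required keys are finite numbers.

    The key names passed in `required` are hypothetical; the real schema
    emitted by evaluate_model.py may differ.
    """
    metrics = json.loads(text)
    if not isinstance(metrics, dict):
        raise ValueError("metrics JSON must be an object")
    for key in required:
        if key not in metrics:
            raise KeyError(f"missing metric: {key}")
        value = metrics[key]
        # bool is a subclass of int, so reject it explicitly; json.loads also
        # accepts NaN/Infinity literals, which math.isfinite rejects.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or not math.isfinite(value):
            raise ValueError(f"metric {key!r} is not a finite number: {value!r}")
    return metrics


example = '{"top1_accuracy": 0.75, "top5_accuracy": 0.93}'  # made-up values
print(validate_metrics(example, ["top1_accuracy", "top5_accuracy"]))
```

Failing loudly on missing keys or non-finite values catches a broken evaluation run even when the script itself exits successfully.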