@allnes commented Dec 16, 2025

On AArch64, jit_uni_reorder now accepts pure f16→f16 reorders. Previously only f32↔f16 conversions passed the data-type check, so f16→f16 fell back to the reference implementation unnecessarily. For f16 cases, the small-stride requirement is also relaxed, so blocked and large-stride layouts can stay on the JIT path instead of degrading to the reference reorder. This should reduce reference reorder usage and keep f16 workloads on optimized kernels on AArch64.
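To make the two checks easier to review, here is a rough standalone sketch of the intended logic. It is not the actual jit_uni_reorder code: the names DataType, Node, f16_pair_accepted, strides_accepted, and the max_stride parameter are all hypothetical stand-ins, and the exact shape of the real stride check is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; not oneDNN types.
enum class DataType { f32, f16, other };

struct Node {
    std::int64_t input_stride;
    std::int64_t output_stride;
};

// Data-type gate for f16-related pairs only (other pairs are out of scope here).
// f32<->f16 conversions were accepted before; pure f16->f16 is the new case.
bool f16_pair_accepted(DataType in, DataType out) {
    const bool conversion = (in == DataType::f32 && out == DataType::f16)
            || (in == DataType::f16 && out == DataType::f32);
    const bool pure_f16 = (in == DataType::f16 && out == DataType::f16);
    return conversion || pure_f16;
}

// Stride gate. Assumption: non-f16 cases must keep every stride at or below
// max_stride, while cases involving f16 skip that requirement.
bool strides_accepted(const std::vector<Node> &nodes, DataType in,
        DataType out, std::int64_t max_stride) {
    assert(max_stride > 0);
    const bool involves_f16 = (in == DataType::f16 || out == DataType::f16);
    if (involves_f16) return true; // relaxed: large strides stay on the JIT path
    for (const Node &n : nodes) {
        if (n.input_stride > max_stride || n.output_stride > max_stride)
            return false; // would fall back to the reference reorder
    }
    return true;
}
```

Under these assumptions, a layout with a stride of 1024 and a limit of 64 is accepted for f16→f16 but rejected for a non-f16 pair.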

@allnes changed the title from "Allow f16→f16 AArch64 JIT reorder and relax stride checks for f16 paths" to "[cpu][arm] Allow f16→f16 AArch64 JIT reorder and relax stride checks for f16 paths" on Dec 16, 2025