1.31x batch prefill, 1.24x batch decode speedup: NUMA binding #569

copybara-service · 2025-05-14T10:21:11Z

Bind only the weights to NUMA nodes; binding the MatMul output worsens batch=1 prefill.
Update gemma_batch_bench to use --decode_qbatch.
Fix/remove prefill_activations in gemma-inl.h.

Refactor:
Use BasePageBytes directly when binding
Move BindB/C to .cc by de-templatizing
Remove MatOwners::AllocateFor because it is weights-specific (binding or not)
Disband MatOwners, replace with vector
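
For readers unfamiliar with why the page size matters when binding: memory can only be placed on a NUMA node in whole pages, so a weight buffer has to be split at page-aligned boundaries before each piece is bound. The following is a minimal sketch of that arithmetic only, not the code in this change. `NodeRange` and `PartitionForNodes` are hypothetical names, the page size is passed in as a plain parameter rather than taken from BasePageBytes, and the actual binding call (on Linux, for example, `mbind`) is deliberately left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type (not from gemma.cpp): a half-open byte range
// [begin, end) of a buffer that is intended to live on NUMA node `node`.
struct NodeRange {
  size_t node;
  size_t begin;
  size_t end;
};

// Splits a buffer of `total_bytes` into at most `num_nodes` contiguous
// ranges. Every boundary between two ranges is a multiple of `page_bytes`,
// because the OS can only place whole pages on a node. Only the final
// range may end at a non-page-aligned offset (the end of the buffer).
std::vector<NodeRange> PartitionForNodes(size_t total_bytes, size_t page_bytes,
                                         size_t num_nodes) {
  std::vector<NodeRange> ranges;
  if (total_bytes == 0 || page_bytes == 0 || num_nodes == 0) return ranges;

  // Round up: a partially filled last page still occupies a page.
  const size_t total_pages = (total_bytes + page_bytes - 1) / page_bytes;
  // Round up again so that all pages are covered by `num_nodes` ranges.
  const size_t pages_per_node = (total_pages + num_nodes - 1) / num_nodes;

  for (size_t node = 0; node < num_nodes; ++node) {
    const size_t begin = node * pages_per_node * page_bytes;
    if (begin >= total_bytes) break;  // Fewer pages than nodes.
    size_t end = begin + pages_per_node * page_bytes;
    if (end > total_bytes) end = total_bytes;  // Clamp the last range.
    ranges.push_back({node, begin, end});
  }
  return ranges;
}
```

Under these assumptions, a caller would iterate over the returned ranges and pass each `[begin, end)` span, offset from the buffer's page-aligned base address, to whatever binding facility the platform provides. Buffers smaller than one page per node simply yield fewer ranges than nodes.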

PiperOrigin-RevId: 759610477

copybara-service bot force-pushed the test_758598968 branch 4 times, most recently from fdb5632 to 3c658c0, May 16, 2025 14:30

copybara-service bot force-pushed the test_758598968 branch from 3c658c0 to e890d46, May 16, 2025 14:42

copybara-service bot merged commit e890d46 into dev May 16, 2025
3 of 5 checks passed

copybara-service bot deleted the test_758598968 branch May 16, 2025 14:42