1.31x batch prefill, 1.24x batch decode speedup: NUMA binding #569

Merged
copybara-service[bot] merged 1 commit into dev from test_758598968
May 16, 2025

@copybara-service copybara-service bot commented May 14, 2025

1.31x batch prefill, 1.24x batch decode speedup: NUMA binding

Binds only the weights; binding MatMul output worsens batch=1 prefill.
Update gemma_batch_bench to use --decode_qbatch.
Fix/remove prefill_activations in gemma-inl.h.

Refactor:
- Use BasePageBytes directly when binding
- Move BindB/C to .cc by de-templatizing
- Remove MatOwners::AllocateFor because it is weights-specific (binding or not)
- Disband MatOwners, replace with vector
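
For readers unfamiliar with why a page size matters when binding: operating systems place memory on NUMA nodes at page granularity, so any split of a buffer across nodes has to fall on page boundaries. The following is a minimal sketch of that arithmetic only. `NodeRange` and `SplitByPages` are hypothetical names for illustration, not gemma.cpp APIs, and `page_bytes` merely stands in for a value such as the one `BasePageBytes` refers to. The sketch does not perform any actual binding and may differ from how this change partitions weights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not from gemma.cpp): a half-open byte range
// [begin, end) of a buffer, intended to be placed on one NUMA node.
struct NodeRange {
  size_t node;
  size_t begin;
  size_t end;
};

// Splits `total_bytes` into at most `num_nodes` contiguous ranges. Every
// boundary between two ranges is a multiple of `page_bytes`, because memory
// can only be assigned to a node in whole pages. The last range may end
// mid-page if `total_bytes` is not a multiple of `page_bytes`.
std::vector<NodeRange> SplitByPages(size_t total_bytes, size_t num_nodes,
                                    size_t page_bytes) {
  assert(num_nodes != 0 && page_bytes != 0);
  // Round up so that a partial trailing page still counts as a page.
  const size_t total_pages = (total_bytes + page_bytes - 1) / page_bytes;
  const size_t pages_per_node = (total_pages + num_nodes - 1) / num_nodes;

  std::vector<NodeRange> ranges;
  for (size_t node = 0; node < num_nodes; ++node) {
    const size_t begin = node * pages_per_node * page_bytes;
    if (begin >= total_bytes) break;  // Fewer pages than nodes: stop early.
    size_t end = begin + pages_per_node * page_bytes;
    if (end > total_bytes) end = total_bytes;
    ranges.push_back({node, begin, end});
  }
  return ranges;
}
```

Each returned range could then be handed to a platform-specific binding call. A buffer smaller than one page per node yields fewer ranges than nodes, which is one reason small tensors gain little from binding.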

@copybara-service copybara-service bot force-pushed the test_758598968 branch 4 times, most recently from fdb5632 to 3c658c0 Compare May 16, 2025 14:30
PiperOrigin-RevId: 759610477
@copybara-service copybara-service bot merged commit e890d46 into dev May 16, 2025
3 of 5 checks passed
@copybara-service copybara-service bot deleted the test_758598968 branch May 16, 2025 14:42