Add offline perf ci #181

Merged
qihqi merged 4 commits into main from hanq_ci on Sep 13, 2024

Conversation

qihqi
Collaborator

@qihqi qihqi commented Sep 6, 2024

  • Add a benchmark_offline mode for the CLI
  • Add the ability to use random weights

@qihqi qihqi force-pushed the hanq_ci branch 10 times, most recently from 940e6ee to dfd99a9 Compare September 7, 2024 22:36

github-actions bot commented Sep 9, 2024

Number of devices: 8
bfloat16 Matmul replicated: 369.596 ms sizes: ('2048.0 MiB', '2048.0 MiB')
bfloat16 Matmul sharded colrow: 108.966 ms sizes: ('2048.0 MiB', '2048.0 MiB')
bfloat16 matmul sharded rowcol: 76.5914 ms sizes: ('2048.0 MiB', '2048.0 MiB')
bfloat16 all_gather: 68.3534 ms sizes: ('2048.0 MiB',)
bfloat16 all_reduce: 8.24284 ms sizes: ('2048.0 MiB',)
bfloat16 Llama 3xffn shardmap: 1.80134 ms sizes: ('8.0 MiB', '86.0 MiB', '86.0 MiB', '86.0 MiB')
bfloat16 Llama 3xffn gspmd: 1.74834 ms sizes: ('8.0 MiB', '86.0 MiB', '86.0 MiB', '86.0 MiB')
int8 Matmul replicated: 186.856 ms sizes: ('1024.0 MiB', '1024.0 MiB')
int8 Matmul sharded colrow: 55.1954 ms sizes: ('1024.0 MiB', '1024.0 MiB')
int8 matmul sharded rowcol: 38.6339 ms sizes: ('1024.0 MiB', '1024.0 MiB')
int8 all_gather: 34.4436 ms sizes: ('1024.0 MiB',)
int8 all_reduce: 4.38876 ms sizes: ('1024.0 MiB',)
int8 Llama 3xffn shardmap: 1.76262 ms sizes: ('4.0 MiB', '43.0 MiB', '43.0 MiB', '43.0 MiB')
int8 Llama 3xffn gspmd: 1.72122 ms sizes: ('4.0 MiB', '43.0 MiB', '43.0 MiB', '43.0 MiB')
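The bot report above comes from a set of matmul and collective microbenchmarks run on 8 devices. A minimal sketch of the timing harness such a benchmark might use, with NumPy matmul as a stand-in for the real sharded JAX ops (all function names here are hypothetical, not the PR's actual code):

```python
import time

import numpy as np


def time_op(fn, *args, warmup=2, iters=5):
    """Time fn(*args), returning the mean wall-clock latency in ms."""
    for _ in range(warmup):  # discard warmup runs (compilation, caches)
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters * 1000.0


def size_mib(*arrays):
    """Format operand sizes the way the CI bot prints them."""
    return tuple(f"{a.nbytes / 2**20:.1f} MiB" for a in arrays)


lhs = np.ones((1024, 1024), dtype=np.float32)
rhs = np.ones((1024, 1024), dtype=np.float32)
ms = time_op(np.matmul, lhs, rhs)
print(f"float32 Matmul replicated: {ms:.4g} ms sizes: {size_mib(lhs, rhs)}")
```

The real benchmark would additionally shard the operands across devices (the "colrow"/"rowcol" variants) and block on device completion before reading the clock.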

@FanhaiLu1
Collaborator

Is this PR ready to review?

@qihqi qihqi marked this pull request as draft September 9, 2024 23:17
@qihqi qihqi force-pushed the hanq_ci branch 2 times, most recently from 60919fd to 5b370db Compare September 11, 2024 00:11

Offline benchmark numbers

Model: meta-llama/Meta-Llama-3-8B-Instruct

Batch size: 128

Quantize: False

time (ms)
Prefill 16 7.418541400693357
Prefill 32 6.320667406544089
Prefill 64 6.286943610757589
Prefill 128 6.39411760494113
Prefill 256 6.337903602980077
Prefill 512 6.452551390975714
Prefill 1024 6.41555959591642
Decode 14.40514026035089
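The report format above is straightforward to reproduce: one prefill timing per sequence length, then a decode timing. A hedged sketch of what the reporting loop might look like (run_prefill_time here is a placeholder body, not the PR's actual implementation):

```python
import time


def run_prefill_time(seqlen):
    """Placeholder: the real function times a model prefill of length seqlen."""
    start = time.perf_counter()
    _ = sum(range(seqlen))  # stand-in for the actual prefill call
    return (time.perf_counter() - start) * 1000.0


def report(prefill_lengths, decode_ms):
    """Render the benchmark table in the format the CI bot posts."""
    lines = ["time (ms)"]
    for seqlen in prefill_lengths:
        lines.append(f"Prefill {seqlen} {run_prefill_time(seqlen)}")
    lines.append(f"Decode {decode_ms}")
    return "\n".join(lines)


print(report([2**e for e in range(4, 11)], decode_ms=14.4))
```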

@qihqi qihqi marked this pull request as ready for review September 11, 2024 00:45
@qihqi
Collaborator Author

qihqi commented Sep 11, 2024

> Is this PR ready to review?

@FanhaiLu1 Now it's ready. The goal is for it to automatically generate the offline benchmark numbers.


Offline benchmark numbers

Model: meta-llama/Meta-Llama-3-8B-Instruct

Batch size: 128

Quantize: False

time (ms)
Prefill 16 6.451009202282876
Prefill 32 6.156347203068435
Prefill 64 6.180983397644013
Prefill 128 6.245681399013847
Prefill 256 6.1956713907420635
Prefill 512 6.254743400495499
Prefill 1024 6.260169204324484
Decode 14.345658375532366

@@ -92,7 +92,8 @@ def main(argv):

     decode_state = engine.init_decode_state()
     profiler_started = False
-    for batch, _ in MAXTEXT_PREFILL.items():
+    for exp in range(4, 11):
+        batch = 2**exp
A collaborator commented on the diff:

The run_prefill_time function's argument is seqlen; should we rename batch to seqlen?
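The suggestion is a readability rename inside the loop shown in the diff: the loop variable holds a sequence length, not a batch size. A sketch of how the loop reads with the rename applied (the run_prefill_time body is a hypothetical stand-in):

```python
def run_prefill_time(seqlen):
    """Hypothetical stand-in; the real function times a prefill of length seqlen."""
    return float(seqlen)


# The diff's loop with `batch` renamed to `seqlen`: prefill is benchmarked
# at the power-of-two sequence lengths 16, 32, ..., 1024.
results = {}
for exp in range(4, 11):
    seqlen = 2**exp
    results[seqlen] = run_prefill_time(seqlen)

print(sorted(results))  # → [16, 32, 64, 128, 256, 512, 1024]
```

These lengths match the Prefill rows in the benchmark tables posted by the bot.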


Offline benchmark numbers

Model: meta-llama/Meta-Llama-3-8B-Instruct

Batch size: 128

Quantize: False

time (ms)
Prefill 16 6.463803607039154
Prefill 32 6.096175592392683
Prefill 64 6.448723399080336
Prefill 128 6.207663589157164
Prefill 256 6.185575597919524
Prefill 512 6.188617600128055
Prefill 1024 6.50523139629513
Decode 14.512341513182037

@qihqi qihqi merged commit 5b8823e into main Sep 13, 2024
5 checks passed
@qihqi qihqi deleted the hanq_ci branch September 13, 2024 00:25