Add offline perf ci #181

Merged
qihqi merged 4 commits into main from hanq_ci on Sep 13, 2024

Conversation

qihqi
Collaborator

@qihqi qihqi commented Sep 6, 2024

  • Add a benchmark_offline mode for the CLI
  • Add the ability to use random weights

@qihqi qihqi force-pushed the hanq_ci branch 10 times, most recently from 940e6ee to dfd99a9 Compare September 7, 2024 22:36

github-actions bot commented Sep 9, 2024

Number of devices: 8
bfloat16 Matmul replicated: 369.596 ms sizes: ('2048.0 MiB', '2048.0 MiB')
bfloat16 Matmul sharded colrow: 108.966 ms sizes: ('2048.0 MiB', '2048.0 MiB')
bfloat16 matmul sharded rowcol: 76.5914 ms sizes: ('2048.0 MiB', '2048.0 MiB')
bfloat16 all_gather: 68.3534 ms sizes: ('2048.0 MiB',)
bfloat16 all_reduce: 8.24284 ms sizes: ('2048.0 MiB',)
bfloat16 Llama 3xffn shardmap: 1.80134 ms sizes: ('8.0 MiB', '86.0 MiB', '86.0 MiB', '86.0 MiB')
bfloat16 Llama 3xffn gspmd: 1.74834 ms sizes: ('8.0 MiB', '86.0 MiB', '86.0 MiB', '86.0 MiB')
int8 Matmul replicated: 186.856 ms sizes: ('1024.0 MiB', '1024.0 MiB')
int8 Matmul sharded colrow: 55.1954 ms sizes: ('1024.0 MiB', '1024.0 MiB')
int8 matmul sharded rowcol: 38.6339 ms sizes: ('1024.0 MiB', '1024.0 MiB')
int8 all_gather: 34.4436 ms sizes: ('1024.0 MiB',)
int8 all_reduce: 4.38876 ms sizes: ('1024.0 MiB',)
int8 Llama 3xffn shardmap: 1.76262 ms sizes: ('4.0 MiB', '43.0 MiB', '43.0 MiB', '43.0 MiB')
int8 Llama 3xffn gspmd: 1.72122 ms sizes: ('4.0 MiB', '43.0 MiB', '43.0 MiB', '43.0 MiB')
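The bot report above comes from a set of matmul and collective microbenchmarks run on 8 devices. A minimal sketch of the timing harness such a benchmark might use, with NumPy matmul as a stand-in for the real sharded JAX ops (all function names here are hypothetical, not the PR's actual code):

```python
import time

import numpy as np


def time_op(fn, *args, warmup=2, iters=5):
    """Time fn(*args), returning the mean wall-clock latency in ms."""
    for _ in range(warmup):  # discard warmup runs (compilation, caches)
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters * 1000.0


def size_mib(*arrays):
    """Format operand sizes the way the CI bot prints them."""
    return tuple(f"{a.nbytes / 2**20:.1f} MiB" for a in arrays)


lhs = np.ones((1024, 1024), dtype=np.float32)
rhs = np.ones((1024, 1024), dtype=np.float32)
ms = time_op(np.matmul, lhs, rhs)
print(f"float32 Matmul replicated: {ms:.4g} ms sizes: {size_mib(lhs, rhs)}")
```

The real benchmark would additionally shard the operands across devices (the "colrow"/"rowcol" variants) and block on device completion before reading the clock.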

@FanhaiLu1
Collaborator

Is this PR ready to review?

@qihqi qihqi marked this pull request as draft September 9, 2024 23:17
@qihqi qihqi force-pushed the hanq_ci branch 2 times, most recently from 60919fd to 5b370db Compare September 11, 2024 00:11

Offline benchmark numbers

Model: meta-llama/Meta-Llama-3-8B-Instruct

Batch size: 128

Quantize: False

time (ms)
Prefill 16 7.418541400693357
Prefill 32 6.320667406544089
Prefill 64 6.286943610757589
Prefill 128 6.39411760494113
Prefill 256 6.337903602980077
Prefill 512 6.452551390975714
Prefill 1024 6.41555959591642
Decode 14.40514026035089
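The report format above is straightforward to reproduce: one prefill timing per sequence length, then a decode timing. A hedged sketch of what the reporting loop might look like (run_prefill_time here is a placeholder body, not the PR's actual implementation):

```python
import time


def run_prefill_time(seqlen):
    """Placeholder: the real function times a model prefill of length seqlen."""
    start = time.perf_counter()
    _ = sum(range(seqlen))  # stand-in for the actual prefill call
    return (time.perf_counter() - start) * 1000.0


def report(prefill_lengths, decode_ms):
    """Render the benchmark table in the format the CI bot posts."""
    lines = ["time (ms)"]
    for seqlen in prefill_lengths:
        lines.append(f"Prefill {seqlen} {run_prefill_time(seqlen)}")
    lines.append(f"Decode {decode_ms}")
    return "\n".join(lines)


print(report([2**e for e in range(4, 11)], decode_ms=14.4))
```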

@qihqi qihqi marked this pull request as ready for review September 11, 2024 00:45
@qihqi
Collaborator Author

qihqi commented Sep 11, 2024

> Is this PR ready to review?

@FanhaiLu1 Now it's ready. The goal is for it to automatically generate the offline benchmark numbers.


Offline benchmark numbers

Model: meta-llama/Meta-Llama-3-8B-Instruct

Batch size: 128

Quantize: False

time (ms)
Prefill 16 6.451009202282876
Prefill 32 6.156347203068435
Prefill 64 6.180983397644013
Prefill 128 6.245681399013847
Prefill 256 6.1956713907420635
Prefill 512 6.254743400495499
Prefill 1024 6.260169204324484
Decode 14.345658375532366

@@ -92,7 +92,8 @@ def main(argv):

     decode_state = engine.init_decode_state()
     profiler_started = False
-    for batch, _ in MAXTEXT_PREFILL.items():
+    for exp in range(4, 11):
+        batch = 2**exp
A collaborator commented on the diff:

The run_prefill_time function's argument is seqlen; should we rename batch to seqlen?
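The suggestion is a readability rename inside the loop shown in the diff: the loop variable holds a sequence length, not a batch size. A sketch of how the loop reads with the rename applied (the run_prefill_time body is a hypothetical stand-in):

```python
def run_prefill_time(seqlen):
    """Hypothetical stand-in; the real function times a prefill of length seqlen."""
    return float(seqlen)


# The diff's loop with `batch` renamed to `seqlen`: prefill is benchmarked
# at the power-of-two sequence lengths 16, 32, ..., 1024.
results = {}
for exp in range(4, 11):
    seqlen = 2**exp
    results[seqlen] = run_prefill_time(seqlen)

print(sorted(results))  # → [16, 32, 64, 128, 256, 512, 1024]
```

These lengths match the Prefill rows in the benchmark tables posted by the bot.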


Offline benchmark numbers

Model: meta-llama/Meta-Llama-3-8B-Instruct

Batch size: 128

Quantize: False

time (ms)
Prefill 16 6.463803607039154
Prefill 32 6.096175592392683
Prefill 64 6.448723399080336
Prefill 128 6.207663589157164
Prefill 256 6.185575597919524
Prefill 512 6.188617600128055
Prefill 1024 6.50523139629513
Decode 14.512341513182037

@qihqi qihqi merged commit 5b8823e into main Sep 13, 2024
5 checks passed
@qihqi qihqi deleted the hanq_ci branch September 13, 2024 00:25