
Add benchmarks, start torchscript #5

Merged: 8 commits, Dec 14, 2023
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_1_num_workers_0.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 12.05 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.0010531009902479127
Batch inference time: 5.044861934002256
Batch total time: 5.045921023993287
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.000634292999166064
Batch inference time: 4.504916821009829
Batch total time: 4.505557062002481
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.0007395810098387301
Batch inference time: 4.533521624005516
Batch total time: 4.534268278002855
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.0006648830021731555
Batch inference time: 4.495368515010341
Batch total time: 4.496039069999824
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.0006519919988932088
Batch inference time: 4.496985145000508
Batch total time: 4.497643199007143


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.0011, 0.0006, 0.0007, 0.0007, 0.0007])
> average: tensor(0.0007)

Inference time for each epoch
> per epoch tensor([5.0430, 4.5039, 4.5352, 4.4961, 4.4961])
> average tensor(4.6133)

Total time for each epoch
> per epoch tensor([5.0469, 4.5039, 4.5352, 4.4961, 4.4961])
> average tensor(4.6172)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 27.07% 6.249s 100.00% 23.083s 23.083s 0.000us 0.00% 15.070s 15.070s 0 b -2.66 Kb 8.13 Mb -4.18 Gb 1
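The per-batch load/inference timings in the logs above could be gathered with a loop along these lines. This is a minimal sketch, not the PR's actual code; the function and variable names are illustrative:

```python
import time
import torch

def run_benchmark(loader, model, device="cpu", num_batches=5):
    """Time data loading and inference separately for each batch."""
    load_times, infer_times = [], []
    it = iter(loader)
    for i in range(num_batches):
        t0 = time.perf_counter()
        batch = next(it)                      # data-loading cost
        t1 = time.perf_counter()
        with torch.no_grad():
            _ = model(batch.to(device))       # inference cost
        if device == "cuda":
            torch.cuda.synchronize()          # wait for queued kernels before stopping the clock
        t2 = time.perf_counter()
        load_times.append(t1 - t0)
        infer_times.append(t2 - t1)
    load = torch.tensor(load_times)
    infer = torch.tensor(infer_times)
    print("Data-loading times")
    print(f"> per epoch: {load}")
    print(f"> average: {load.mean()}")
    return load, infer

# Toy usage: identity "model" over five random batches
data = [torch.randn(1, 8) for _ in range(5)]
model = torch.nn.Identity()
load, infer = run_benchmark(data, model)
```

Note the `torch.cuda.synchronize()` call: without it, CUDA kernels run asynchronously and the wall-clock measurement would stop before the GPU work finishes.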
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_1_num_workers_1.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 65.44 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.17542113500530832
Batch inference time: 5.1042240419919835
Batch total time: 5.27965795599448
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.11129688000073656
Batch inference time: 4.523343688008026
Batch total time: 4.634657599002821
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.0730158430087613
Batch inference time: 4.516634412007988
Batch total time: 4.589664141007233
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.07771697499265429
Batch inference time: 4.432252533995779
Batch total time: 4.509983689000364
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.0820326890097931
Batch inference time: 4.4701670890062815
Batch total time: 4.552215193005395


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.1754, 0.1113, 0.0730, 0.0777, 0.0820])
> average: tensor(0.1039)

Inference time for each epoch
> per epoch tensor([5.1055, 4.5234, 4.5156, 4.4336, 4.4688])
> average tensor(4.6094)

Total time for each epoch
> per epoch tensor([5.2812, 4.6328, 4.5898, 4.5117, 4.5508])
> average tensor(4.7148)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 27.94% 6.585s 100.00% 23.570s 23.570s 0.000us 0.00% 15.065s 15.065s 0 b -2.66 Kb 8.13 Mb -4.16 Gb 1
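The output filenames encode a sweep over `batch_size` and `num_workers`. A sketch of how the six loader configurations might be generated (hypothetical setup, assuming a standard `torch.utils.data.DataLoader`):

```python
import itertools
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(32, 8))  # stand-in dataset

for batch_size, num_workers in itertools.product([1, 2, 4], [0, 1]):
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    out_name = f"batch_size_{batch_size}_num_workers_{num_workers}.txt"
    # each configuration would write one benchmark_outputs/<out_name> log
```

With `num_workers=0` batches are loaded in the main process; `num_workers=1` spawns a worker process, which explains the much higher first-batch load times (worker startup) seen in the `num_workers_1` logs.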
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_2_num_workers_0.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 67.71 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.0012337989901425317
Batch inference time: 4.979560280000442
Batch total time: 4.980800386008923
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.0006314070051303133
Batch inference time: 4.436402372986777
Batch total time: 4.4370395870064385
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.0006100949976826087
Batch inference time: 4.489002016998711
Batch total time: 4.489618256004178
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.000657053999020718
Batch inference time: 4.481591136995121
Batch total time: 4.482254397997167
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.0006241549999685958
Batch inference time: 4.433049211991602
Batch total time: 4.43367860399303


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.0012, 0.0006, 0.0006, 0.0007, 0.0006])
> average: tensor(0.0008)

Inference time for each epoch
> per epoch tensor([4.9805, 4.4375, 4.4883, 4.4805, 4.4336])
> average tensor(4.5625)

Total time for each epoch
> per epoch tensor([4.9805, 4.4375, 4.4883, 4.4805, 4.4336])
> average tensor(4.5625)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 26.66% 6.085s 100.00% 22.827s 22.827s 0.000us 0.00% 15.085s 15.085s 0 b -2.66 Kb 8.13 Mb -4.18 Gb 1
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_2_num_workers_1.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 111.55 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.0553153660002863
Batch inference time: 4.988452916993992
Batch total time: 5.043779415995232
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.06645661000220571
Batch inference time: 4.431401344001642
Batch total time: 4.49787032698805
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.06894606399873737
Batch inference time: 4.5093786460056435
Batch total time: 4.5783342299982905
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.10248679800133687
Batch inference time: 4.488342932003434
Batch total time: 4.590840438992018
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.07949267400545068
Batch inference time: 4.5397761540079955
Batch total time: 4.619280054001138


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.0553, 0.0665, 0.0690, 0.1025, 0.0795])
> average: tensor(0.0745)

Inference time for each epoch
> per epoch tensor([4.9883, 4.4297, 4.5078, 4.4883, 4.5391])
> average tensor(4.5898)

Total time for each epoch
> per epoch tensor([5.0430, 4.4961, 4.5781, 4.5898, 4.6211])
> average tensor(4.6641)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 27.47% 6.410s 100.00% 23.334s 23.334s 0.000us 0.00% 15.079s 15.079s 0 b -2.66 Kb 8.13 Mb -4.18 Gb 1
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_4_num_workers_0.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 182.83 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.0010882980132009834
Batch inference time: 5.046935585996835
Batch total time: 5.048030566002126
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.0006488210055977106
Batch inference time: 4.47928033999051
Batch total time: 4.479962162004085
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.0006499989976873621
Batch inference time: 4.4514494110044325
Batch total time: 4.452105590986321
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.0006679049984086305
Batch inference time: 4.445740676994319
Batch total time: 4.446414493009797
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.0006229520076885819
Batch inference time: 4.457714663003571
Batch total time: 4.458343616002821


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.0011, 0.0006, 0.0006, 0.0007, 0.0006])
> average: tensor(0.0007)

Inference time for each epoch
> per epoch tensor([5.0469, 4.4805, 4.4531, 4.4453, 4.4570])
> average tensor(4.5781)

Total time for each epoch
> per epoch tensor([5.0469, 4.4805, 4.4531, 4.4453, 4.4570])
> average tensor(4.5781)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 26.98% 6.176s 100.00% 22.888s 22.888s 0.000us 0.00% 15.065s 15.065s 0 b -2.66 Kb 8.13 Mb -4.19 Gb 1
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_4_num_workers_1.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 100.42 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.1019776799948886
Batch inference time: 5.030931850997149
Batch total time: 5.132919580995804
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.0671316090010805
Batch inference time: 4.438084541005082
Batch total time: 4.505228757989244
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.06837264500791207
Batch inference time: 4.474854444008088
Batch total time: 4.543237587000476
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.07436333999794442
Batch inference time: 4.4623387989995535
Batch total time: 4.53671289801423
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.07757725499686785
Batch inference time: 4.4232901810028125
Batch total time: 4.500878610997461


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.1020, 0.0671, 0.0684, 0.0743, 0.0776])
> average: tensor(0.0779)

Inference time for each epoch
> per epoch tensor([5.0312, 4.4375, 4.4766, 4.4609, 4.4219])
> average tensor(4.5664)

Total time for each epoch
> per epoch tensor([5.1328, 4.5039, 4.5430, 4.5352, 4.5000])
> average tensor(4.6445)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 27.55% 6.399s 100.00% 23.222s 23.222s 0.000us 0.00% 15.063s 15.063s 0 b -3.18 Kb 8.13 Mb -4.17 Gb 1
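The "Profiling sorted by CUDA time total" tables in each log match the format produced by `torch.profiler`. A minimal sketch of how such a table is generated (assumed reconstruction, not the PR's actual code; falls back to CPU time on machines without CUDA):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(64, 64)  # stand-in model
x = torch.randn(8, 64)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, profile_memory=True) as prof:
    with torch.no_grad():
        model(x)

# profile_memory=True populates the CPU Mem / CUDA Mem columns seen above
sort_key = "cuda_time_total" if torch.cuda.is_available() else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
```

`key_averages().table(sort_by="cuda_time_total")` yields the Self CPU / CPU total / Self CUDA / CUDA total columns shown in the logs, with one row per profiled op.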