
Add benchmarks, start torchscript #5

Merged: 8 commits, Dec 14, 2023
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_1_num_workers_0.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 12.05 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.0010531009902479127
Batch inference time: 5.044861934002256
Batch total time: 5.045921023993287
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.000634292999166064
Batch inference time: 4.504916821009829
Batch total time: 4.505557062002481
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.0007395810098387301
Batch inference time: 4.533521624005516
Batch total time: 4.534268278002855
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.0006648830021731555
Batch inference time: 4.495368515010341
Batch total time: 4.496039069999824
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.0006519919988932088
Batch inference time: 4.496985145000508
Batch total time: 4.497643199007143


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.0011, 0.0006, 0.0007, 0.0007, 0.0007])
> average: tensor(0.0007)

Inference time for each epoch
> per epoch tensor([5.0430, 4.5039, 4.5352, 4.4961, 4.4961])
> average tensor(4.6133)

Total time for each epoch
> per epoch tensor([5.0469, 4.5039, 4.5352, 4.4961, 4.4961])
> average tensor(4.6172)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 27.07% 6.249s 100.00% 23.083s 23.083s 0.000us 0.00% 15.070s 15.070s 0 b -2.66 Kb 8.13 Mb -4.18 Gb 1
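The per-batch load/inference timings in the logs above could be gathered with a loop along these lines. This is a minimal sketch, not the PR's actual code; the function and variable names are illustrative:

```python
import time
import torch

def run_benchmark(loader, model, device="cpu", num_batches=5):
    """Time data loading and inference separately for each batch."""
    load_times, infer_times = [], []
    it = iter(loader)
    for i in range(num_batches):
        t0 = time.perf_counter()
        batch = next(it)                      # data-loading cost
        t1 = time.perf_counter()
        with torch.no_grad():
            _ = model(batch.to(device))       # inference cost
        if device == "cuda":
            torch.cuda.synchronize()          # wait for queued kernels before stopping the clock
        t2 = time.perf_counter()
        load_times.append(t1 - t0)
        infer_times.append(t2 - t1)
    load = torch.tensor(load_times)
    infer = torch.tensor(infer_times)
    print("Data-loading times")
    print(f"> per epoch: {load}")
    print(f"> average: {load.mean()}")
    return load, infer

# Toy usage: identity "model" over five random batches
data = [torch.randn(1, 8) for _ in range(5)]
model = torch.nn.Identity()
load, infer = run_benchmark(data, model)
```

Note the `torch.cuda.synchronize()` call: without it, CUDA kernels run asynchronously and the wall-clock measurement would stop before the GPU work finishes.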
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_1_num_workers_1.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 65.44 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.17542113500530832
Batch inference time: 5.1042240419919835
Batch total time: 5.27965795599448
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.11129688000073656
Batch inference time: 4.523343688008026
Batch total time: 4.634657599002821
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.0730158430087613
Batch inference time: 4.516634412007988
Batch total time: 4.589664141007233
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.07771697499265429
Batch inference time: 4.432252533995779
Batch total time: 4.509983689000364
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.0820326890097931
Batch inference time: 4.4701670890062815
Batch total time: 4.552215193005395


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.1754, 0.1113, 0.0730, 0.0777, 0.0820])
> average: tensor(0.1039)

Inference time for each epoch
> per epoch tensor([5.1055, 4.5234, 4.5156, 4.4336, 4.4688])
> average tensor(4.6094)

Total time for each epoch
> per epoch tensor([5.2812, 4.6328, 4.5898, 4.5117, 4.5508])
> average tensor(4.7148)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 27.94% 6.585s 100.00% 23.570s 23.570s 0.000us 0.00% 15.065s 15.065s 0 b -2.66 Kb 8.13 Mb -4.16 Gb 1
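The output filenames encode a sweep over `batch_size` and `num_workers`. A sketch of how the six loader configurations might be generated (hypothetical setup, assuming a standard `torch.utils.data.DataLoader`):

```python
import itertools
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(32, 8))  # stand-in dataset

for batch_size, num_workers in itertools.product([1, 2, 4], [0, 1]):
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    out_name = f"batch_size_{batch_size}_num_workers_{num_workers}.txt"
    # each configuration would write one benchmark_outputs/<out_name> log
```

With `num_workers=0` batches are loaded in the main process; `num_workers=1` spawns a worker process, which explains the much higher first-batch load times (worker startup) seen in the `num_workers_1` logs.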
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_2_num_workers_0.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 67.71 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.0012337989901425317
Batch inference time: 4.979560280000442
Batch total time: 4.980800386008923
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.0006314070051303133
Batch inference time: 4.436402372986777
Batch total time: 4.4370395870064385
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.0006100949976826087
Batch inference time: 4.489002016998711
Batch total time: 4.489618256004178
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.000657053999020718
Batch inference time: 4.481591136995121
Batch total time: 4.482254397997167
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.0006241549999685958
Batch inference time: 4.433049211991602
Batch total time: 4.43367860399303


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.0012, 0.0006, 0.0006, 0.0007, 0.0006])
> average: tensor(0.0008)

Inference time for each epoch
> per epoch tensor([4.9805, 4.4375, 4.4883, 4.4805, 4.4336])
> average tensor(4.5625)

Total time for each epoch
> per epoch tensor([4.9805, 4.4375, 4.4883, 4.4805, 4.4336])
> average tensor(4.5625)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 26.66% 6.085s 100.00% 22.827s 22.827s 0.000us 0.00% 15.085s 15.085s 0 b -2.66 Kb 8.13 Mb -4.18 Gb 1
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_2_num_workers_1.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 111.55 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.0553153660002863
Batch inference time: 4.988452916993992
Batch total time: 5.043779415995232
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.06645661000220571
Batch inference time: 4.431401344001642
Batch total time: 4.49787032698805
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.06894606399873737
Batch inference time: 4.5093786460056435
Batch total time: 4.5783342299982905
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.10248679800133687
Batch inference time: 4.488342932003434
Batch total time: 4.590840438992018
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.07949267400545068
Batch inference time: 4.5397761540079955
Batch total time: 4.619280054001138


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.0553, 0.0665, 0.0690, 0.1025, 0.0795])
> average: tensor(0.0745)

Inference time for each epoch
> per epoch tensor([4.9883, 4.4297, 4.5078, 4.4883, 4.5391])
> average tensor(4.5898)

Total time for each epoch
> per epoch tensor([5.0430, 4.4961, 4.5781, 4.5898, 4.6211])
> average tensor(4.6641)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 27.47% 6.410s 100.00% 23.334s 23.334s 0.000us 0.00% 15.079s 15.079s 0 b -2.66 Kb 8.13 Mb -4.18 Gb 1
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_4_num_workers_0.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 182.83 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.0010882980132009834
Batch inference time: 5.046935585996835
Batch total time: 5.048030566002126
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.0006488210055977106
Batch inference time: 4.47928033999051
Batch total time: 4.479962162004085
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.0006499989976873621
Batch inference time: 4.4514494110044325
Batch total time: 4.452105590986321
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.0006679049984086305
Batch inference time: 4.445740676994319
Batch total time: 4.446414493009797
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.0006229520076885819
Batch inference time: 4.457714663003571
Batch total time: 4.458343616002821


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.0011, 0.0006, 0.0006, 0.0007, 0.0006])
> average: tensor(0.0007)

Inference time for each epoch
> per epoch tensor([5.0469, 4.4805, 4.4531, 4.4453, 4.4570])
> average tensor(4.5781)

Total time for each epoch
> per epoch tensor([5.0469, 4.4805, 4.4531, 4.4453, 4.4570])
> average tensor(4.5781)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 26.98% 6.176s 100.00% 22.888s 22.888s 0.000us 0.00% 15.065s 15.065s 0 b -2.66 Kb 8.13 Mb -4.19 Gb 1
58 changes: 58 additions & 0 deletions benchmark_outputs/batch_size_4_num_workers_1.txt
@@ -0,0 +1,58 @@
Starting up...
Building data loaders...
Initializing Model...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 100.42 seconds
Running inference benchmark...

Working on device: cuda
Starting BATCH 1 of 5
Finished Batch 1 of 5
Batch load time: 0.1019776799948886
Batch inference time: 5.030931850997149
Batch total time: 5.132919580995804
Starting BATCH 2 of 5
Finished Batch 2 of 5
Batch load time: 0.0671316090010805
Batch inference time: 4.438084541005082
Batch total time: 4.505228757989244
Starting BATCH 3 of 5
Finished Batch 3 of 5
Batch load time: 0.06837264500791207
Batch inference time: 4.474854444008088
Batch total time: 4.543237587000476
Starting BATCH 4 of 5
Finished Batch 4 of 5
Batch load time: 0.07436333999794442
Batch inference time: 4.4623387989995535
Batch total time: 4.53671289801423
Starting BATCH 5 of 5
Finished Batch 5 of 5
Batch load time: 0.07757725499686785
Batch inference time: 4.4232901810028125
Batch total time: 4.500878610997461


Manual Profile Results...
Data-loading times
> per epoch: tensor([0.1020, 0.0671, 0.0684, 0.0743, 0.0776])
> average: tensor(0.0779)

Inference time for each epoch
> per epoch tensor([5.0312, 4.4375, 4.4766, 4.4609, 4.4219])
> average tensor(4.5664)

Total time for each epoch
> per epoch tensor([5.1328, 4.5039, 4.5430, 4.5352, 4.5000])
> average tensor(4.6445)



Profiling sorted by CUDA time total
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
run_benchmark 27.55% 6.399s 100.00% 23.222s 23.222s 0.000us 0.00% 15.063s 15.063s 0 b -3.18 Kb 8.13 Mb -4.17 Gb 1
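The "Profiling sorted by CUDA time total" tables in each log match the format produced by `torch.profiler`. A minimal sketch of how such a table is generated (assumed reconstruction, not the PR's actual code; falls back to CPU time on machines without CUDA):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(64, 64)  # stand-in model
x = torch.randn(8, 64)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, profile_memory=True) as prof:
    with torch.no_grad():
        model(x)

# profile_memory=True populates the CPU Mem / CUDA Mem columns seen above
sort_key = "cuda_time_total" if torch.cuda.is_available() else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
```

`key_averages().table(sort_by="cuda_time_total")` yields the Self CPU / CPU total / Self CUDA / CUDA total columns shown in the logs, with one row per profiled op.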