The profiling results for ResNet. #6179

@qingqing01

Config and Env

The profiling results

Total examples: 2560, total time: 47.83653 sec
53.51559 examples/sec, 1.49489 sec/batch

unit: ms
Stat=ExecutorRunTimer       total=37099.9    avg=227.606    max=526.269    min=0.092      count=163

Stat=conv2d_grad             total=18104.7    avg=4.269      max=15.852     min=2.222      count=4240
Stat=conv2d                  total=9482.79    avg=2.236      max=10.438     min=0.601      count=4240
    ~~Stat=Im2ColTimer         total=461.405    avg=0.007      max=2.732      min=0.004      count=58880~~
    ~~Stat=GemmTimer           total=2343.45    avg=0.017      max=5.965      min=0.007      count=135680~~
Stat=batch_norm_grad         total=2008.38    avg=0.473      max=5.74       min=0.086      count=4240
Stat=batch_norm              total=1679.04    avg=0.396      max=4.169      min=0.09       count=4240
Stat=sum                     total=1347.13    avg=0.935      max=3.724      min=0.013      count=1440
Stat=relu_grad               total=988.555    avg=0.252      max=5.071      min=0.044      count=3920
Stat=elementwise_add_grad    total=706.461    avg=0.519      max=6.632      min=0.044      count=1360
Stat=relu                    total=694.561    avg=0.177      max=2.699      min=0.031      count=3920
Stat=elementwise_add         total=560.761    avg=0.412      max=3.953      min=0.02       count=1360
Stat=momentum                total=516.091    avg=0.04       max=9.432      min=0.02       count=12880
Stat=pool2d_grad             total=326.743    avg=2.042      max=6.607      min=0.241      count=160
Stat=CreateOpTimer           total=237.364    avg=0.005      max=7.503      min=0.001      count=44433
Stat=DeleteLocalScopeTimer   total=108.412    avg=0.665      max=2.648      min=0.001      count=163
Stat=CreateLocalScopeTimer   total=76.612     avg=0.47       max=1.804      min=0.004      count=163
Stat=pool2d                  total=73.987     avg=0.462      max=0.675      min=0.315      count=160
Stat=fill_constant           total=19.168     avg=0.03       max=12.984     min=0.006      count=619
Stat=cast                    total=12.543     avg=0.039      max=0.219      min=0.016      count=320
Stat=gaussian_random         total=11.399     avg=0.215      max=1.26       min=0.015      count=53
Stat=mul                     total=9.389      avg=0.117      max=0.212      min=0.092      count=80
Stat=accuracy                total=8.361      avg=0.104      max=0.477      min=0.068      count=80
Stat=mul_grad                total=7.426      avg=0.092      max=0.179      min=0.075      count=80
Stat=feed                    total=6.262      avg=0.039      max=0.185      min=0.015      count=160
Stat=softmax                 total=5.667      avg=0.07       max=0.677      min=0.042      count=80
Stat=fetch                   total=5.084      avg=0.021      max=0.254      min=0.01       count=240
Stat=top_k                   total=3.373      avg=0.042      max=0.126      min=0.026      count=80
Stat=softmax_grad            total=3.074      avg=0.038      max=0.1        min=0.029      count=80
Stat=cross_entropy_grad      total=2.604      avg=0.032      max=0.091      min=0.023      count=80
Stat=cross_entropy           total=2.429      avg=0.03       max=0.075      min=0.021      count=80
Stat=elementwise_div         total=2.412      avg=0.03       max=0.096      min=0.02       count=80
Stat=mean                    total=1.997      avg=0.024      max=0.077      min=0.018      count=80
Stat=mean_grad               total=1.838      avg=0.022      max=0.058      min=0.016      count=80
Stat=uniform_random          total=0.044      avg=0.044      max=0.044      min=0.044      count=1
--------------------------------------------------
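To rank operators by their share of executor time, the `Stat=` lines can be parsed mechanically. The following is a minimal sketch, not part of the profiler itself: the regular expression and the helper name `parse_stats` are assumptions made for illustration, and only the first few rows of the log above are reproduced as sample input.

```python
import re

# Hypothetical pattern for one "Stat=..." line of the log shown above.
STAT_RE = re.compile(
    r"Stat=(?P<name>\S+)\s+total=(?P<total>[\d.]+)\s+avg=(?P<avg>[\d.]+)"
    r"\s+max=(?P<max>[\d.]+)\s+min=(?P<min>[\d.]+)\s+count=(?P<count>\d+)"
)


def parse_stats(text):
    """Return {stat name: {"total", "avg", "max", "min", "count"}}, times in ms."""
    stats = {}
    for m in STAT_RE.finditer(text):
        stats[m["name"]] = {
            "total": float(m["total"]),
            "avg": float(m["avg"]),
            "max": float(m["max"]),
            "min": float(m["min"]),
            "count": int(m["count"]),
        }
    return stats


# First rows of the profiling output, copied verbatim.
SAMPLE = """
Stat=ExecutorRunTimer       total=37099.9    avg=227.606    max=526.269    min=0.092      count=163
Stat=conv2d_grad             total=18104.7    avg=4.269      max=15.852     min=2.222      count=4240
Stat=conv2d                  total=9482.79    avg=2.236      max=10.438     min=0.601      count=4240
Stat=batch_norm_grad         total=2008.38    avg=0.473      max=5.74       min=0.086      count=4240
"""

stats = parse_stats(SAMPLE)
executor_total = stats["ExecutorRunTimer"]["total"]

# Share of executor time per operator, largest first.
shares = {
    name: s["total"] / executor_total
    for name, s in stats.items()
    if name != "ExecutorRunTimer"
}
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<20s} {share:6.1%}")
```

On these rows, conv2d_grad and conv2d together make up roughly 74% of the ExecutorRunTimer total, which is why they head the list of operators to optimize.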

Operators that need optimization

  • conv2d/conv2d_grad
    • The total time of conv2d is 9482.79 ms, but its main computation, im2col and gemm, accounts for only 461.405 + 2343.45 = 2804.855 ms. (There is no stream synchronization between im2col and gemm, so the times measured for im2col and gemm are not accurate.)
  • relu/relu_grad
  • elementwise_add/elementwise_add_grad
  • momentum
  • sum
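The conv2d gap noted above can be quantified with a few lines of arithmetic. This is only a sketch using the numbers from the log; because the im2col and gemm timers are not synchronized with the GPU stream, the resulting figure is an indication rather than a precise measurement.

```python
# Values in ms, copied from the profiling output.
conv2d_total = 9482.79
im2col_total = 461.405
gemm_total = 2343.45

measured_compute = im2col_total + gemm_total   # time attributed to im2col + gemm
unaccounted = conv2d_total - measured_compute  # conv2d time outside both timers
compute_share = measured_compute / conv2d_total

print(f"im2col + gemm: {measured_compute:.3f} ms")
print(f"unaccounted:   {unaccounted:.3f} ms")
print(f"share of conv2d covered by the timers: {compute_share:.1%}")
```

By this estimate the two timers cover less than a third of the conv2d total.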

Time spent in Python (outside ExecutorRunTimer) accounts for about 22% of the total time.

(47.83653 - 37.0999) / 47.83653 = 22.44%
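The same estimate can be reproduced in a short script. It assumes, as the calculation above does, that everything not covered by ExecutorRunTimer is Python-side overhead; the variable names are illustrative.

```python
total_wall_s = 47.83653   # "total time" reported for 2560 examples, in seconds
executor_ms = 37099.9     # ExecutorRunTimer total, in milliseconds

# Time not covered by the executor timer, attributed here to Python.
python_s = total_wall_s - executor_ms / 1000.0
python_share = python_s / total_wall_s

print(f"Python-side time: {python_s:.5f} s ({python_share:.2%} of total)")
```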
