optimize optimizer learning rate #8873

@jacquesqiao

Description

Background

Profile script: dzhwinter/benchmark#84

From issue #8818 we can see that there are many elementwise_mul ops in the parameter optimization stage, and they take a lot of time.

(profiler timeline screenshot showing the elementwise_mul ops)

These elementwise_mul ops compute the learning rate for each parameter, because every parameter may have its own learning rate. The computation is:

param_lr = global_lr * lr_for_param

Here global_lr is a global Variable and lr_for_param is a float value attached to each parameter, with a default value of 1.0. The multiplication above appends an elementwise_mul op to the main program for every parameter.
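As a toy illustration of why this is costly, here is a minimal model of how the optimizer builds per-parameter learning rates. The Program class and the op tuples are simplified stand-ins for fluid internals (assumptions for illustration, not the real API):

class Program:
    def __init__(self):
        self.ops = []

def create_param_lr(program, param_name, lr_for_param=1.0):
    # Before the fix: unconditionally emit an elementwise_mul op
    # for this parameter, even when lr_for_param == 1.0.
    out = param_name + '_lr'
    program.ops.append(('elementwise_mul', 'global_lr', lr_for_param, out))
    return out

program = Program()
for name in ['fc_w', 'fc_b', 'conv_w', 'conv_b']:
    create_param_lr(program, name)
print(len(program.ops))  # 4 ops, every one computing global_lr * 1.0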

The improvement

Most of the time the value of lr_for_param is 1.0, in which case there is no need to add these elementwise_mul ops at all.

The logic after optimization should be:

if lr_for_param == 1.0:
    param_lr = global_lr
else:
    param_lr = global_lr * lr_for_param
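The same toy model with this check applied (again a sketch of the idea, not the actual fluid code):

def create_param_lr(program, param_name, lr_for_param=1.0):
    if lr_for_param == 1.0:
        # Reuse the global learning rate Variable directly; no op is added.
        return 'global_lr'
    out = param_name + '_lr'
    program.ops.append(('elementwise_mul', 'global_lr', lr_for_param, out))
    return out

program = Program()  # Program class from the sketch above
for name in ['fc_w', 'fc_b', 'conv_w', 'conv_b']:
    create_param_lr(program, name)
print(len(program.ops))  # 0 ops: nothing is emitted when lr_for_param == 1.0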

A complete solution would be constant folding: we should add a constant-folding transpiler that recognizes all constant values and evaluates them at compile time, which would eliminate many ops when executing the program.
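A toy sketch of what such a constant-folding pass could look like over a list of ops (the op representation is the same simplified stand-in as above; a real transpiler would operate on the ProgramDesc):

def constant_fold(ops, constants):
    # constants: dict mapping variable name -> value known at compile time;
    # it is updated in place as folded results become new constants.
    remaining = []
    for op_type, x, y, out in ops:
        if op_type == 'elementwise_mul' and x in constants and y in constants:
            # Both inputs are compile-time constants: evaluate now and
            # drop the op from the program.
            constants[out] = constants[x] * constants[y]
        else:
            remaining.append((op_type, x, y, out))
    return remaining

# If the global learning rate is itself a fixed constant (no decay
# schedule), the per-parameter multiply folds away entirely:
ops = [('elementwise_mul', 'global_lr', 'w_lr_factor', 'w_lr')]
constants = {'global_lr': 0.001, 'w_lr_factor': 1.0}
print(constant_fold(ops, constants), constants['w_lr'])  # [] 0.001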

Optimization result

Timeline after the optimization:

(profiler timeline screenshot)

| calc_step_num | ave_step_time (before) | ave_step_time (after) | after/before |
|---------------|------------------------|-----------------------|--------------|
| 3             | 1.12088267008          | 1.03341897329         | 0.9219689097488165 |
| 38            | 1.05036788238          | 0.987895676964        | 0.9405234999432334 |
| 78            | 1.06520705345          | 0.953312274737        | 0.894954902569792  |
