
The need for a CPU-accelerated implementation of depthwise convolution in MobileNet #2826


Description

@NHZlX

Depthwise Convolution

Background

  1. MobileNet is now widely used for its small model size (~12M for the 1.0 MobileNet) and good performance on many tasks (classification, detection, etc.), and, as its name suggests, it is widely used in embedded systems.

  2. PaddlePaddle is working on supporting embedded systems; therefore, MobileNet on Paddle is indispensable.

  3. MobileNet mainly contains two operations: depthwise convolution and pointwise convolution. Pointwise convolution is 1*1 convolution with groups equal to 1; depthwise convolution is a convolution whose number of groups equals the number of input channels, so each input channel is convolved with its own filter (a minimal sketch follows this list). Optimizing MobileNet is therefore essentially optimizing depthwise convolution.

  4. Although one can build depthwise convolution with ExpandConvLayer in Paddle, it is very slow, especially in the training process. GPU acceleration of MobileNet on Paddle has already been implemented, which speeds up MobileNet training: Mobilenet gpu implementation #2776
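
To make the distinction in item 3 concrete, here is a minimal, illustrative C++ sketch of depthwise convolution: with groups equal to the input channel count, each channel is convolved with its own 2-D filter, and there is no summation across channels. All names and shapes are assumptions for illustration, not Paddle's actual kernel API.

```cpp
#include <vector>

// Sketch of depthwise convolution (groups == input channels):
// each input channel is convolved with its own K x K filter, so the
// output keeps the same channel count and no cross-channel sums occur.
// Layouts and names here are illustrative, not Paddle's real API.
void DepthwiseConv(const std::vector<float>& input,   // C x H x W
                   const std::vector<float>& filter,  // C x K x K
                   std::vector<float>& output,        // C x OH x OW
                   int C, int H, int W, int K, int stride, int pad) {
  const int OH = (H + 2 * pad - K) / stride + 1;
  const int OW = (W + 2 * pad - K) / stride + 1;
  output.assign(static_cast<size_t>(C) * OH * OW, 0.f);
  for (int c = 0; c < C; ++c) {  // one filter per channel
    for (int oh = 0; oh < OH; ++oh) {
      for (int ow = 0; ow < OW; ++ow) {
        float sum = 0.f;
        for (int kh = 0; kh < K; ++kh) {
          for (int kw = 0; kw < K; ++kw) {
            const int ih = oh * stride - pad + kh;
            const int iw = ow * stride - pad + kw;
            if (ih >= 0 && ih < H && iw >= 0 && iw < W) {
              sum += input[(c * H + ih) * W + iw] *
                     filter[(c * K + kh) * K + kw];
            }
          }
        }
        output[(c * OH + oh) * OW + ow] = sum;
      }
    }
  }
}
```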

Need to do

  1. The im2col operation is not necessary in 1 * 1 convolution; it reduces to a plain matrix multiplication (see the first sketch after this list).
  2. Implement CPU acceleration of depthwise convolution. For ARM, NEON acceleration is also needed.
  3. Fuse batch normalization in Paddle (see the second sketch after this list).
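
On item 1: for a 1*1 convolution with stride 1 and groups 1, the im2col buffer would simply be a copy of the input, since each output pixel reads exactly one input pixel per channel. The output can be computed directly as a single matrix multiply, Output(OC, H*W) = Filter(OC, C) * Input(C, H*W). A hedged sketch with illustrative names, not Paddle's API:

```cpp
#include <vector>

// Sketch: a 1x1 convolution (stride 1, groups 1) is a plain matrix
// multiply Output(OC, H*W) = Filter(OC, C) * Input(C, H*W), so the
// im2col copy can be skipped entirely.
void PointwiseConv(const std::vector<float>& input,   // C x (H*W)
                   const std::vector<float>& filter,  // OC x C
                   std::vector<float>& output,        // OC x (H*W)
                   int C, int OC, int HW) {
  output.assign(static_cast<size_t>(OC) * HW, 0.f);
  for (int oc = 0; oc < OC; ++oc) {
    for (int c = 0; c < C; ++c) {
      const float w = filter[oc * C + c];
      for (int i = 0; i < HW; ++i) {
        output[oc * HW + i] += w * input[c * HW + i];
      }
    }
  }
}
```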
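
On item 3: at inference time the batch-normalization transform y = gamma * (x - mean) / sqrt(var + eps) + beta can be folded into the preceding convolution's weights and bias, so conv + BN collapses into a single convolution. A minimal sketch, assuming per-output-channel BN statistics (all names are illustrative):

```cpp
#include <cmath>
#include <vector>

// Sketch of inference-time BN fusion. For each output channel:
//   scale = gamma / sqrt(var + eps)
//   W'    = W * scale
//   b'    = (b - mean) * scale + beta
void FuseBatchNorm(std::vector<float>& weight,  // OC x weight_per_oc
                   std::vector<float>& bias,    // OC
                   const std::vector<float>& gamma,
                   const std::vector<float>& beta,
                   const std::vector<float>& mean,
                   const std::vector<float>& var,
                   int OC, int weight_per_oc, float eps = 1e-5f) {
  for (int oc = 0; oc < OC; ++oc) {
    const float scale = gamma[oc] / std::sqrt(var[oc] + eps);
    for (int i = 0; i < weight_per_oc; ++i) {
      weight[oc * weight_per_oc + i] *= scale;
    }
    bias[oc] = (bias[oc] - mean[oc]) * scale + beta[oc];
  }
}
```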
