
The need for a CPU-accelerated implementation of depthwise convolution in MobileNet #2826


Description

@NHZlX

Depthwise Convolution

Background

  1. MobileNet is now widely used for its small model size (~12M for the 1.0 MobileNet) and good performance on many tasks (classification, detection, etc.), and, as its name suggests, it is widely used in embedded systems.

  2. PaddlePaddle is working on supporting embedded systems; therefore, MobileNet on Paddle is indispensable.

  3. MobileNet mainly contains two operations: depthwise convolution and pointwise convolution. Pointwise convolution is 1*1 convolution with groups equal to 1; depthwise convolution is a convolution whose number of groups equals the number of input channels, so each input channel is convolved with its own filter (a minimal sketch follows this list). Optimizing MobileNet is therefore essentially optimizing depthwise convolution.

  4. Although one can build depthwise convolution with ExpandConvLayer in Paddle, it is very slow, especially in the training process. GPU acceleration of MobileNet on Paddle has already been implemented, which speeds up MobileNet training: Mobilenet gpu implementation #2776
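
To make the distinction in item 3 concrete, here is a minimal, illustrative C++ sketch of depthwise convolution: with groups equal to the input channel count, each channel is convolved with its own 2-D filter, and there is no summation across channels. All names and shapes are assumptions for illustration, not Paddle's actual kernel API.

```cpp
#include <vector>

// Sketch of depthwise convolution (groups == input channels):
// each input channel is convolved with its own K x K filter, so the
// output keeps the same channel count and no cross-channel sums occur.
// Layouts and names here are illustrative, not Paddle's real API.
void DepthwiseConv(const std::vector<float>& input,   // C x H x W
                   const std::vector<float>& filter,  // C x K x K
                   std::vector<float>& output,        // C x OH x OW
                   int C, int H, int W, int K, int stride, int pad) {
  const int OH = (H + 2 * pad - K) / stride + 1;
  const int OW = (W + 2 * pad - K) / stride + 1;
  output.assign(static_cast<size_t>(C) * OH * OW, 0.f);
  for (int c = 0; c < C; ++c) {  // one filter per channel
    for (int oh = 0; oh < OH; ++oh) {
      for (int ow = 0; ow < OW; ++ow) {
        float sum = 0.f;
        for (int kh = 0; kh < K; ++kh) {
          for (int kw = 0; kw < K; ++kw) {
            const int ih = oh * stride - pad + kh;
            const int iw = ow * stride - pad + kw;
            if (ih >= 0 && ih < H && iw >= 0 && iw < W) {
              sum += input[(c * H + ih) * W + iw] *
                     filter[(c * K + kh) * K + kw];
            }
          }
        }
        output[(c * OH + oh) * OW + ow] = sum;
      }
    }
  }
}
```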

Need to do

  1. The im2col operation is not necessary in 1 * 1 convolution; it reduces to a plain matrix multiplication (see the first sketch after this list).
  2. Implement CPU acceleration of depthwise convolution. For ARM, NEON acceleration is also needed.
  3. Fuse batch normalization in Paddle (see the second sketch after this list).
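
On item 1: for a 1*1 convolution with stride 1 and groups 1, the im2col buffer would simply be a copy of the input, since each output pixel reads exactly one input pixel per channel. The output can be computed directly as a single matrix multiply, Output(OC, H*W) = Filter(OC, C) * Input(C, H*W). A hedged sketch with illustrative names, not Paddle's API:

```cpp
#include <vector>

// Sketch: a 1x1 convolution (stride 1, groups 1) is a plain matrix
// multiply Output(OC, H*W) = Filter(OC, C) * Input(C, H*W), so the
// im2col copy can be skipped entirely.
void PointwiseConv(const std::vector<float>& input,   // C x (H*W)
                   const std::vector<float>& filter,  // OC x C
                   std::vector<float>& output,        // OC x (H*W)
                   int C, int OC, int HW) {
  output.assign(static_cast<size_t>(OC) * HW, 0.f);
  for (int oc = 0; oc < OC; ++oc) {
    for (int c = 0; c < C; ++c) {
      const float w = filter[oc * C + c];
      for (int i = 0; i < HW; ++i) {
        output[oc * HW + i] += w * input[c * HW + i];
      }
    }
  }
}
```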
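
On item 3: at inference time the batch-normalization transform y = gamma * (x - mean) / sqrt(var + eps) + beta can be folded into the preceding convolution's weights and bias, so conv + BN collapses into a single convolution. A minimal sketch, assuming per-output-channel BN statistics (all names are illustrative):

```cpp
#include <cmath>
#include <vector>

// Sketch of inference-time BN fusion. For each output channel:
//   scale = gamma / sqrt(var + eps)
//   W'    = W * scale
//   b'    = (b - mean) * scale + beta
void FuseBatchNorm(std::vector<float>& weight,  // OC x weight_per_oc
                   std::vector<float>& bias,    // OC
                   const std::vector<float>& gamma,
                   const std::vector<float>& beta,
                   const std::vector<float>& mean,
                   const std::vector<float>& var,
                   int OC, int weight_per_oc, float eps = 1e-5f) {
  for (int oc = 0; oc < OC; ++oc) {
    const float scale = gamma[oc] / std::sqrt(var[oc] + eps);
    for (int i = 0; i < weight_per_oc; ++i) {
      weight[oc * weight_per_oc + i] *= scale;
    }
    bias[oc] = (bias[oc] - mean[oc]) * scale + beta[oc];
  }
}
```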
