Commit fceab30

Add GFL model and PicoDet (PaddlePaddle#3620)
* add gfl model and PicoDet
1 parent 7c50785 commit fceab30

42 files changed: +2854 -6 lines changed

configs/gfl/README.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Generalized Focal Loss Model (GFL)

## Introduction

This directory provides models based on [Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection](https://arxiv.org/abs/2006.04388) and [Generalized Focal Loss V2](https://arxiv.org/pdf/2011.12885.pdf).



## Model Zoo

| Backbone | Model | images/GPU | lr schedule | FPS | Box AP | download | config |
| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
| ResNet50-FPN | GFL | 2 | 1x | ---- | 40.1 | [download](https://paddledet.bj.bcebos.com/models/gfl_r50_fpn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gfl/gfl_r50_fpn_1x_coco.yml) |
| ResNet50-FPN | GFLv2 | 2 | 1x | ---- | 40.4 | [download](https://paddledet.bj.bcebos.com/models/gflv2_r50_fpn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gfl/gflv2_r50_fpn_1x_coco.yml) |


**Notes:**

- GFL is trained on the COCO train2017 dataset and evaluated on val2017; results are reported as `mAP(IoU=0.5:0.95)`.

## Citations
```
@article{li2020generalized,
  title={Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection},
  author={Li, Xiang and Wang, Wenhai and Wu, Lijun and Chen, Shuo and Hu, Xiaolin and Li, Jun and Tang, Jinhui and Yang, Jian},
  journal={arXiv preprint arXiv:2006.04388},
  year={2020}
}

@article{li2020gflv2,
  title={Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection},
  author={Li, Xiang and Wang, Wenhai and Hu, Xiaolin and Li, Jun and Tang, Jinhui and Yang, Jian},
  journal={arXiv preprint arXiv:2011.12885},
  year={2020}
}

```

configs/gfl/_base_/gfl_r50_fpn.yml

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
architecture: GFL
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams

GFL:
  backbone: ResNet
  neck: FPN
  head: GFLHead

ResNet:
  depth: 50
  variant: b
  norm_type: bn
  freeze_at: 0
  return_idx: [1,2,3]
  num_stages: 4

FPN:
  out_channel: 256
  spatial_scales: [0.125, 0.0625, 0.03125]
  extra_stage: 2
  has_extra_convs: true
  use_c5: false

GFLHead:
  conv_feat:
    name: FCOSFeat
    feat_in: 256
    feat_out: 256
    num_convs: 4
    norm_type: "gn"
    use_dcn: false
  fpn_stride: [8, 16, 32, 64, 128]
  prior_prob: 0.01
  reg_max: 16
  loss_qfl:
    name: QualityFocalLoss
    use_sigmoid: True
    beta: 2.0
    loss_weight: 1.0
  loss_dfl:
    name: DistributionFocalLoss
    loss_weight: 0.25
  loss_bbox:
    name: GIoULoss
    loss_weight: 2.0
  nms:
    name: MultiClassNMS
    nms_top_k: 1000
    keep_top_k: 100
    score_threshold: 0.025
    nms_threshold: 0.6
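
The `loss_qfl`, `loss_dfl`, and `reg_max` settings above correspond to the formulas in the GFL paper. The following is a minimal per-element sketch in plain Python, assuming the paper's definitions. The function names are hypothetical, the `loss_weight` values are not applied, and this is not PaddleDetection's implementation.

```python
import math


def quality_focal_loss(sigma, y, beta=2.0):
    """QFL for one class score.

    sigma: predicted probability in (0, 1); y: soft IoU target in [0, 1].
    """
    cross_entropy = -((1.0 - y) * math.log(1.0 - sigma) + y * math.log(sigma))
    return abs(y - sigma) ** beta * cross_entropy


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distribution_focal_loss(logits, y):
    """DFL for one box side.

    logits: reg_max + 1 bin logits; y: target distance in [0, reg_max].
    The loss pushes probability mass onto the two bins surrounding y.
    """
    probs = softmax(logits)
    left = min(int(math.floor(y)), len(logits) - 2)
    right = left + 1
    return -((right - y) * math.log(probs[left])
             + (y - left) * math.log(probs[right]))


def expected_distance(logits):
    """Decode a side distance as the expectation over bin indices."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))
```

With `reg_max: 16`, each box side is predicted as a distribution over 17 bins (indices 0 to 16), so uniform logits decode to a distance of 8.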

configs/gfl/_base_/gfl_reader.yml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
worker_num: 2
TrainReader:
  sample_transforms:
  - Decode: {}
  - RandomFlip: {prob: 0.5}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Resize: {target_size: [800, 1333], keep_ratio: true, interp: 1}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  - Gt2GFLTarget:
      downsample_ratios: [8, 16, 32, 64, 128]
      grid_cell_scale: 8
  batch_size: 2
  shuffle: true
  drop_last: true


EvalReader:
  sample_transforms:
  - Decode: {}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  batch_size: 2
  shuffle: false


TestReader:
  sample_transforms:
  - Decode: {}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Resize: {interp: 1, target_size: [800, 1333], keep_ratio: True}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  batch_size: 1
  shuffle: false
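
To make the effect of `Resize` with `keep_ratio` and `PadBatch` with `pad_to_stride: 32` concrete, the sketch below computes the image shapes they would produce. It assumes that the short side is scaled toward 800 while the long side is capped at 1333, and that padding rounds each dimension up to the next multiple of the stride. The helper names are hypothetical and the rounding may differ from PaddleDetection's operators.

```python
import math


def resized_shape(height, width, target_size=(800, 1333)):
    """Aspect-preserving resize: short side toward 800, long side at most 1333."""
    short_target, long_target = min(target_size), max(target_size)
    scale = min(short_target / min(height, width),
                long_target / max(height, width))
    return int(round(height * scale)), int(round(width * scale))


def padded_shape(height, width, stride=32):
    """Round each dimension up to the next multiple of the stride."""
    return (math.ceil(height / stride) * stride,
            math.ceil(width / stride) * stride)


# A 480x640 image is scaled by 800/480, then the width is padded to 1088.
h, w = resized_shape(480, 640)
print((h, w), padded_shape(h, w))
```

Padding to a multiple of 32 keeps the feature maps of the first three pyramid levels (strides 8, 16, 32) at integer sizes.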
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
architecture: GFL
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams

GFL:
  backbone: ResNet
  neck: FPN
  head: GFLHead

ResNet:
  depth: 50
  variant: b
  norm_type: bn
  freeze_at: 0
  return_idx: [1,2,3]
  num_stages: 4

FPN:
  out_channel: 256
  spatial_scales: [0.125, 0.0625, 0.03125]
  extra_stage: 2
  has_extra_convs: true
  use_c5: false

GFLHead:
  conv_feat:
    name: FCOSFeat
    feat_in: 256
    feat_out: 256
    num_convs: 4
    norm_type: "gn"
    use_dcn: false
  fpn_stride: [8, 16, 32, 64, 128]
  prior_prob: 0.01
  reg_max: 16
  dgqp_module:
    name: DGQP
    reg_topk: 4
    reg_channels: 64
    add_mean: True
  loss_qfl:
    name: QualityFocalLoss
    use_sigmoid: False
    beta: 2.0
    loss_weight: 1.0
  loss_dfl:
    name: DistributionFocalLoss
    loss_weight: 0.25
  loss_bbox:
    name: GIoULoss
    loss_weight: 2.0
  nms:
    name: MultiClassNMS
    nms_top_k: 1000
    keep_top_k: 100
    score_threshold: 0.025
    nms_threshold: 0.6
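
The `dgqp_module` block configures the Distribution-Guided Quality Predictor from GFL V2, which estimates localization quality from statistics of the predicted side distributions. The sketch below shows only the statistics step, assuming that `reg_topk: 4` selects the four largest bin probabilities per side and that `add_mean: True` appends the mean of those top-k values. The function name is hypothetical, and the small fully connected network that consumes these features (presumably with `reg_channels: 64` hidden units) is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dgqp_input_features(side_logits, reg_topk=4, add_mean=True):
    """Build the statistics vector for one location.

    side_logits: four lists (left, top, right, bottom), each holding
    reg_max + 1 bin logits. Returns 4 * (reg_topk + 1) values when
    add_mean is True, else 4 * reg_topk values.
    """
    features = []
    for logits in side_logits:
        probs = softmax(logits)
        topk = sorted(probs, reverse=True)[:reg_topk]
        features.extend(topk)
        if add_mean:
            # Assumption: the mean is taken over the top-k values only.
            features.append(sum(topk) / len(topk))
    return features
```

Under these settings the feature vector has 4 × (4 + 1) = 20 entries. A sharply peaked distribution yields a top-1 value near 1, while a flat one yields small, similar values, which is the signal the quality predictor relies on.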
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
epoch: 12

LearningRate:
  base_lr: 0.01
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [8, 11]
  - !LinearWarmup
    start_factor: 0.1
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2
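
The schedule above combines a linear warmup with step decay at epochs 8 and 11. The following sketch shows how the learning rate could evolve per training step. It assumes that warmup interpolates linearly from `start_factor * base_lr` to `base_lr` and that milestones are counted in epochs; `steps_per_epoch` is a hypothetical parameter that depends on dataset size and batch size, and the function is illustrative rather than PaddleDetection's scheduler.

```python
def piecewise_warmup_lr(step, steps_per_epoch, base_lr=0.01, gamma=0.1,
                        milestones=(8, 11), warmup_steps=500,
                        start_factor=0.1):
    """Learning rate at a given global step (0-based)."""
    if step < warmup_steps:
        # Linear ramp from start_factor * base_lr up to base_lr.
        alpha = step / warmup_steps
        return base_lr * (start_factor * (1.0 - alpha) + alpha)
    # After warmup, multiply by gamma once per milestone already reached.
    epoch = step // steps_per_epoch
    num_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** num_decays


# Example with a hypothetical 1000 steps per epoch.
for step in (0, 250, 500, 8000, 11000):
    print(step, piecewise_warmup_lr(step, steps_per_epoch=1000))
```

Under these assumptions the rate starts at 0.001, reaches 0.01 after 500 steps, and drops by a factor of 10 at each milestone.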
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/gfl_r50_fpn.yml',
  '_base_/optimizer_1x.yml',
  '_base_/gfl_reader.yml',
]

weights: output/gfl_r50_fpn_1x_coco/model_final
find_unused_parameters: True
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/gflv2_r50_fpn.yml',
  '_base_/optimizer_1x.yml',
  '_base_/gfl_reader.yml',
]

weights: output/gfl_r50_fpn_1x_coco/model_final
find_unused_parameters: True

configs/picodet/README.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# PicoDet

## Introduction

We developed a series of mobile models named `PicoDet`.
Optimization methods we use:
- [Generalized Focal Loss V2](https://arxiv.org/pdf/2011.12885.pdf)
- Cosine learning-rate decay



## Model Zoo

### PicoDet-S

| Backbone | Input size | images/GPU | lr schedule | Box AP | FLOPS | Inference Time | download | config |
| :------------------------ | :-------: | :-------: | :-----------: | :---: | :-----: | :-----: | :-------------------------------------------------: | :-----: |
| ShuffleNetv2-1x | 320*320 | 128 | 280e | 21.9 | -- | -- | [download](https://paddledet.bj.bcebos.com/models/picodet_s_shufflenetv2_320_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_shufflenetv2_320_coco.yml) |
| MobileNetv3-large-0.5x | 320*320 | 128 | 280e | 20.4 | -- | -- | [download](https://paddledet.bj.bcebos.com/models/picodet_s_mbv3_320_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_mbv3_320_coco.yml) |
| ShuffleNetv2-1x | 416*416 | 96 | 280e | 24.0 | -- | -- | [download](https://paddledet.bj.bcebos.com/models/picodet_s_shufflenetv2_416_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_shufflenetv2_416_coco.yml) |
| MobileNetv3-large-0.5x | 416*416 | 96 | 280e | 23.3 | -- | -- | [download](https://paddledet.bj.bcebos.com/models/picodet_s_mbv3_416_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_mbv3_416_coco.yml) |

### PicoDet-M

| Backbone | Input size | images/GPU | lr schedule | Box AP | FLOPS | Inference Time | download | config |
| :------------------------ | :-------: | :-------: | :-----------: | :---: | :-----: | :-----: | :-------------------------------------------------: | :-----: |
| ShuffleNetv2-1.5x | 320*320 | 128 | 280e | 24.9 | -- | -- | [download](https://paddledet.bj.bcebos.com/models/picodet_m_shufflenetv2_320_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_shufflenetv2_320_coco.yml) |
| MobileNetv3-large-1x | 320*320 | 128 | 280e | 26.4 | -- | -- | [download](https://paddledet.bj.bcebos.com/models/picodet_m_mbv3_320_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_mbv3_320_coco.yml) |
| ShuffleNetv2-1.5x | 416*416 | 128 | 280e | 27.4 | -- | -- | [download](https://paddledet.bj.bcebos.com/models/picodet_m_shufflenetv2_416_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_shufflenetv2_416_coco.yml) |
| MobileNetv3-large-1x | 416*416 | 128 | 280e | 29.2 | -- | -- | [download](https://paddledet.bj.bcebos.com/models/picodet_m_mbv3_416_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_mbv3_416_coco.yml) |


**Notes:**

- PicoDet inference speed is tested on a Kirin 980 with 4 threads, on ARMv8, with FP16.
- PicoDet is trained on the COCO train2017 dataset and evaluated on val2017; results are reported as `mAP(IoU=0.5:0.95)`.
- PicoDet was trained on 4 GPUs with a mini-batch size of 128 or 96 per GPU.

## Citations
```
@article{li2020gflv2,
  title={Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection},
  author={Li, Xiang and Wang, Wenhai and Hu, Xiaolin and Li, Jun and Tang, Jinhui and Yang, Jian},
  journal={arXiv preprint arXiv:2011.12885},
  year={2020}
}

```
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
epoch: 280

LearningRate:
  base_lr: 0.4
  schedulers:
  - !CosineDecay
    max_epochs: 280
  - !LinearWarmup
    start_factor: 0.1
    steps: 300

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2
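
The PicoDet schedule above replaces step decay with cosine decay over 280 epochs, after a 300-step linear warmup. The sketch below illustrates the resulting curve, assuming the common half-cosine form `0.5 * base_lr * (1 + cos(pi * step / max_steps))` and the same linear warmup interpolation from `start_factor * base_lr`. `steps_per_epoch` is a hypothetical parameter, and the function is not PaddleDetection's scheduler.

```python
import math


def cosine_warmup_lr(step, steps_per_epoch, base_lr=0.4, max_epochs=280,
                     warmup_steps=300, start_factor=0.1):
    """Learning rate at a given global step (0-based)."""
    if step < warmup_steps:
        # Linear ramp from start_factor * base_lr up to base_lr.
        alpha = step / warmup_steps
        return base_lr * (start_factor * (1.0 - alpha) + alpha)
    # Half-cosine decay from base_lr toward zero over the whole run.
    max_steps = max_epochs * steps_per_epoch
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / max_steps))


# Example with a hypothetical 100 steps per epoch (28000 steps in total).
for step in (0, 300, 14000, 28000):
    print(step, cosine_warmup_lr(step, steps_per_epoch=100))
```

Under these assumptions the rate starts at 0.04, sits close to 0.4 just after warmup, passes 0.2 at the midpoint of training, and approaches zero at the final step.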
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
worker_num: 6
TrainReader:
  sample_transforms:
  - Decode: {}
  - RandomDistort: {}
  - RandomCrop: {}
  - RandomFlip: {prob: 0.5}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Resize: {target_size: [320, 320], keep_ratio: False, interp: 1}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  - Gt2GFLTarget:
      downsample_ratios: [8, 16, 32]
      grid_cell_scale: 5
      cell_offset: 0.5
  batch_size: 128
  shuffle: true
  drop_last: true


EvalReader:
  sample_transforms:
  - Decode: {}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  batch_size: 8
  shuffle: false


TestReader:
  inputs_def:
    image_shape: [3, 320, 320]
  sample_transforms:
  - Decode: {}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  batch_size: 1
  shuffle: false
