Commit 4ec96ed (parent: 1fd63a9)

release sparse4d v3


42 files changed: +4130 −1412 lines

README.md (63 additions, 86 deletions)

````diff
@@ -1,109 +1,86 @@
-# Sparse4D
-**【2023/11/21】 The paper of [Sparse4Dv3](https://arxiv.org/abs/2311.11722) has been published.**
+<div align="center">
+
+# Sparse4D: Sparse-based End-to-end Multi-view Temporal Perception
+</div>
 
-**Sparse4Dv3 is about to be released, featuring stronger detection performance and end-to-end tracking capabilities.**
-
-**State-of-the-Art Performance of Sparse4Dv2 in the [nuScenes Benchmark](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Camera) for Online Models.**
+> [Github](https://github.com/linxuewu/Sparse4D) \
+> [Sparse4D v1: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion](https://arxiv.org/abs/2211.10581) \
+> [Sparse4D v2: Recurrent Temporal Fusion with Sparse Model](https://arxiv.org/abs/2305.14018) \
+> [Sparse4D v3: Advancing End-to-End 3D Detection and Tracking](https://arxiv.org/abs/2311.11722) \
+> [Chinese Interpretation of the Papers](https://zhuanlan.zhihu.com/p/637096473)
 
 ## Overall Architecture
-### Sparse4D v2
-<img src="resources/sparse4dv2_framework.png" width="1000" >
+<center>
+<img style="border-radius: 0.3125em;
+box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);"
+src="resources/sparse4d_architecture.jpg" width="1000">
+<br>
+<div style="color:orange; border-bottom: 1px solid #d9d9d9;
+display: inline-block;
+color: #999;
+padding: 2px;">Overall framework of Sparse4D, which conforms to an encoder-decoder structure. The input consists of three components: multi-view images, newly initialized instances, and instances propagated from the previous frame. The output is the refined instances (3D anchor boxes and corresponding features), which serve as the perception results for the current frame. Additionally, a subset of these refined instances is selected and propagated to the next frame.</div>
+</center>
+
+
+<center>
+<img style="border-radius: 0.3125em;
+box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);"
+src="resources/efficient_deformable_aggregation.jpg" width="1000">
+<br>
+<div style="color:orange; border-bottom: 1px solid #d9d9d9;
+display: inline-block;
+color: #999;
+padding: 2px;">Illustration of the Efficient Deformable Aggregation (EDA) module. (a) The basic pipeline: we first generate multiple 3D keypoints inside each 3D anchor, then sample multi-scale/multi-view image features for each keypoint, and fuse these features with predicted weights. (b) The parallel implementation: to further improve speed and reduce memory cost, feature sampling and the multi-view/scale weighted sum are combined into a single CUDA operation, which also supports different feature resolutions across views.</div>
+</center>
 
-### Sparse4D v1
-<img src="resources/sparse4d_framework.png" width="1000" >
-<!-- [video demo](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/video.avi) -->
 
 ## nuScenes Benchmark
-### Validation
+### Results on Validation Split
 These experiments were conducted using 8 RTX 3090 GPUs with 24 GB memory.
-|model | backbone |pretrain| img size | Epoch | Traning | FPS | NDS | mAP | config | ckpt | log |
-| :----: | :---: | :---: | :---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: | :----: |
-|Sparse4D-T4 |Res101|[FCOS3D](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/fcos3d.pth)|640x1600|24|2Day5H|2.9|0.5438|0.4409|[cfg](https://github.com/linxuewu/Sparse4D/blob/main/projects/configs/sparse4d_r101_H4.py)|[ckpt](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv1_r101_H4_release.pth)|[log](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4d.log)|
-|Sparse4Dv2|Res50|[ImageNet](https://download.pytorch.org/models/resnet50-19c8e357.pth)|256x704| 100 |15H | 20.3 |0.5384|0.4392|[cfg](https://github.com/linxuewu/Sparse4D/blob/main/projects/configs/sparse4dv2_r50_HInf_256x704.py)|[ckpt](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv2_r50_HInf_256x704.pth)|[log](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv2_r50_HInf_256x704.log.json)|
-|Sparse4Dv2|Res101|[nuImage](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth)|512x1408| 100 |2Day | 8.4 |0.5939|0.5051|-|-|-|
+|model | backbone |pretrain| img size | Epoch | Training | FPS | NDS | mAP | AMOTA |AMOTP |IDS| config | ckpt | log |
+| :----: | :---: | :---: | :---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: | :----: | :---: | :----: | :----: |
+|Sparse4D-T4 |Res101|[FCOS3D](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/fcos3d.pth)|640x1600|24|2Day5H|2.9|0.5438|0.4409|-|-|-|[cfg](https://github.com/linxuewu/Sparse4D/blob/v2.0/projects/configs/sparse4d_r101_H4.py)|[ckpt](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv1_r101_H4_release.pth)|[log](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4d.log)|
+|Sparse4Dv2|Res50|[ImageNet]()|256x704| 100 |15H | 20.3 |0.5384|0.4392|-|-|-|[cfg](https://github.com/linxuewu/Sparse4D/blob/v2.0/projects/configs/sparse4dv2_r50_HInf_256x704.py)|[ckpt](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv2_r50_HInf_256x704.pth)|[log](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv2_r50_HInf_256x704.log.json)|
+|Sparse4Dv2|Res101|[nuImage](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r101_fpn_1x_nuim/cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth)|512x1408| 100 |2Day | 8.4 |0.5939|0.5051|-|-|-|-|-|-|
+|Sparse4Dv3|Res50|[ImageNet]()|256x704| 100 |22H | 19.8 |0.5637|0.4646|0.477|1.167|456|[cfg]()|[ckpt]()|[log]()|
+|Sparse4Dv3|Res101|[nuImage](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r101_fpn_1x_nuim/cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth)|512x1408| 100 |2Day | 8.2 |0.623|0.537|0.567|1.027|557|-|-|-|
 
-### Test
-|model| backbone | img size | NDS | mAP |mATE| mASE | mAOE |mAVE| mAAE |
-| :---: | :---: | :---: | :---: | :---:|:---:|:---: | :---: | :----: | :----: |
-|Sparse4D-T4|Vov-99|640x1600|0.595|0.511|0.533|0.263|0.369|0.317|0.124|
-|Sparse4Dv2|Vov-99|640x1600|0.638|0.556|0.462|0.238|0.328|0.264|0.115|
+### Results on Test Split
+|model| backbone | img size | NDS | mAP |mATE| mASE | mAOE |mAVE| mAAE | AMOTA |AMOTP |IDS|
+| :---: | :---: | :---: | :---: | :---:|:---:|:---: | :---: | :----: | :----: | :----: | :----: | :----: |
+|Sparse4D-T4|[VoV-99](https://huggingface.co/Yuxin-CV/EVA-02/blob/main/eva02/det/eva02_L_coco_det_sys_o365.pth)|640x1600|0.595|0.511|0.533|0.263|0.369|0.317|0.124|-|-|-|
+|Sparse4Dv2|[VoV-99](https://huggingface.co/Yuxin-CV/EVA-02/blob/main/eva02/det/eva02_L_coco_det_sys_o365.pth)|640x1600|0.638|0.556|0.462|0.238|0.328|0.264|0.115|-|-|-|
+|Sparse4Dv3|[VoV-99](https://huggingface.co/Yuxin-CV/EVA-02/blob/main/eva02/det/eva02_L_coco_det_sys_o365.pth)|640x1600|0.656|0.570|0.412|0.236|0.312|0.210|0.117|0.574|0.970|669|
+|Sparse4Dv3-offline|[EVA02-large](https://huggingface.co/Yuxin-CV/EVA-02/blob/main/eva02/det/eva02_L_coco_det_sys_o365.pth)|640x1600|0.719|0.668|0.346|0.234|0.279|0.142|0.145|0.677|0.761|514|
 
 ## Quick Start
-Install requirements.
-```shell
-pip install -r requirements.txt
-cd projects/mmdet3d_plugin/ops
-python setup.py develop
-```
-
-Download nuScenes dataset, pretrain checkpoint([fcos3d.pth ResNet101](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/fcos3d.pth)), pkl files([nuscenes_infos_trainval_with_inds.pkl](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/nuscenes_infos_trainval_with_inds.pkl)) and init anchor centers([nuscenes_kmeans900.npy](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/nuscenes_kmeans900.npy)). Adjust the directory structure as follows:
-```shell
-Sparse4D
-├── data
-│   ├── nuscenes
-│   │   ├── maps
-│   │   ├── lidarseg
-│   │   ├── samples
-│   │   ├── sweeps
-│   │   ├── v1.0-mini
-│   │   ├── v1.0-test
-│   │   └── v1.0-trainval
-│   ├── nuscenes_cam
-│   │   ├── nuscenes_infos_test.pkl
-│   │   ├── nuscenes_infos_train.pkl
-│   │   ├── nuscenes_infos_val.pkl
-│   │   └── nuscenes_infos_trainval_with_inds.pkl
-├── projects
-│   ├── configs
-│   │   ├── default_runtime.py
-│   │   ├── sparse4d_r101_H1.py
-│   │   ├── sparse4d_r101_H4.py
-│   │   └── ...
-│   └── mmdet3d_plugin
-│       ├── apis
-│       ├── core
-│       ├── datasets
-│       └── models
-├── tools
-│   ├── dist_test.sh
-│   ├── dist_train.sh
-│   ├── test.py
-│   └── train.py
-├── local_test.sh
-├── local_train.sh
-├── fcos3d.pth
-└── nuscenes_kmeans900.npy
-```
-
-Train with config_name.py.
-```shell
-bash local_train.sh config_name
-```
-
-Test checkpoint_file with config_name.py.
-```shell
-bash local_test.sh config_name checkpoint_file
-```
+[Quick Start](docs/quick_start.md)
 
 ## Citation
 ```
+@misc{2311.11722,
+    Author = {Xuewu Lin and Zixiang Pei and Tianwei Lin and Lichao Huang and Zhizhong Su},
+    Title = {Sparse4D v3: Advancing End-to-End 3D Detection and Tracking},
+    Year = {2023},
+    Eprint = {arXiv:2311.11722},
+}
 @misc{2305.14018,
-Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
-Title = {Sparse4D v2: Recurrent Temporal Fusion with Sparse Model},
-Year = {2023},
-Eprint = {arXiv:2305.14018},
+    Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
+    Title = {Sparse4D v2: Recurrent Temporal Fusion with Sparse Model},
+    Year = {2023},
+    Eprint = {arXiv:2305.14018},
 }
-
 @misc{2211.10581,
-Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
-Title = {Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion},
-Year = {2022},
-Eprint = {arXiv:2211.10581},
+    Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
+    Title = {Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion},
+    Year = {2022},
+    Eprint = {arXiv:2211.10581},
 }
 ```
 
 ## Acknowledgement
 - [BEVFormer](https://github.com/fundamentalvision/BEVFormer)
-- [detr3d](https://github.com/WangYueFt/detr3d)
+- [DETR3D](https://github.com/WangYueFt/detr3d)
 - [mmdet3d](https://github.com/open-mmlab/mmdetection3d)
 - [SOLOFusion](https://github.com/Divadi/SOLOFusion/tree/main/configs/solofusion)
+- [StreamPETR](https://github.com/exiawsh/StreamPETR)
````
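The EDA caption in the README diff above compresses a lot of mechanism into two sentences. As a reference, here is a minimal PyTorch sketch of the basic pipeline from panel (a) — keypoint projection, per-view bilinear sampling, and weighted fusion. It is an illustrative reimplementation, not the repo's fused CUDA op (which lives under `projects/mmdet3d_plugin/ops`); the tensor layouts, single-scale features, and weight-prediction scheme are assumptions made for clarity.

```python
# Illustrative sketch of EDA's basic pipeline (panel (a)); NOT the repo's fused CUDA op.
# Single feature scale per view for brevity; multi-scale adds one more loop/weight axis.
import torch
import torch.nn.functional as F

def deformable_aggregation(feature_maps, keypoints_3d, projections, weights):
    """
    feature_maps: list over V views of [C, H_v, W_v] tensors (resolutions may differ).
    keypoints_3d: [N, K, 3] keypoints generated inside each of N 3D anchors.
    projections:  [V, 3, 4] camera projection matrices (world -> image pixels).
    weights:      [N, K, V] predicted fusion weights (e.g. softmax over K*V).
    returns:      [N, C] aggregated instance features.
    """
    N, K, _ = keypoints_3d.shape
    pts_h = torch.cat([keypoints_3d, keypoints_3d.new_ones(N, K, 1)], dim=-1)
    out = 0.0
    for v, feat in enumerate(feature_maps):
        C, H, W = feat.shape
        cam = (projections[v] @ pts_h.reshape(-1, 4).T).T            # [N*K, 3]
        uv = cam[:, :2] / cam[:, 2:3].clamp(min=1e-5)                # pixel coords
        # (a real implementation also masks points behind the camera)
        grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
        sampled = F.grid_sample(
            feat[None], grid.view(1, N, K, 2),
            mode="bilinear", align_corners=True,
        )                                                            # [1, C, N, K]
        sampled = sampled[0].permute(1, 2, 0)                        # [N, K, C]
        out = out + (sampled * weights[:, :, v : v + 1]).sum(dim=1)  # sum over K
    return out
```

As the caption describes, the committed CUDA op fuses the sampling and the multi-view/scale weighted sum into one kernel, so the large per-keypoint intermediate tensor never has to be materialized.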

docs/quick_start.md (new file, 61 additions)
# Quick Start

### Set up a new virtual environment
```bash
virtualenv mm_sparse4d --python=python3.8
source mm_sparse4d/bin/activate
```

### Install packages using pip3
```bash
sparse4d_path="path/to/sparse4d"
cd ${sparse4d_path}
pip3 install --upgrade pip
pip3 install -r requirement.txt
```
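Before building the CUDA op in the next step, it can save a failed compile to confirm that the installed PyTorch actually sees a GPU and was built against the CUDA version your `nvcc` provides. A generic sanity check (not a script from this repo):

```python
# Generic environment check (not part of the repo): run before building CUDA ops
# to catch a CPU-only torch install or a CUDA version mismatch early.
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)      # should match your nvcc version
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```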
### Compile the deformable_aggregation CUDA op
```bash
cd projects/mmdet3d_plugin/ops
python3 setup.py develop
cd ../../../
```

### Prepare the data
Download the [nuScenes dataset](https://www.nuscenes.org/nuscenes#download) and create symbolic links.
```bash
cd ${sparse4d_path}
mkdir data
ln -s path/to/nuscenes ./data/nuscenes
```

Pack the dataset's meta-information and labels into the required .pkl files.
```bash
pkl_path="data/nuscenes_anno_pkls"
mkdir -p ${pkl_path}
python3 tools/nuscenes_converter.py --version v1.0-mini --info_prefix ${pkl_path}/nuscenes-mini
python3 tools/nuscenes_converter.py --version v1.0-trainval,v1.0-test --info_prefix ${pkl_path}/nuscenes
```
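The converter writes ordinary pickle files, so a conversion can be verified with a quick inspection like the one below. The exact schema is whatever `tools/nuscenes_converter.py` emits; treat the printed keys as informational rather than a documented contract.

```python
# Quick look inside a generated info .pkl (not part of the repo).
import pickle

with open("data/nuscenes_anno_pkls/nuscenes_infos_train.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))
if isinstance(data, dict):
    print("top-level keys:", list(data.keys()))
elif isinstance(data, list) and data:
    print("samples:", len(data), "| first-sample keys:", list(data[0].keys()))
```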
### Generate anchors by K-means
```bash
python3 tools/anchor_generator.py --ann_file ${pkl_path}/nuscenes_infos_train.pkl
```
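Conceptually this step clusters ground-truth box centers into a fixed set of anchor centers, in the spirit of the `nuscenes_kmeans900.npy` file shipped with earlier releases. A minimal scikit-learn sketch follows; `tools/anchor_generator.py` in the repo is the authoritative implementation, and the info-file field names and cluster count here are assumptions.

```python
# Minimal sketch of K-means anchor generation (illustrative only; the repo's
# tools/anchor_generator.py is authoritative).
import pickle

import numpy as np
from sklearn.cluster import KMeans

with open("data/nuscenes_anno_pkls/nuscenes_infos_train.pkl", "rb") as f:
    data = pickle.load(f)

# Assumption: mmdet3d-style info files where each entry carries a "gt_boxes"
# array whose first three columns are the box centers (x, y, z).
infos = data["infos"] if isinstance(data, dict) else data
centers = np.concatenate([info["gt_boxes"][:, :3] for info in infos])

# Cluster into 900 anchors, matching the nuscenes_kmeans900.npy of v1/v2 releases.
km = KMeans(n_clusters=900, n_init=10, random_state=0).fit(centers)
np.save("nuscenes_kmeans900.npy", km.cluster_centers_.astype(np.float32))
```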
### Download pre-trained weights
Download the required backbone [pre-trained weights](https://download.pytorch.org/models/resnet50-19c8e357.pth).
```bash
mkdir ckpt
wget https://download.pytorch.org/models/resnet50-19c8e357.pth -O ckpt/resnet50-19c8e357.pth
```
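The downloaded file is a plain torchvision state dict, so a two-line check confirms it arrived intact (a generic verification, not a repo utility):

```python
# Verify the downloaded backbone weights (generic check, not a repo script).
import torch

state_dict = torch.load("ckpt/resnet50-19c8e357.pth", map_location="cpu")
print(len(state_dict), "tensors")   # torchvision ResNet-50 ships a few hundred entries
print(next(iter(state_dict)))       # first key, e.g. 'conv1.weight'
```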
### Commence training and testing
```bash
# train
bash local_train.sh sparse4dv3_temporal_r50_1x8_bs6_256x704

# test
bash local_test.sh sparse4dv3_temporal_r50_1x8_bs6_256x704 path/to/checkpoint
```

For inference-related guidelines, please refer to [tutorial/tutorial.ipynb](tutorial/tutorial.ipynb).

local_test.sh (1 addition, 1 deletion)

```diff
@@ -1,5 +1,5 @@
 export PYTHONPATH=$PYTHONPATH:./
-export CUDA_VISIBLE_DEVICES=0,1,2,3
+export CUDA_VISIBLE_DEVICES=3
 export PORT=29532
 
 gpus=(${CUDA_VISIBLE_DEVICES//,/ })
```

local_train.sh (2 additions, 3 deletions)

```diff
@@ -1,12 +1,11 @@
-export CUDA_VISIBLE_DEVICES=0,1,2,3
+export CUDA_VISIBLE_DEVICES=0
 export PYTHONPATH=$PYTHONPATH:./
 
 gpus=(${CUDA_VISIBLE_DEVICES//,/ })
 gpu_num=${#gpus[@]}
 echo "number of gpus: "${gpu_num}
 
 config=projects/configs/$1.py
-checkpoint=$2
 
 if [ ${gpu_num} -gt 1 ]
 then
@@ -17,4 +16,4 @@ then
 else
     python ./tools/train.py \
         ${config}
-fi
+fi
```
