HorizonRobotics
diff --git a/README.md b/README.md
Lines changed: 63 additions & 86 deletions
diff --git a/docs/quick_start.md b/docs/quick_start.md
Lines changed: 61 additions & 0 deletions
diff --git a/local_test.sh b/local_test.sh
Lines changed: 1 addition & 1 deletion
diff --git a/local_train.sh b/local_train.sh
Lines changed: 2 additions & 3 deletions
@@ -1,109 +1,86 @@
-# Sparse4D
-**【2023/11/21】 The paper of [Sparse4Dv3](https://arxiv.org/abs/2311.11722) has been published.**
+<div align="center">   
+  
+# Sparse4D: Sparse-based End-to-end Multi-view Temporal Perception
+</div>
 
-**Sparse4Dv3 is about to be released, featuring stronger detection performance and end-to-end tracking capabilities.**
-
-**State-of-the-Art Performance of Sparse4Dv2 in the [nuScenes Benchmark](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Camera) for Online Models.**
+> [GitHub](https://github.com/linxuewu/Sparse4D) \
+> [Sparse4D v1: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion](https://arxiv.org/abs/2211.10581) \
+> [Sparse4D v2: Recurrent Temporal Fusion with Sparse Model](https://arxiv.org/abs/2305.14018) \
+> [Sparse4D v3: Advancing End-to-End 3D Detection and Tracking](https://arxiv.org/abs/2311.11722) \
+> [Chinese Interpretation of the Papers](https://zhuanlan.zhihu.com/p/637096473)
 
 ## Overall Architecture
-### Sparse4D v2
-<img src="resources/sparse4dv2_framework.png" width="1000" >
+<center>
+    <img style="border-radius: 0.3125em;
+    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);" 
+    src="resources/sparse4d_architecture.jpg" width="1000">
+    <br>
+    <div style="color:orange; border-bottom: 1px solid #d9d9d9;
+    display: inline-block;
+    color: #999;
+    padding: 2px;">Overall Framework of Sparse4D, which follows an encoder-decoder structure. The inputs mainly consist of three components: multi-view images, newly initialized instances, and instances propagated from the previous frame. The outputs are the refined instances (3D anchor boxes and their corresponding features), which serve as the perception results for the current frame. Additionally, a subset of these refined instances is selected and propagated to the next frame.</div>
+</center>
+
+
+<center>
+    <img style="border-radius: 0.3125em;
+    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);" 
+    src="resources/efficient_deformable_aggregation.jpg" width="1000">
+    <br>
+    <div style="color:orange; border-bottom: 1px solid #d9d9d9;
+    display: inline-block;
+    color: #999;
+    padding: 2px;"> Illustration of our Efficient Deformable Aggregation Module. (a) The basic pipeline: we first generate multiple 3D keypoints inside each 3D anchor, then sample multi-scale/multi-view image features for each keypoint, and fuse these features with predicted weights. (b) The parallel implementation: to further improve speed and reduce memory cost, feature sampling and the multi-view/multi-scale weighted sum are fused into a single CUDA operation. Our CUDA implementation supports different feature resolutions across views. </div>
+</center>
 
-### Sparse4D v1
-<img src="resources/sparse4d_framework.png" width="1000" >
-<!-- [video demo](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/video.avi) -->
 
 ## nuScenes Benchmark
-### Validation
+### Results on Validation Split
 These experiments were conducted on 8 RTX 3090 GPUs with 24 GB of memory each.
-|model | backbone |pretrain| img size | Epoch | Traning | FPS | NDS | mAP | config | ckpt | log |
-|  :----:  | :---: | :---: | :---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: | :----: |
-|Sparse4D-T4 |Res101|[FCOS3D](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/fcos3d.pth)|640x1600|24|2Day5H|2.9|0.5438|0.4409|[cfg](https://github.com/linxuewu/Sparse4D/blob/main/projects/configs/sparse4d_r101_H4.py)|[ckpt](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv1_r101_H4_release.pth)|[log](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4d.log)|
-|Sparse4Dv2|Res50|[ImageNet](https://download.pytorch.org/models/resnet50-19c8e357.pth)|256x704| 100 |15H | 20.3 |0.5384|0.4392|[cfg](https://github.com/linxuewu/Sparse4D/blob/main/projects/configs/sparse4dv2_r50_HInf_256x704.py)|[ckpt](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv2_r50_HInf_256x704.pth)|[log](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv2_r50_HInf_256x704.log.json)|
-|Sparse4Dv2|Res101|[nuImage](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth)|512x1408| 100 |2Day | 8.4 |0.5939|0.5051|-|-|-|
+|model | backbone |pretrain| img size | Epoch | Training | FPS | NDS | mAP |  AMOTA |AMOTP |IDS| config | ckpt | log |
+|  :----:  | :---: | :---: | :---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: | :----: | :---: | :----: | :----: |
+|Sparse4D-T4 |Res101|[FCOS3D](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/fcos3d.pth)|640x1600|24|2Day5H|2.9|0.5438|0.4409|-|-|-|[cfg](https://github.com/linxuewu/Sparse4D/blob/v2.0/projects/configs/sparse4d_r101_H4.py)|[ckpt](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv1_r101_H4_release.pth)|[log](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4d.log)|
+|Sparse4Dv2|Res50|[ImageNet]()|256x704| 100 |15H | 20.3 |0.5384|0.4392|-|-|-|[cfg](https://github.com/linxuewu/Sparse4D/blob/v2.0/projects/configs/sparse4dv2_r50_HInf_256x704.py)|[ckpt](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv2_r50_HInf_256x704.pth)|[log](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/sparse4dv2_r50_HInf_256x704.log.json)|
+|Sparse4Dv2|Res101|[nuImage](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r101_fpn_1x_nuim/cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth)|512x1408| 100 |2Day | 8.4 |0.5939|0.5051|-|-|-|-|-|-|
+|Sparse4Dv3|Res50|[ImageNet]()|256x704| 100 |22H | 19.8 |0.5637|0.4646|0.477|1.167|456|[cfg]()|[ckpt]()|[log]()|
+|Sparse4Dv3|Res101|[nuImage](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r101_fpn_1x_nuim/cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth)|512x1408| 100 |2Day | 8.2 |0.623|0.537|0.567|1.027|557|-|-|-|
 
-### Test
-|model| backbone | img size | NDS | mAP |mATE| mASE | mAOE |mAVE| mAAE |
-| :---: | :---: | :---: | :---: | :---:|:---:|:---: | :---: | :----: | :----: |
-|Sparse4D-T4|Vov-99|640x1600|0.595|0.511|0.533|0.263|0.369|0.317|0.124|
-|Sparse4Dv2|Vov-99|640x1600|0.638|0.556|0.462|0.238|0.328|0.264|0.115|
+### Results on Test Split
+|model| backbone | img size | NDS | mAP |mATE| mASE | mAOE |mAVE| mAAE | AMOTA |AMOTP |IDS|
+| :---: | :---: | :---: | :---: | :---:|:---:|:---: | :---: | :----: | :----: | :----: | :----: | :----: |
+|Sparse4D-T4|VoV-99|640x1600|0.595|0.511|0.533|0.263|0.369|0.317|0.124|-|-|-|
+|Sparse4Dv2|VoV-99|640x1600|0.638|0.556|0.462|0.238|0.328|0.264|0.115|-|-|-|
+|Sparse4Dv3|VoV-99|640x1600|0.656|0.570|0.412|0.236|0.312|0.210|0.117|0.574|0.970|669|
+|Sparse4Dv3-offline|[EVA02-large](https://huggingface.co/Yuxin-CV/EVA-02/blob/main/eva02/det/eva02_L_coco_det_sys_o365.pth)|640x1600|0.719|0.668|0.346|0.234|0.279|0.142|0.145|0.677|0.761|514|
 
 ## Quick Start
- Install requirements.
-```shell
-pip install -r requirements.txt
-cd projects/mmdet3d_plugin/ops
-python setup.py develop
-```
-
-Download nuScenes dataset, pretrain checkpoint([fcos3d.pth ResNet101](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/fcos3d.pth)), pkl files([nuscenes_infos_trainval_with_inds.pkl](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/nuscenes_infos_trainval_with_inds.pkl)) and init anchor centers([nuscenes_kmeans900.npy](https://github.com/linxuewu/Sparse4D/releases/download/v0.0/nuscenes_kmeans900.npy)). Adjust the directory structure as follows:
-```shell
-Sparse4D
-├── data
-│   ├── nuscenes
-│   │   ├── maps
-│   │   ├── lidarseg
-│   │   ├── samples
-│   │   ├── sweeps
-│   │   ├── v1.0-mini
-│   │   ├── v1.0-test
-|   |   └── v1.0-trainval
-│   ├── nuscenes_cam
-│   │   ├── nuscenes_infos_test.pkl
-│   │   ├── nuscenes_infos_train.pkl
-│   │   ├── nuscenes_infos_val.pkl
-│   │   └── nuscenes_infos_trainval_with_inds.pkl
-├── projects
-│   ├── configs
-│   │   ├── default_runtime.py
-│   │   ├── sparse4d_r101_H1.py
-│   │   ├── sparse4d_r101_H4.py
-│   │   └── ...
-│   └── mmdet3d_plugin
-│       ├── apis
-│       ├── core
-│       ├── datasets
-│       └── models
-├── tools
-│   ├── dist_test.sh
-│   ├── dist_train.sh
-│   ├── test.py
-│   └── train.py
-├── local_test.sh
-├── local_train.sh
-├── fcos3d.pth
-└── nuscenes_kmeans900.npy
-```
-
-Train with config_name.py.
-```shell
-bash local_train.sh config_name
-```
-
-Test checkpoint_file with config_name.py.
-```shell
-bash local_test.sh config_name checkpoint_file 
-```
+[Quick Start](docs/quick_start.md)
 
 ## Citation
 ```
+@misc{2311.11722,
+    Author = {Xuewu Lin and Zixiang Pei and Tianwei Lin and Lichao Huang and Zhizhong Su},
+    Title = {Sparse4D v3: Advancing End-to-End 3D Detection and Tracking},
+    Year = {2023},
+    Eprint = {arXiv:2311.11722},
+}
 @misc{2305.14018,
-Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
-Title = {Sparse4D v2: Recurrent Temporal Fusion with Sparse Model},
-Year = {2023},
-Eprint = {arXiv:2305.14018},
+    Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
+    Title = {Sparse4D v2: Recurrent Temporal Fusion with Sparse Model},
+    Year = {2023},
+    Eprint = {arXiv:2305.14018},
 }
-
 @misc{2211.10581,
-  Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
-  Title = {Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion},
-  Year = {2022},
-  Eprint = {arXiv:2211.10581},
+    Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
+    Title = {Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion},
+    Year = {2022},
+    Eprint = {arXiv:2211.10581},
 }
 ```
 
 ## Acknowledgement
 - [BEVFormer](https://github.com/fundamentalvision/BEVFormer)
-- [detr3d](https://github.com/WangYueFt/detr3d) 
+- [DETR3D](https://github.com/WangYueFt/detr3d) 
 - [mmdet3d](https://github.com/open-mmlab/mmdetection3d)
 - [SOLOFusion](https://github.com/Divadi/SOLOFusion/tree/main/configs/solofusion)
+- [StreamPETR](https://github.com/exiawsh/StreamPETR)
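The test-split table in the README reports NDS together with mAP and the five true-positive error metrics, so NDS can be recomputed from the other columns as a consistency check. The nuScenes detection benchmark defines NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10, where the sum runs over mATE, mASE, mAOE, mAVE and mAAE. The Python sketch below is a minimal illustration of that formula, not code from this repository; the numbers are copied from the table, and any difference in the third decimal place comes from rounding of the published columns.

```python
from typing import Dict, Sequence, Tuple

# Values copied from the "Results on Test Split" table:
# (mAP, mATE, mASE, mAOE, mAVE, mAAE, reported NDS)
TEST_SPLIT: Dict[str, Tuple[float, ...]] = {
    "Sparse4D-T4": (0.511, 0.533, 0.263, 0.369, 0.317, 0.124, 0.595),
    "Sparse4Dv2": (0.556, 0.462, 0.238, 0.328, 0.264, 0.115, 0.638),
    "Sparse4Dv3": (0.570, 0.412, 0.236, 0.312, 0.210, 0.117, 0.656),
    "Sparse4Dv3-offline": (0.668, 0.346, 0.234, 0.279, 0.142, 0.145, 0.719),
}


def nds(m_ap: float, tp_errors: Sequence[float]) -> float:
    """nuScenes Detection Score: (5 * mAP + sum(1 - min(1, err))) / 10."""
    # Each true-positive error is clipped at 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


if __name__ == "__main__":
    for name, (m_ap, *errors, reported) in TEST_SPLIT.items():
        print(f"{name:<20} recomputed={nds(m_ap, errors):.4f} reported={reported:.3f}")
```

Run as a script, the recomputed scores agree with the reported NDS values to within 0.001, which also shows how heavily mAP is weighted: it contributes half of the score on its own.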
@@ -0,0 +1,61 @@
+# Quick Start
+
+### Set up a new virtual environment
+```bash
+virtualenv mm_sparse4d --python=python3.8
+source mm_sparse4d/bin/activate
+```
+
+### Install packages using pip3
+```bash
+sparse4d_path="path/to/sparse4d"
+cd ${sparse4d_path}
+pip3 install --upgrade pip
+pip3 install -r requirement.txt
+```
+
+### Compile the deformable_aggregation CUDA op
+```bash
+cd projects/mmdet3d_plugin/ops
+python3 setup.py develop
+cd ../../../
+```
+
+### Prepare the data
+Download the [nuScenes dataset](https://www.nuscenes.org/nuscenes#download) and create a symbolic link to it.
+```bash
+cd ${sparse4d_path}
+mkdir data
+ln -s path/to/nuscenes ./data/nuscenes
+```
+
+Pack the meta-information and labels of the dataset, and generate the required .pkl files.
+```bash
+pkl_path="data/nuscenes_anno_pkls"
+mkdir -p ${pkl_path}
+python3 tools/nuscenes_converter.py --version v1.0-mini --info_prefix ${pkl_path}/nuscenes-mini
+python3 tools/nuscenes_converter.py --version v1.0-trainval,v1.0-test --info_prefix ${pkl_path}/nuscenes
+```
+
+### Generate anchors by K-means
+```bash
+python3 tools/anchor_generator.py --ann_file ${pkl_path}/nuscenes_infos_train.pkl
+```
+
+### Download pre-trained weights
+Download the required backbone [pre-trained weights](https://download.pytorch.org/models/resnet50-19c8e357.pth).
+```bash
+mkdir ckpt
+wget https://download.pytorch.org/models/resnet50-19c8e357.pth -O ckpt/resnet50-19c8e357.pth
+```
+
+### Commence training and testing
+```bash
+# train
+bash local_train.sh sparse4dv3_temporal_r50_1x8_bs6_256x704
+
+# test
+bash local_test.sh sparse4dv3_temporal_r50_1x8_bs6_256x704  path/to/checkpoint
+```
+
+For inference-related guidelines, please refer to the notebook [tutorial/tutorial.ipynb](tutorial/tutorial.ipynb).
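The script `tools/anchor_generator.py` used in the K-means step is not part of this diff. The previous README described its output as init anchor centers (`nuscenes_kmeans900.npy`), which suggests 900 K-means cluster centers. The sketch below shows plain Lloyd's K-means over 3D box centers under that assumption; the helper name `kmeans_anchor_centers`, the random initialization and the synthetic input are illustrative choices, not the repository's implementation, and the real script may cluster different box attributes.

```python
import numpy as np


def kmeans_anchor_centers(points: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster (N, 3) box centers into k anchor centers with Lloyd's K-means.

    Hypothetical helper for illustration; it is not tools/anchor_generator.py.
    """
    points = np.asarray(points, dtype=np.float64)
    if k > len(points):
        raise ValueError("k must not exceed the number of points")
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct input points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distances via |p|^2 - 2 p.c + |c|^2, shape (N, k).
        dists = (
            (points ** 2).sum(axis=1)[:, None]
            - 2.0 * points @ centers.T
            + (centers ** 2).sum(axis=1)[None, :]
        )
        labels = dists.argmin(axis=1)
        # Per-cluster sums and counts give the means; empty clusters keep their centroid.
        counts = np.bincount(labels, minlength=k)
        sums = np.zeros_like(centers)
        np.add.at(sums, labels, points)
        new_centers = np.where(
            counts[:, None] > 0, sums / np.maximum(counts, 1)[:, None], centers
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    # Synthetic stand-in for ground-truth box centers (x, y, z) in metres.
    rng = np.random.default_rng(42)
    fake_centers = rng.uniform(low=[-50.0, -50.0, -3.0], high=[50.0, 50.0, 3.0], size=(5000, 3))
    anchors = kmeans_anchor_centers(fake_centers, k=900)
    print(anchors.shape)
```

The distance expansion keeps memory at one (N, k) matrix per iteration instead of an (N, k, 3) difference tensor, which matters when k is in the hundreds.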
@@ -1,5 +1,5 @@
 export PYTHONPATH=$PYTHONPATH:./
-export CUDA_VISIBLE_DEVICES=0,1,2,3
+export CUDA_VISIBLE_DEVICES=3
 export PORT=29532
 
 gpus=(${CUDA_VISIBLE_DEVICES//,/ })
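Both scripts derive the number of GPUs from `CUDA_VISIBLE_DEVICES`: `${CUDA_VISIBLE_DEVICES//,/ }` replaces every comma with a space, and the parentheses turn the word-split result into a bash array. `local_train.sh` then takes its length with `${#gpus[@]}`; `local_test.sh` presumably does the same, although that part lies outside this hunk. With this commit `local_test.sh` pins a single device (`3`), so it counts one GPU. The Python sketch below mirrors that parsing for illustration only; `count_visible_gpus` is a hypothetical helper, not part of the repository.

```python
import os
from typing import Mapping, Optional


def count_visible_gpus(env: Optional[Mapping[str, str]] = None) -> int:
    """Mirror `gpus=(${CUDA_VISIBLE_DEVICES//,/ })` followed by `${#gpus[@]}`.

    Hypothetical helper for illustration only; the scripts do this in bash.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    # ${VAR//,/ } replaces every comma with a space; the unquoted expansion is
    # then word-split on whitespace, which str.split() without arguments mimics.
    return len(value.replace(",", " ").split())


if __name__ == "__main__":
    for setting in ("0,1,2,3", "3", "0"):
        n = count_visible_gpus({"CUDA_VISIBLE_DEVICES": setting})
        print(f"CUDA_VISIBLE_DEVICES={setting} -> {n} GPU(s)")
```

An unset or empty variable yields zero elements in bash as well, so such a run falls through to the single-process branch of the `-gt 1` check.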
 
@@ -1,12 +1,11 @@
-export CUDA_VISIBLE_DEVICES=0,1,2,3
+export CUDA_VISIBLE_DEVICES=0
 export PYTHONPATH=$PYTHONPATH:./
 
 gpus=(${CUDA_VISIBLE_DEVICES//,/ })
 gpu_num=${#gpus[@]}
 echo "number of gpus: "${gpu_num}
 
 config=projects/configs/$1.py
-checkpoint=$2
 
 if [ ${gpu_num} -gt 1 ]
 then
@@ -17,4 +16,4 @@ then
 else
     python ./tools/train.py \
         ${config}
-fi
+fi
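After this change `local_train.sh` takes only the config name (the unused `checkpoint=$2` was removed), resolves it to `projects/configs/$1.py`, and runs `python ./tools/train.py ${config}` when at most one GPU is visible. The multi-GPU branch is outside the hunk shown here; the sketch below assumes it calls `tools/dist_train.sh` with the config and the GPU count, a guess based only on the `tools/dist_train.sh` entry in the previous README's directory tree. `build_train_command` is a hypothetical helper written to make the branching explicit.

```python
from typing import List


def build_train_command(config_name: str, gpu_num: int) -> List[str]:
    """Return the command local_train.sh would run (illustrative sketch).

    The single-GPU branch matches the script; the multi-GPU branch is an
    ASSUMPTION, since the diff elides it.
    """
    config = f"projects/configs/{config_name}.py"  # config=projects/configs/$1.py
    if gpu_num > 1:  # if [ ${gpu_num} -gt 1 ]
        # Assumed form: tools/dist_train.sh <config> <num_gpus>
        return ["bash", "./tools/dist_train.sh", config, str(gpu_num)]
    return ["python", "./tools/train.py", config]


if __name__ == "__main__":
    name = "sparse4dv3_temporal_r50_1x8_bs6_256x704"
    print(" ".join(build_train_command(name, gpu_num=1)))
    print(" ".join(build_train_command(name, gpu_num=8)))
```

Because the commit sets `CUDA_VISIBLE_DEVICES=0`, the default invocation from the quick start takes the single-process branch.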