UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
This is the official PyTorch implementation of the paper UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving.
We provide our pretrained weights. You can load the pretrained UniM2AE (UniM2AE for BEVFusion and UniM2AE-sst-pre for SST) to train the multi-modal detector (BEVFusion) or the LiDAR-only detector (SST).
| Model | Modality | Checkpoint |
|---|---|---|
| UniM2AE | C+L | Link |
| UniM2AE-sst-pre | L | Link |
| swint-nuImages | C | Link |
Note: The checkpoint (denoted as swint-nuImages) pretrained on nuImages is provided by BEVFusion.

3D object detection results on the nuScenes validation set:
| Model | Modality | mAP | NDS | Checkpoint |
|---|---|---|---|---|
| TransFusion-L-SST | L | 65.0 | 69.9 | Link |
| UniM2AE-L | L | 65.7 | 70.4 | Link |
| BEVFusion-SST | C+L | 68.2 | 71.5 | Link |
| UniM2AE | C+L | 68.4 | 71.9 | Link |
| UniM2AE w/MMIM | C+L | 69.7 | 72.7 | Link |

3D object detection results on the nuScenes test set:
| Model | Modality | mAP | NDS |
|---|---|---|---|
| UniM2AE-L | L | 67.9 | 72.2 |
| UniM2AE | C+L | 70.3 | 73.3 |
Here, we train UniM2AE-L and UniM2AE on the trainval split of the nuScenes dataset and test them without any test-time augmentation.

BEV map segmentation results:
| Model | Modality | mIoU | Checkpoint |
|---|---|---|---|
| BEVFusion | C | 51.2 | Link |
| UniM2AE | C | 52.9 | Link |
| BEVFusion-SST | C+L | 61.3 | Link |
| UniM2AE | C+L | 61.4 | Link |
| UniM2AE w/MMIM | C+L | 67.8 | Link |
- Python == 3.8
- mmcv-full == 1.4.0
- mmdetection == 2.14.0
- torch == 1.9.1+cu111
- torchvision == 0.10.1+cu111
- numpy == 1.19.5
- matplotlib == 3.6.2
- pyquaternion == 0.9.9
- scikit-learn == 1.1.3
- setuptools == 59.5.0
After installing these dependencies, please run this command to install the codebase:
```bash
cd Pretrain
python setup.py develop
```
The fine-tuning code is built with different libraries. Please refer to BEVFusion and Voxel-MAE.
We follow the instructions from here to download the nuScenes dataset. Please remember to download both the detection dataset and the map extension for BEV map segmentation.
After downloading the nuScenes dataset, please preprocess it by:
```bash
cd Finetune/bevfusion/
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
```
Then create the soft links in Pretrain/data and Finetune/sst/data with ln -s.
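A minimal sketch of the soft-link step, assuming the dataset was preprocessed under Finetune/bevfusion/data/nuscenes and that you run it from the repository root (adjust paths to your layout):

```shell
# Link the preprocessed nuScenes dataset into the other codebases.
# The source path assumes the data lives under Finetune/bevfusion/data.
mkdir -p Pretrain/data Finetune/sst/data
ln -s "$(pwd)/Finetune/bevfusion/data/nuscenes" Pretrain/data/nuscenes
ln -s "$(pwd)/Finetune/bevfusion/data/nuscenes" Finetune/sst/data/nuscenes
```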
After data preparation, the directory structure is as follows:
```
UniM2AE
├── Finetune
│   ├── bevfusion
│   │   ├── tools
│   │   ├── configs
│   │   ├── data
│   │   │   ├── can_bus
│   │   │   │   ├── ...
│   │   │   ├── nuscenes
│   │   │   │   ├── maps
│   │   │   │   ├── samples
│   │   │   │   ├── sweeps
│   │   │   │   ├── v1.0-test
│   │   │   │   ├── v1.0-trainval
│   │   │   │   ├── nuscenes_database
│   │   │   │   ├── nuscenes_infos_train.pkl
│   │   │   │   ├── nuscenes_infos_val.pkl
│   │   │   │   ├── nuscenes_infos_test.pkl
│   │   │   │   ├── nuscenes_dbinfos_train.pkl
│   ├── sst
│   │   ├── data
│   │   │   ├── nuscenes
│   │   │   │   ├── ...
├── Pretrain
│   ├── mmdet3d
│   ├── tools
│   ├── configs
│   ├── data
│   │   ├── can_bus
│   │   │   ├── ...
│   │   ├── nuscenes
│   │   │   ├── ...
```
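Before launching training, you can sanity-check that the key entries from the tree above are in place. This is an illustrative helper (not part of the repo), intended to be run from Finetune/bevfusion:

```shell
# check_nuscenes_layout ROOT: print each expected entry missing under ROOT.
check_nuscenes_layout() {
  for entry in maps samples sweeps v1.0-trainval \
               nuscenes_infos_train.pkl nuscenes_infos_val.pkl; do
    [ -e "$1/$entry" ] || echo "missing: $1/$entry"
  done
}

# Example usage (no output means the layout looks complete):
check_nuscenes_layout data/nuscenes
```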
To pretrain UniM2AE, please run:
```bash
cd Pretrain
bash tools/dist_train.sh configs/unim2ae_mmim.py 8
```
Then run the script to convert the pretrained checkpoint for fine-tuning:
```bash
cd Pretrain
python tools/convert.py --source work_dirs/unim2ae_mmim/epoch_200.pth --target ../Finetune/bevfusion/pretrained/unim2ae-pre.pth
```
To get the reconstruction results of the images and the LiDAR point cloud, please run:
```bash
cd Pretrain
python tools/test.py configs/unim2ae_mmim.py --checkpoint [pretrain checkpoint path] --show-pretrain --show-dir viz
```
We provide instructions to finetune BEVFusion and Voxel-MAE.
If you want to train the LiDAR-only UniM2AE-L for object detection, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/lidar/sstv2.yaml --load_from pretrained/unim2ae-lidar-only-pre.pth
```
For the UniM2AE w/MMIM detection model, please run:
```bash
cd Finetune/bevfusion
python tools/convert.py --source [lidar-only UniM2AE-L checkpoint file path] --fuser pretrained/unim2ae-pre.pth --target pretrained/unim2ae-stage1.pth --stage2
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/unim2ae_MMIM.yaml --load_from pretrained/unim2ae-stage1.pth
```
If you want to initialize the camera backbone with weights pretrained on nuImages, please run:
```bash
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/unim2ae_MMIM.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/unim2ae-stage1-L.pth
```
For the UniM2AE detection model, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/bevfusion_sst.yaml --load_from pretrained/unim2ae-stage1.pth
```
If you want to initialize the camera backbone with weights pretrained on nuImages, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/bevfusion_sst.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/unim2ae-L-det.pth
```
Note: unim2ae-L-det.pth is the checkpoint produced by training the LiDAR-only UniM2AE-L for object detection.
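The tools/convert.py steps above remap checkpoint weights between the pretraining and fine-tuning codebases. As a rough illustration of the idea (the key prefixes below are made up; check the script itself for the repo's actual key names), such a conversion boils down to renaming state-dict keys:

```python
# Illustrative sketch: rename checkpoint keys from a pretraining prefix to
# the prefix the fine-tuning code expects. The prefixes here are hypothetical.
def remap_state_dict(state_dict, src_prefix, dst_prefix):
    """Return a new dict with `src_prefix` on each key replaced by `dst_prefix`.

    Keys that do not start with `src_prefix` are dropped, mirroring how a
    converter keeps only the weights the downstream model can load.
    """
    return {
        dst_prefix + key[len(src_prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(src_prefix)
    }

# Example with toy values standing in for weight tensors:
pretrain = {"backbone.layer1.weight": 1.0, "decoder.head.weight": 2.0}
finetune = remap_state_dict(pretrain, "backbone.", "encoders.lidar.backbone.")
```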
For the camera-only UniM2AE segmentation model, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/camera-bev256d2.yaml --load_from pretrained/unim2ae-seg-c-pre.pth
```
For the UniM2AE segmentation model, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/fusion-sst.yaml --load_from pretrained/unim2ae-pre.pth
```
If you want to initialize the camera backbone with weights pretrained on nuImages, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/fusion-sst.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/unim2ae-seg-pre.pth
```
For the UniM2AE w/MMIM segmentation model, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/unim2ae_MMIM.yaml --load_from pretrained/unim2ae-pre.pth
```
If you want to initialize the camera backbone with weights pretrained on nuImages, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/unim2ae_MMIM.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/unim2ae-seg-pre.pth
```
To evaluate a model, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/test.py [config file path] pretrained/[checkpoint name].pth --eval [evaluation type]
```
For example, to evaluate the detection model, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/unim2ae_MMIM.yaml pretrained/unim2ae-mmim-det.pth --eval bbox
```
To evaluate the segmentation model, please run:
```bash
cd Finetune/bevfusion
torchpack dist-run -np 8 python tools/test.py configs/nuscenes/seg/unim2ae_MMIM.yaml pretrained/unim2ae-mmim-seg.pth --eval map
```
To train the LiDAR-only anchor-based detector, please run:
```bash
cd Finetune/sst
bash tools/dist_train.sh configs/sst_refactor/sst_10sweeps_VS0.5_WS16_ED8_epochs288_intensity.py 8 --cfg-options 'load_from=pretrained/unim2ae-sst-pre.pth'
```
To evaluate the LiDAR-only anchor-based detector, please run:
```bash
cd Finetune/sst
bash tools/dist_test.sh configs/sst_refactor/sst_10sweeps_VS0.5_WS16_ED8_epochs288_intensity.py [checkpoint file path] 8
```
UniM2AE is based on mmdetection3d. This repository is also inspired by the following outstanding contributions to the open-source community: 3DETR, BEVFormer, DETR, BEVFusion, MAE, Voxel-MAE, GreenMIM, SST, TransFusion.
If you find UniM2AE is helpful to your research, please consider citing our work:
@article{zou2023unim,
title={UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving},
author={Zou, Jian and Huang, Tianyu and Yang, Guanglei and Guo, Zhenhua and Zuo, Wangmeng},
journal={arXiv preprint arXiv:2308.10421},
year={2023}
}
