Framework: Vision3D is a custom, pure PyTorch framework designed for 3D Computer Vision tasks. The framework is modular and extensible, and provides all core components for data preprocessing, training, evaluation, and visualization. It also supports experiment logging with WandB and exporting models to ONNX for deployment.
MonoDETR3D Model: Performs 3D object detection from a single RGB image and a structured Point Cloud (PC).
Key features:
- Utilizes two ResNet encoders to extract multi-scale features from RGB images and structured Point Clouds (PC).
- Leverages the `SpatiallyAwareTransformer` to efficiently fuse these modalities using CUDA-optimized Deformable Self-Attention, followed by additional attention mechanisms. See the diagram for an architectural overview.
- Utilizes image features to predict a segmentation mask, encouraging the learning of semantically meaningful representations.
- The model predicts a fixed number of objects, which are matched to ground truth targets using the Hungarian algorithm.
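For intuition, the matching step can be sketched with `scipy.optimize.linear_sum_assignment`. The cost terms below (a classification term plus an L1 box distance) are illustrative assumptions, not the exact costs used by MonoDETR3D:

```python
# Minimal sketch of Hungarian matching between a fixed set of predictions
# and ground-truth targets. The cost terms here are illustrative; the actual
# costs used by MonoDETR3D are defined inside the model's loss code.
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_boxes, pred_logits, gt_boxes, gt_labels):
    # pred_boxes: (N, 7) box parameters, pred_logits: (N, num_classes)
    # gt_boxes: (M, 7), gt_labels: (M,) integer class ids
    prob = pred_logits.softmax(-1)                     # (N, num_classes)
    cost_cls = -prob[:, gt_labels]                     # (N, M): -p(gt class)
    cost_box = torch.cdist(pred_boxes, gt_boxes, p=1)  # (N, M): L1 distance
    cost = (cost_cls + cost_box).detach().cpu().numpy()
    pred_idx, gt_idx = linear_sum_assignment(cost)     # minimize total cost
    return pred_idx, gt_idx  # predictions left unmatched count as "no object"
```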
```
vision3d/
├── assets
├── configs              # Configuration files for models and datasets
├── docs
├── scripts              # Utility scripts (training, evaluation, ONNX export)
└── vision3d             # Core library
    ├── datasets
    ├── engine
    ├── hooks
    ├── losses
    ├── metrics
    ├── models
    │   ├── modelling
    │   │   ├── decoders
    │   │   ├── encoders
    │   │   ├── heads
    │   │   └── utils
    │   ├── mono_detr3d.py
    │   └── ops
    └── utils
```
- To set up the environment, please follow these instructions.
- Convert the dataset to the required format using `scripts/generate_data_splits.py`:

```bash
python scripts/generate_data_splits.py \
    --dataset_root /path/to/dataset \
    --output_dir /path/to/output/processed_dataset \
    --val_ratio 0.2 \
    --test_ratio 0.1 \
    --random_seed 42
```
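For reference, the flags above suggest a seeded random partition over sample IDs; a minimal sketch of that behavior (the actual script may differ in its details):

```python
# Illustrative sketch of a seeded train/val/test split over sample IDs;
# the real logic lives in scripts/generate_data_splits.py and may differ.
import random

def split_samples(sample_ids, val_ratio=0.2, test_ratio=0.1, seed=42):
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)       # deterministic for a fixed seed
    n_val, n_test = int(len(ids) * val_ratio), int(len(ids) * test_ratio)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }
```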
- Create or modify a configuration file. Refer to `configs/` for an example.
- Launch training via `scripts/train.py`:

```bash
python scripts/train.py \
    --config /path/to/config.py \
    --save_dir /path/to/save/logs_and_checkpoints \
    --use_wandb
```
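The `--config` argument points to a plain Python file. As a rough illustration only, a config might look like the sketch below; every field name here is a hypothetical assumption, so consult `configs/` for the real schema:

```python
# Hypothetical config sketch -- all field names are illustrative assumptions;
# the actual schema is defined by the files in configs/.
model = dict(
    type="MonoDETR3D",
    num_queries=50,      # fixed number of predicted objects
)
data = dict(
    root="/path/to/output/processed_dataset",
    batch_size=8,
)
train = dict(
    epochs=100,
    lr=2e-4,
)
```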
- Launch evaluation via `scripts/eval.py`:

```bash
python scripts/eval.py \
    --config /path/to/config.py \
    --checkpoint /path/to/checkpoint.pth \
    --eval_dir /path/to/evaluation/results \
    --split test
```
- Export the model to ONNX via `scripts/convert_onnx.py`:

```bash
python scripts/convert_onnx.py \
    --config /path/to/config.py \
    --checkpoint /path/to/checkpoint.pth \
    --output_path /path/to/output/model.onnx \
    --device cuda
```
- Once exported, Netron can be used to visualize the model. A visualization of MonoDETR3D can be found here.
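A quick sanity check of the exported file, assuming the `onnx` and `onnxruntime` packages are installed (this is a generic check, independent of Vision3D):

```python
# Sanity-check the exported model: validate the graph, then run one CPU
# inference pass with random inputs shaped from the graph's input signature
# (symbolic/dynamic dimensions are replaced with 1).
import numpy as np
import onnx
import onnxruntime as ort

onnx_path = "/path/to/output/model.onnx"
onnx.checker.check_model(onnx.load(onnx_path))  # raises if the graph is invalid

sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
feeds = {
    i.name: np.random.rand(
        *(d if isinstance(d, int) else 1 for d in i.shape)
    ).astype(np.float32)
    for i in sess.get_inputs()
}
print([o.shape for o in sess.run(None, feeds)])
```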
`data_exploration.ipynb` provides interactive visualization of dataset samples.
This project is licensed under the MIT License. See the LICENSE file for details.
- Thanks to the authors of MonoDETR, from whom we adapted the `VisualEncoder` for our `SpatiallyAwareTransformer`, as well as the custom CUDA implementation of Deformable Attention.