The pipeline of auto labeling for 3D detection.
- Support priority: Tier S
```mermaid
graph LR
NADATA[(non-annotated T4Dataset)]
subgraph "Model A inference"
INFERENCE_A[create_info]
end
subgraph "Model B inference"
INFERENCE_B[create_info]
end
subgraph "Model C inference"
INFERENCE_C[create_info]
end
subgraph "Ensemble"
ENSEMBLE[filter_objects]
end
subgraph "Temporal ID Consistency"
TRACKING[attach_tracking_id]
end
subgraph "Convert to T4Dataset"
CONVERT[create_pseudo_dataset]
end
DATA[(auto-labeled T4Dataset)]
NADATA --> INFERENCE_A
NADATA --> INFERENCE_B
NADATA --> INFERENCE_C
INFERENCE_A --> ENSEMBLE
INFERENCE_B --> ENSEMBLE
INFERENCE_C --> ENSEMBLE
ENSEMBLE --> TRACKING
TRACKING --> CONVERT
CONVERT --> DATA
click INFERENCE_A "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-31-create-info-file-from-non-annotated-t4dataset"
click INFERENCE_B "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-31-create-info-file-from-non-annotated-t4dataset"
click INFERENCE_C "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-31-create-info-file-from-non-annotated-t4dataset"
click ENSEMBLE "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-32-filter-and-ensemble-results"
click TRACKING "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-33-attach-tracking-ids"
click CONVERT "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-34-create-the-auto-labeled-t4dataset"
```
- Please follow the installation tutorial to set up the environment.
- In addition, please follow the setup procedure below.
- Build the docker image.
  - If you build the AWML image locally, please add `--build-arg BASE_IMAGE=awml` or `--build-arg BASE_IMAGE=awml-ros2` to the build script.

```sh
DOCKER_BUILDKIT=1 docker build -t auto_labeling_3d -f tools/auto_labeling_3d/Dockerfile .
```

- Run the docker container.

```sh
docker run -it --gpus '"device=0"' --name auto_labeling_3d --shm-size=64g -d -v {path to autoware-ml}:/workspace -v {path to data}:/workspace/data auto_labeling_3d bash
```

- If you want to use these models in auto labeling, please follow the setup procedure in the README of each model:
Prepare your non-annotated T4dataset in the following structure:
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - ..
      - data/
        - LIDAR_CONCAT/
        - CAM_*/
        - ..
    - ...
    - scene_1/
      - ..
You have two options to run the pipeline:
For most users, use `launch.py` to run the entire pipeline in one command:
```sh
python tools/auto_labeling_3d/entrypoint/launch.py tools/auto_labeling_3d/entrypoint/configs/example.yaml
```

This executes all steps automatically:
- Download model checkpoints from Model Zoo
- Run inference and create info files with pseudo labels
- Ensemble/filter results from multiple models
- Attach consistent tracking IDs across frames
- Generate final auto-labeled T4Dataset
- Restructure directory format
See example.yaml and update paths for your workspace.
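As a rough mental model, the one-command entrypoint behaves like a sequential driver over the steps listed above. This is a sketch only; the function and step names below are placeholders, not the actual `launch.py` API:

```python
# Illustrative sequential driver; step names and callables are placeholders,
# not the real AWML implementation.
def run_pipeline(config, steps):
    completed = []
    for name, step_fn in steps:
        step_fn(config)  # each step reads/writes paths from the shared config
        completed.append(name)
    return completed

# Hypothetical step list mirroring the bullets above.
steps = [
    ("create_info", lambda cfg: None),
    ("filter_objects", lambda cfg: None),
    ("attach_tracking_id", lambda cfg: None),
    ("create_pseudo_dataset", lambda cfg: None),
]
```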
For advanced users who need granular control or want to customize the pipeline, you can run each step separately:
Step 3.1: Create info file from non-annotated T4dataset
Run inference with a 3D detection model to generate info files:

```sh
python tools/auto_labeling_3d/create_info_data/create_info_data.py --root-path {path to directory of non-annotated T4dataset} --out-dir {path to output} --config {model config file to use auto labeling} --ckpt {checkpoint file}
```

- For example, run the following command:

```sh
python tools/auto_labeling_3d/create_info_data/create_info_data.py --root-path ./data/t4dataset/pseudo_xx1 --out-dir ./data/t4dataset/info --config projects/BEVFusion/configs/t4dataset/bevfusion_lidar_voxel_second_secfpn_1xb1_t4offline.py --ckpt ./work_dirs/bevfusion_offline/epoch_20.pth
```

- If you want to ensemble results for auto labeling, create an info file for each model.
- As a result, the data is organized as below:
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - ..
      - data/
        - ...
    - scene_1/
      - ..
  - info/
    - pseudo_infos_raw_centerpoint.pkl
    - pseudo_infos_raw_bevfusion.pkl
Step 3.2: Filter and ensemble results
- Set a config to decide what you want to filter.
- Set thresholds to filter out objects with low confidence:
```python
centerpoint_pipeline = [
    dict(
        type="ThresholdFilter",
        confidence_thresholds={
            "car": 0.35,
            "truck": 0.35,
            "bus": 0.35,
            "bicycle": 0.35,
            "pedestrian": 0.35,
        },
        use_label=["car", "truck", "bus", "bicycle", "pedestrian"],
    ),
]

filter_pipelines = dict(
    type="Filter",
    input=dict(
        name="centerpoint",
        info_path="./data/t4dataset/info/pseudo_infos_raw_centerpoint.pkl",
        filter_pipeline=centerpoint_pipeline,
    ),
)
```

- Make the info file that filters out the objects which are not used for the auto-labeled T4Dataset:

```sh
python tools/auto_labeling_3d/filter_objects/filter_objects.py --config {config_file} --work-dir {path to output}
```

- If you want to ensemble models, set a config as below.
```python
centerpoint_pipeline = [
    dict(
        type="ThresholdFilter",
        confidence_thresholds={
            "car": 0.35,
            "truck": 0.35,
            "bus": 0.35,
            "bicycle": 0.35,
            "pedestrian": 0.35,
        },
        use_label=["car", "truck", "bus", "bicycle", "pedestrian"],
    ),
]

bevfusion_pipeline = [
    dict(
        type="ThresholdFilter",
        confidence_thresholds={
            "bicycle": 0.35,
            "pedestrian": 0.35,
        },
        use_label=["bicycle", "pedestrian"],
    ),
]

filter_pipelines = dict(
    type="Ensemble",
    config=dict(
        type="NMSEnsembleModel",
        ensemble_setting=dict(
            weights=[1.0, 1.0],
            iou_threshold=0.55,
        ),
    ),
    inputs=[
        dict(
            name="centerpoint",
            info_path="./data/t4dataset/info/pseudo_infos_raw_centerpoint.pkl",
            filter_pipeline=centerpoint_pipeline,
        ),
        dict(
            name="bevfusion",
            info_path="./data/t4dataset/info/pseudo_infos_raw_bevfusion.pkl",
            filter_pipeline=bevfusion_pipeline,
        ),
    ],
)
```

- Make the info file that filters out the objects which are not used for the auto-labeled T4Dataset and ensembles the filtered results:

```sh
python tools/auto_labeling_3d/filter_objects/ensemble_infos.py --config {config_file} --work-dir {path to output}
```

- As a result, the data is as below:
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - ..
      - data/
        - ...
    - scene_1/
      - ..
  - info/
    - pseudo_infos_raw_centerpoint.pkl
    - pseudo_infos_raw_bevfusion.pkl
    - pseudo_infos_filtered.pkl
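Conceptually, the configs above combine two operations: per-class confidence thresholding and NMS-style fusion across models. The sketch below is a simplification of the actual `ThresholdFilter`/`NMSEnsembleModel` implementations: it works on 2D centers and uses center distance in place of box IoU, purely to illustrate the control flow:

```python
import math

def threshold_filter(dets, thresholds):
    """Drop detections below a per-class confidence threshold.

    dets: list of dicts with "label", "score", "center" ((x, y) in meters).
    Labels without a threshold are dropped (threshold defaults to 1.0).
    """
    return [d for d in dets if d["score"] >= thresholds.get(d["label"], 1.0)]

def greedy_fuse(model_dets, weights, dist_threshold=1.0):
    """Greedy NMS-style fusion across models.

    Keeps the highest weighted-score detection first and suppresses
    same-class detections whose centers lie closer than `dist_threshold`
    meters (the real ensemble uses box IoU with `iou_threshold`).
    """
    pool = []
    for dets, w in zip(model_dets, weights):
        for d in dets:
            pool.append({**d, "score": d["score"] * w})
    pool.sort(key=lambda d: d["score"], reverse=True)
    kept = []
    for d in pool:
        duplicate = any(
            k["label"] == d["label"]
            and math.dist(k["center"], d["center"]) < dist_threshold
            for k in kept
        )
        if not duplicate:
            kept.append(d)
    return kept
```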
Step 3.3: Attach tracking IDs
- Attach tracking IDs to maintain temporal consistency.
- If you do not need tracking IDs for your target annotation, you can skip this step.

```sh
python tools/auto_labeling_3d/attach_tracking_id/attach_tracking_id.py --input {info file} --output {info_file}
```

- As a result, an info file is made as below.
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - ..
      - data/
        - ...
    - scene_1/
      - ..
  - info/
    - pseudo_infos_raw_centerpoint.pkl
    - pseudo_infos_raw_bevfusion.pkl
    - pseudo_infos_filtered.pkl
    - pseudo_infos_tracked.pkl
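The core idea of attaching tracking IDs is frame-to-frame association. The actual `attach_tracking_id` implementation may differ; the sketch below is a simplified greedy nearest-center matcher: a detection inherits the ID of the closest same-class detection in the previous frame if it is within `max_dist` meters, otherwise a new ID starts:

```python
import math

def attach_ids(frames, max_dist=2.0):
    """Illustrative greedy association (not the real AWML tracker).

    frames: list of frames; each frame is a list of ((x, y), label) tuples.
    Returns, per frame, the track ID assigned to each detection.
    """
    next_id = 0
    prev = []  # (center, label, track_id) tuples from the previous frame
    out = []
    for dets in frames:
        cur = []
        used = set()
        for center, label in dets:
            best, best_d = None, max_dist
            for i, (p_center, p_label, _) in enumerate(prev):
                if p_label != label or i in used:
                    continue
                d = math.dist(center, p_center)
                if d < best_d:
                    best, best_d = i, d
            if best is None:
                track_id = next_id  # no match: start a new track
                next_id += 1
            else:
                track_id = prev[best][2]  # inherit the matched track's ID
                used.add(best)
            cur.append((center, label, track_id))
        out.append([t for _, _, t in cur])
        prev = cur
    return out
```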
Step 3.4: Create the auto-labeled T4Dataset
Generate the auto-labeled T4Dataset:

```sh
python tools/auto_labeling_3d/create_pseudo_t4dataset/create_pseudo_t4dataset.py {yaml config file about T4dataset data} --root-path {path to directory of non-annotated T4dataset} --input {path to pkl file}
```

- As a result, the auto-labeled T4Dataset is made as below.
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - sample.json
        - ..
    - scene_1/
      - ..
    - ..
Before using the auto-labeled T4Dataset for training, you can visualize and verify the generated labels using t4-devkit.
Please refer to t4-devkit render tutorial for visualization instructions.
Please upload the auto-labeled T4Dataset to WebAuto so it can be shared easily with other users.
Please check the Web.Auto documentation for details.
To align the T4dataset directory structure, run the following script:

```sh
python tools/auto_labeling_3d/change_directory_structure/change_directory_structure.py --dataset_dir data/t4dataset/pseudo_xx1/
```

The resulting structure of the auto-labeled T4Dataset is as follows:
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - 0/
        - annotation/
          - sample.json
          - ..
    - scene_1/
      - 0/
        - ..
    - ..
Create a YAML under autoware_ml/configs/t4dataset/ describing your auto-labeled T4Dataset.
Example: autoware_ml/configs/t4dataset/pseudo_j6_v1.yaml
Ensure your auto-labeled T4Dataset is mounted under ./data/t4dataset inside the container.
Example:
```sh
docker run -it --gpus '"device=0"' --name auto_labeling_3d --shm-size=64g -d -v {path to autoware-ml}:/workspace -v {path to auto-labeled T4Dataset}:/workspace/data auto_labeling_3d bash
```

Add the name of your auto-labeled T4Dataset directory to the `dataset_version_list` in the dataset config file used by your training configuration.
Example Case:

- If your training config is: `Centerpoint/second_secfpn_4xb16_121m_j6gen2_base.py`
- This config uses: `j6gen2_base.py`
- And your pseudo-label dataset directory is named: `pseudo_x2`
- To Do: Add `pseudo_x2` to the `dataset_version_list` in the `j6gen2_base.py` file.
```python
dataset_version_list = [
    "db_j6gen2_v1",
    "db_j6gen2_v2",
    "db_j6gen2_v3",
    "db_j6gen2_v4",
    "db_j6gen2_v5",
    "db_largebus_v1",
    "db_largebus_v2",
    "pseudo_x2",
]
```

Follow the dataset preparation to generate info files, then start training using your chosen model and the YAML you added in Step 1.
Auto-labeling works at 10 Hz, while manually annotated datasets are usually 1 Hz. Depending on your data distribution, you might want to down-sample your specific dataset. This is currently done at the info file creation stage, as seen here. Add your dataset name to the conditional check and set `sample_steps = 10`.
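The down-sampling itself is just a fixed stride over frames; keeping every tenth frame of a 10 Hz sequence roughly matches a 1 Hz annotated dataset. An illustrative sketch (the `sample_steps` name mirrors the variable mentioned above; the real logic sits inside info file creation):

```python
# Illustrative sketch: keep every `sample_steps`-th frame so a 10 Hz
# pseudo-labeled sequence roughly matches a 1 Hz annotated one.
def downsample(frames: list, sample_steps: int = 10) -> list:
    return frames[::sample_steps]
```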