auto_labeling_3d

The auto-labeling pipeline for 3D detection.

```mermaid
graph LR
    NADATA[(non-annotated T4Dataset)]

    subgraph "Model A inference"
        INFERENCE_A[create_info]
    end

    subgraph "Model B inference"
        INFERENCE_B[create_info]
    end

    subgraph "Model C inference"
        INFERENCE_C[create_info]
    end

    subgraph "Ensemble"
        ENSEMBLE[filter_objects]
    end

    subgraph "Temporal ID Consistency"
        TRACKING[attach_tracking_id]
    end

    subgraph "Convert to T4Dataset"
        CONVERT[create_pseudo_dataset]
    end

    DATA[(auto-labeled T4Dataset)]

    NADATA --> INFERENCE_A
    NADATA --> INFERENCE_B
    NADATA --> INFERENCE_C

    INFERENCE_A --> ENSEMBLE
    INFERENCE_B --> ENSEMBLE
    INFERENCE_C --> ENSEMBLE

    ENSEMBLE --> TRACKING
    TRACKING --> CONVERT
    CONVERT --> DATA

    click INFERENCE_A "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-31-create-info-file-from-non-annotated-t4dataset"
    click INFERENCE_B "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-31-create-info-file-from-non-annotated-t4dataset"
    click INFERENCE_C "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-31-create-info-file-from-non-annotated-t4dataset"
    click ENSEMBLE "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-32-filter-and-ensemble-results"
    click TRACKING "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-33-attach-tracking-ids"
    click CONVERT "https://github.com/tier4/AWML/tree/main/tools/auto_labeling_3d#step-34-create-the-auto-labeled-t4dataset"
```

Auto Labeling 3D Process Flow

1. Setup Environment

  • Please follow the installation tutorial to set up the environment.
  • In addition, follow the setup procedure below.

Build docker image

  • Build the docker image.
    • If you built the AWML image locally, add `--build-arg BASE_IMAGE=awml` or `--build-arg BASE_IMAGE=awml-ros2` to the build command.

```sh
DOCKER_BUILDKIT=1 docker build -t auto_labeling_3d -f tools/auto_labeling_3d/Dockerfile .
```

  • Run the docker container.

```sh
docker run -it --gpus '"device=0"' --name auto_labeling_3d --shm-size=64g -d -v {path to autoware-ml}:/workspace -v {path to data}:/workspace/data auto_labeling_3d bash
```
  • To use models for auto labeling, follow the setup procedure in the README of each model.

2. Prepare Dataset

Prepare your non-annotated T4dataset in the following structure:

- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - ..
      - data/
        - LIDAR_CONCAT/
        - CAM_*/
        - ..
      - ...
    - scene_1/
      - ..
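Before running the pipeline, it can help to sanity-check this layout with a short script. The following is a minimal sketch (not part of the tool); the directory names match the structure above:

```python
from pathlib import Path


def check_scene(scene_dir: Path) -> list:
    """Return a list of problems found in one non-annotated scene directory."""
    problems = []
    if not (scene_dir / "annotation").is_dir():
        problems.append(f"{scene_dir.name}: missing annotation/")
    if not (scene_dir / "data" / "LIDAR_CONCAT").is_dir():
        problems.append(f"{scene_dir.name}: missing data/LIDAR_CONCAT/")
    return problems


def check_dataset(root: str) -> list:
    """Check every scene directory under e.g. data/t4dataset/pseudo_xx1/."""
    problems = []
    for scene in sorted(Path(root).iterdir()):
        if scene.is_dir():
            problems.extend(check_scene(scene))
    return problems
```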

3. Run Auto Labeling Pipeline

You have two options to run the pipeline:

Option A: Quick Start with launch.py

For most users, use launch.py to run the entire pipeline in one command:

```sh
python tools/auto_labeling_3d/entrypoint/launch.py tools/auto_labeling_3d/entrypoint/configs/example.yaml
```

This executes all steps automatically:

  1. Download model checkpoints from Model Zoo
  2. Run inference and create info files with pseudo labels
  3. Ensemble/filter results from multiple models
  4. Attach consistent tracking IDs across frames
  5. Generate final auto-labeled T4Dataset
  6. Restructure directory format

See example.yaml and update paths for your workspace.

Option B: Run Individual Modules

For advanced users who need granular control or want to customize the pipeline, you can run each step separately:

Step 3.1: Create info file from non-annotated T4dataset

Run inference with a 3D detection model to generate info files:

```sh
python tools/auto_labeling_3d/create_info_data/create_info_data.py --root-path {path to directory of non-annotated T4dataset} --out-dir {path to output} --config {model config file to use auto labeling} --ckpt {checkpoint file}
```

  • For example, run the following command:

```sh
python tools/auto_labeling_3d/create_info_data/create_info_data.py --root-path ./data/t4dataset/pseudo_xx1 --out-dir ./data/t4dataset/info --config projects/BEVFusion/configs/t4dataset/bevfusion_lidar_voxel_second_secfpn_1xb1_t4offline.py --ckpt ./work_dirs/bevfusion_offline/epoch_20.pth
```
  • If you want to ensemble multiple models, create an info file for each model.
  • As a result, the directory structure is as follows:
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - ..
      - data/
      - ...
    - scene_1/
      - ..
  - info/
    - pseudo_infos_raw_centerpoint.pkl
    - pseudo_infos_raw_bevfusion.pkl
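Each raw info file is a pickle, so a quick way to inspect how many pseudo labels a model produced is a short script like the one below. This is only a sketch: it assumes an mmdetection3d-style layout with `data_list`/`instances`/`bbox_label_3d` keys, so adjust the keys if your info files differ.

```python
import pickle
from collections import Counter


def summarize_info(info_path: str) -> Counter:
    """Count pseudo-label instances per class index in an info .pkl.

    Assumed (not guaranteed) layout:
    {"data_list": [{"instances": [{"bbox_label_3d": int, ...}, ...]}, ...]}
    """
    with open(info_path, "rb") as f:
        info = pickle.load(f)
    counts = Counter()
    for sample in info.get("data_list", []):
        for instance in sample.get("instances", []):
            counts[instance.get("bbox_label_3d", -1)] += 1
    return counts
```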
Step 3.2: Filter and ensemble results

Filter

  • Set a config that decides what you want to filter.
    • Set confidence thresholds to drop low-confidence objects:

```python
centerpoint_pipeline = [
    dict(
        type="ThresholdFilter",
        confidence_thresholds={
            "car": 0.35,
            "truck": 0.35,
            "bus": 0.35,
            "bicycle": 0.35,
            "pedestrian": 0.35,
        },
        use_label=["car", "truck", "bus", "bicycle", "pedestrian"],
    ),
]

filter_pipelines = dict(
    type="Filter",
    input=dict(
        name="centerpoint",
        info_path="./data/t4dataset/info/pseudo_infos_raw_centerpoint.pkl",
        filter_pipeline=centerpoint_pipeline,
    ),
)
```
  • Create the info file, filtering out the objects that will not be used in the auto-labeled T4Dataset:

```sh
python tools/auto_labeling_3d/filter_objects/filter_objects.py --config {config_file} --work-dir {path to output}
```
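Conceptually, a ThresholdFilter keeps only the labels listed in `use_label` whose score clears the per-class threshold. A minimal sketch of that idea (not the tool's actual implementation; the object dict format here is an illustrative assumption):

```python
def threshold_filter(objects, confidence_thresholds, use_label):
    """Keep objects whose label is in use_label and whose confidence
    score meets that label's threshold.

    objects: list of dicts such as {"label": "car", "score": 0.8}
    (an assumed format for illustration).
    """
    return [
        obj for obj in objects
        if obj["label"] in use_label
        and obj["score"] >= confidence_thresholds[obj["label"]]
    ]
```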
Ensemble

  • If you want to ensemble multiple models, set a config as below:
```python
centerpoint_pipeline = [
    dict(
        type="ThresholdFilter",
        confidence_thresholds={
            "car": 0.35,
            "truck": 0.35,
            "bus": 0.35,
            "bicycle": 0.35,
            "pedestrian": 0.35,
        },
        use_label=["car", "truck", "bus", "bicycle", "pedestrian"],
    ),
]

bevfusion_pipeline = [
    dict(
        type="ThresholdFilter",
        confidence_thresholds={
            "bicycle": 0.35,
            "pedestrian": 0.35,
        },
        use_label=["bicycle", "pedestrian"],
    ),
]

filter_pipelines = dict(
    type="Ensemble",
    config=dict(
        type="NMSEnsembleModel",
        ensemble_setting=dict(
            weights=[1.0, 1.0],
            iou_threshold=0.55,
        ),
    ),
    inputs=[
        dict(
            name="centerpoint",
            info_path="./data/t4dataset/info/pseudo_infos_raw_centerpoint.pkl",
            filter_pipeline=centerpoint_pipeline,
        ),
        dict(
            name="bevfusion",
            info_path="./data/t4dataset/info/pseudo_infos_raw_bevfusion.pkl",
            filter_pipeline=bevfusion_pipeline,
        ),
    ],
)
```
  • Create the info file by filtering the objects that will not be used in the auto-labeled T4Dataset and ensembling the filtered results:

```sh
python tools/auto_labeling_3d/filter_objects/ensemble_infos.py --config {config_file} --work-dir {path to output}
```

  • As a result, the directory structure is as follows:
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - ..
      - data/
      - ...
    - scene_1/
      - ..
  - info/
    - pseudo_infos_raw_centerpoint.pkl
    - pseudo_infos_raw_bevfusion.pkl
    - pseudo_infos_filtered.pkl
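The NMS ensemble step merges the per-model detections and suppresses overlapping duplicates by IoU. The idea can be sketched as below; this is a simplification using axis-aligned BEV boxes (real 3D NMS uses rotated-box IoU), and the box format and score weighting are illustrative assumptions, not the NMSEnsembleModel implementation:

```python
def iou_bev(a, b):
    """Axis-aligned BEV IoU between boxes given as (x, y, width, length)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms_ensemble(model_outputs, weights, iou_threshold):
    """Merge detections from several models, then greedily keep the
    highest-scoring box and drop any later box overlapping it too much.

    model_outputs: one list per model of dicts {"box": (x, y, w, l), "score": float}.
    """
    pool = [
        {"box": det["box"], "score": det["score"] * weight}
        for dets, weight in zip(model_outputs, weights)
        for det in dets
    ]
    pool.sort(key=lambda det: det["score"], reverse=True)
    kept = []
    for det in pool:
        if all(iou_bev(det["box"], k["box"]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```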
Step 3.3: Attach tracking IDs

  • Attach tracking IDs to maintain temporal consistency:
    • If you do not need tracking IDs for your target annotation, you can skip this step.

```sh
python tools/auto_labeling_3d/attach_tracking_id/attach_tracking_id.py --input {info file} --output {info file}
```

  • As a result, an info file is created as below:
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - ..
      - data/
      - ...
    - scene_1/
      - ..
  - info/
    - pseudo_infos_raw_centerpoint.pkl
    - pseudo_infos_raw_bevfusion.pkl
    - pseudo_infos_filtered.pkl
    - pseudo_infos_tracked.pkl
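Conceptually, attaching tracking IDs means associating each detection with the nearest detection in the previous frame and reusing its ID, or minting a new ID when nothing is close enough. A greedy sketch of that idea (not the tool's actual tracker; `max_dist` is an illustrative gating parameter):

```python
import math
from itertools import count


def attach_ids(frames, max_dist=2.0):
    """Greedy nearest-neighbor ID assignment across frames.

    frames: list of frames, each a list of (x, y) object centers.
    Returns a parallel list of per-frame ID lists.
    """
    next_id = count()
    prev = []  # (id, center) pairs from the previous frame
    all_ids = []
    for centers in frames:
        ids = []
        used = set()
        for center in centers:
            best, best_dist = None, max_dist
            for tid, prev_center in prev:
                if tid in used:
                    continue
                dist = math.dist(center, prev_center)
                if dist < best_dist:
                    best, best_dist = tid, dist
            tid = best if best is not None else next(next_id)
            used.add(tid)
            ids.append(tid)
        all_ids.append(ids)
        prev = list(zip(ids, centers))
    return all_ids
```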
Step 3.4: Create the auto-labeled T4Dataset

Generate the auto-labeled T4Dataset:

```sh
python tools/auto_labeling_3d/create_pseudo_t4dataset/create_pseudo_t4dataset.py {yaml config file about T4dataset data} --root-path {path to directory of non-annotated T4dataset} --input {path to pkl file}
```

  • As a result, the auto-labeled T4Dataset is created as below:
- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - annotation/
        - sample.json
        - ..
    - scene_1/
      - ..
    - ..

4. Use for training

Verify the auto-labeled T4Dataset

Before using the auto-labeled T4Dataset for training, you can visualize and verify the generated labels using t4-devkit.

Please refer to t4-devkit render tutorial for visualization instructions.

Upload to WebAuto

Please upload the auto-labeled T4Dataset to WebAuto so it can easily be shared with other users.

Please check the Web.Auto documentation for details.

Use in local PC

To align the T4dataset directory structure, run the following script:

```sh
python tools/auto_labeling_3d/change_directory_structure/change_directory_structure.py --dataset_dir data/t4dataset/pseudo_xx1/
```

The resulting structure of the auto-labeled T4Dataset is as follows:

- data/t4dataset/
  - pseudo_xx1/
    - scene_0/
      - 0/
        - annotation/
          - sample.json
          - ..
    - scene_1/
      - 0/
        - ..
    - ..
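The restructuring moves each scene's contents into a `0/` subdirectory, as shown in the trees above. A sketch of that transformation (not the actual `change_directory_structure.py` script):

```python
import shutil
from pathlib import Path


def nest_scene_dirs(dataset_dir: str) -> None:
    """Move each scene's contents into a '0/' subdirectory
    (scene_0/annotation -> scene_0/0/annotation)."""
    for scene in sorted(Path(dataset_dir).iterdir()):
        if not scene.is_dir():
            continue
        target = scene / "0"
        target.mkdir(exist_ok=True)
        for item in list(scene.iterdir()):
            if item.name != "0":
                shutil.move(str(item), str(target / item.name))
```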

How to train with the auto-labeled T4Dataset

1. Add a YAML config for your auto-labeled T4Dataset

Create a YAML under autoware_ml/configs/t4dataset/ describing your auto-labeled T4Dataset.

Example: autoware_ml/configs/t4dataset/pseudo_j6_v1.yaml

2. Run docker and mount the auto-labeled T4Dataset

Ensure your auto-labeled T4Dataset is mounted under ./data/t4dataset inside the container.

Example:

```sh
docker run -it --gpus '"device=0"' --name auto_labeling_3d --shm-size=64g -d -v {path to autoware-ml}:/workspace -v {path to auto-labeled T4Dataset}:/workspace/data auto_labeling_3d bash
```

3. Update dataset.py used in training config

Add the name of your auto-labeled T4Dataset directory to the dataset_version_list in the dataset config file used by your training configuration.

Example Case

```python
dataset_version_list = [
    "db_j6gen2_v1",
    "db_j6gen2_v2",
    "db_j6gen2_v3",
    "db_j6gen2_v4",
    "db_j6gen2_v5",
    "db_largebus_v1",
    "db_largebus_v2",
    "pseudo_x2",
]
```

4. Prepare T4Dataset info and train

Follow the dataset preparation steps to generate info files, then start training with your chosen model and the YAML you added in Step 1.

(Optional) Add downsampling for your dataset

Auto labeling runs at 10 Hz, while manually annotated datasets are usually 1 Hz. Depending on your data distribution, you may want to downsample your pseudo-labeled dataset. This is currently done at the info-file creation stage, as seen here. Add your dataset name to the conditional check and set `sample_steps = 10`.
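The downsampling described above amounts to keeping every tenth frame for pseudo-labeled datasets. A sketch of the idea (the names in `pseudo_versions` are placeholders; the actual conditional check lives in the info-creation code):

```python
def maybe_downsample(samples, dataset_version,
                     pseudo_versions=("pseudo_xx1",), sample_steps=10):
    """Keep every sample_steps-th frame for 10 Hz pseudo-labeled datasets,
    so their frame rate roughly matches 1 Hz manually annotated data.

    pseudo_versions is an illustrative placeholder for your dataset names.
    """
    if dataset_version in pseudo_versions:
        return samples[::sample_steps]
    return samples
```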