mbari-aidata is a command line tool to do extract, transform, load and download operations on AI data for a number of projects at MBARI that require detection, clustering or classification workflows.
More documentation and examples are available at https://docs.mbari.org/internal/ai/data.
- 🧠 Object Detection/Clustering Integration: Loads detection/classification/clustering output from SDCAT formatted results.
- Flexible Data Export: Downloads from Tator into machine learning formats like COCO, CIFAR, or PASCAL VOC.
- Real-Time Uploads: Pushes localizations to Tator via Redis queues for real-time workflows.
- Metadata Extraction: Parses images metadata such as GPS/time/date through a plugin-based system (extractors).
- Duplicate Detection & flexible media references: Supports duplicate media load checks with the --check-duplicates flag.
- Images or video are made accessible through a web server without needing to upload or move them from your internal NFS project mounts (e.g. Thalassa)
- Augmentation Support: Augment VOC datasets with Albumentations to boost your object detection model performance. See examples in the docs.
- Python 3.10 or higher
- A Tator API token and (optional) Redis password for the .env file. Contact the MBARI AI team for access.
- 🐳Docker for development and testing only, but it can also be used instead of a local Python installation.
- For local installation, you will need to install the required Python packages listed in the
requirements.txt
file, ffmpeg, and the mp4dump tool from https://www.bento4.com/
Install as a Python package:
pip install mbari-aidata
Create the .env file with the following contents in the root directory of the project:
TATOR_TOKEN=your_api_token
REDIS_PASSWORD=your_redis_password
ENVIRONMENT=testing or production
Create a configuration file in the root directory of the project:
touch config_cfe.yaml
Or, use the project specific configuration from our docs server at https://docs.mbari.org/internal/ai/projects/
This file will be used to configure the project data, such as mounts, plugins, and database connections.
aidata download --version Baseline --labels "Diatoms, Copepods" --config https://docs.mbari.org/internal/ai/projects/uav-901902/config_uav.yml
⚙️Example configuration file:
# config_cfe.yml
# Config file for CFE project production
mounts:
- name: "image"
path: "/mnt/CFElab"
host: "https://mantis.shore.mbari.org"
nginx_root: "/CFElab"
- name: "video"
path: "/mnt/CFElab"
host: "https://mantis.shore.mbari.org"
nginx_root: "/CFElab"
plugins:
- name: "extractor"
module: "mbari_aidata.plugins.extractors.tap_cfe_media"
function: "extract_media"
redis:
host: "doris.shore.mbari.org"
port: 6382
vss:
project: "902111-CFE"
model: "google/vit-base-patch16-224"
tator:
project: "902111-CFE"
host: "https://mantis.shore.mbari.org"
image:
attributes:
iso_datetime: #<-------Required for images
type: datetime
depth:
type: float
video:
attributes:
iso_start_datetime: #<-------Required for videos
type: datetime
box:
attributes:
Label:
type: string
score:
type: float
cluster:
type: string
saliency:
type: float
area:
type: int
exemplar:
type: bool
A docker version is also available at mbari/aidata:latest
or mbari/aidata:latest:cuda-124
.
For example, to download data using the docker image:
docker run -it --rm -v $(pwd):/mnt mbari/aidata:latest aidata download --version Baseline --labels "Diatoms, Copepods" --config config_cfe.yml
aidata download --help
- Download data, such as images, boxes, into various formats for machine learning e.g. COCO, CIFAR, or PASCAL VOC format. Augmentation supported for VOC exported data using Albumentations.aidata load --help
- Load data, such as images, boxes, or clusters into either a Postgres or REDIS databaseaidata db --help
- Commands related to database managementaidata transform --help
- Commands related to transforming downloaded dataaidata -h
- Print help message and exit.
Source code is available at github.com/mbari-org/aidata.
See the Development Guide for more information on how to set up the development environment or the justfile
🗓️ Last updated: 2025-06-13