IFCB Flow Metric is an anomaly detection toolkit for Imaging FlowCytobot (IFCB) data. It extracts statistical features from the ROI (region of interest) point clouds in each IFCB bin and trains an Isolation Forest to identify distributions that deviate from normal patterns. Scores can be visualized through a web dashboard for interactive exploration.
- Parallel feature extraction from IFCB ADC files
- Isolation Forest training for unsupervised anomaly detection
- CSV-based scoring of new datasets
- Dash-powered dashboard to explore anomaly scores and individual point clouds
- Dockerfile for deployment with Gunicorn
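The core idea can be sketched with scikit-learn's `IsolationForest` (the estimator family this toolkit presumably builds on; the real feature extraction and pipeline live in `models/`). The toy feature matrix below is illustrative only:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative sketch only: 200 rows of "normal" 4-D feature vectors
# (stand-ins for the statistics extracted from each bin's ROI point
# cloud) plus one clearly deviant row.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
outlier = np.array([[10.0, 10.0, 10.0, 10.0]])
X = np.vstack([normal, outlier])

# Fit an Isolation Forest; lower decision_function values mean
# "easier to isolate", i.e. more anomalous.
clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
scores = clf.decision_function(X)
print(int(scores.argmin()))  # index of the most anomalous row
```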
- Clone this repository
git clone https://github.com/WHOIGit/ifcb-flow-metric.git
cd ifcb-flow-metric
- Install Python dependencies (Python >=3.11 recommended)
pip install -r requirements.txt
Use `train.py` to train an Isolation Forest on a directory of IFCB bins.
python train.py <data_dir> [options]
Options:
- `--id-file` – path to a file with one PID per line. If omitted, all bins in `data_dir` are used.
- `--n-jobs` – number of parallel workers for feature extraction (default from `utils/constants.py`).
- `--contamination` – expected fraction of anomalies.
- `--aspect-ratio` – camera frame aspect ratio.
- `--chunk-size` – number of PIDs per extraction chunk.
- `--model` – output path for the trained model (default `classifier.pkl`).
- `--config` – YAML string specifying which features to use for training.
- `--config-file` – YAML file path specifying which features to use for training.
A typical command might look like:
python train.py /path/to/data --n-jobs 4 --contamination 0.00001
By default, all 26 available features are used for training. You can customize which features to include using either:
- YAML configuration file:

  python train.py /path/to/data --config-file feature_config.yaml

- YAML string directly:

  python train.py /path/to/data --config 'spatial_stats: {mean_x: true, mean_y: true}'
The repository includes `feature_config.yaml` as an example configuration file with all features enabled. Features are organized into categories:
- Spatial Statistics (8 features): mean, std, median, IQR for x/y coordinates
- Distribution Shape (2 features): ratio_spread, core_fraction
- Clipping Detection (2 features): duplicate_fraction, max_duplicate_fraction
- Histogram Uniformity (2 features): cv_x, cv_y
- Statistical Moments (4 features): skew_x, skew_y, kurt_x, kurt_y
- PCA Orientation (2 features): angle, eigen_ratio
- Edge Features (5 features): left/right/top/bottom/total edge fractions
- Temporal (1 feature): t_y_var
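A configuration file enables or disables individual features within each category. Only the `spatial_stats` key and the `mean_x`/`mean_y` names are confirmed by the `--config` example above; the remaining key names in this hypothetical excerpt follow the category list and should be checked against the repository's `feature_config.yaml`:

```yaml
# Hypothetical excerpt of a feature configuration.
spatial_stats:
  mean_x: true
  mean_y: true
  std_x: true
  std_y: true
  median_x: true
  median_y: true
  iqr_x: false   # set a feature to false to exclude it from training
  iqr_y: false
```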
The trained model is stored as a pickle file for later inference.
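Loading a pickled model for inference can be sketched as a round trip; a bare `IsolationForest` fitted on toy data is assumed here, while the object actually pickled by `train.py` may also carry its feature configuration:

```python
import pickle
from sklearn.ensemble import IsolationForest

# train.py persists the fitted model with pickle, so a later process
# can reload it for inference.
model = IsolationForest(random_state=0).fit([[0.0], [0.1], [0.2]])
restored = pickle.loads(pickle.dumps(model))
print(type(restored).__name__)
```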
To compute anomaly scores for a set of bins using a trained model:
python score.py <data_dir> [options]
Important options:
- `--id-file` – list of PIDs to score.
- `--n-jobs` – workers for feature extraction.
- `--aspect-ratio` – camera aspect ratio.
- `--chunk-size` – PIDs per extraction chunk.
- `--model` – path to the saved model.
- `--output` – CSV file to write results (default `scores.csv`).
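Mirroring the training example above, a typical scoring run (using only the flags listed here, with the default model filename) might look like:

```shell
python score.py /path/to/data --model classifier.pkl --output scores.csv --n-jobs 4
```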
Each row in the CSV contains a PID and its anomaly score.
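Downstream consumption of the scores file can be sketched as follows; the header names `pid` and `score` are assumptions and should be checked against the CSV that `score.py` actually writes:

```python
import csv
import io

# Minimal sketch of consuming score.py output, using an in-memory
# stand-in for scores.csv.
sample = io.StringIO(
    "pid,score\n"
    "D20230101T000000_IFCB123,-0.12\n"
    "D20230101T002000_IFCB123,0.08\n"
)
# Lower Isolation Forest scores are more anomalous, so the first row
# after an ascending sort is the most suspect bin.
rows = sorted(csv.DictReader(sample), key=lambda r: float(r["score"]))
print(rows[0]["pid"])
```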
`dashboard.py` provides a Dash application for exploring scores. It reads the CSV produced by `score.py` and fetches point cloud data from the IFCB dashboard API.
python dashboard.py
The dashboard URL defaults to `http://localhost:8000` but can be changed via the `DASHBOARD_BASE_URL` environment variable. Additional environment variables include `FILE_PATH` (path to the scores CSV), `MONTH` (filter data by month in `YYYYMM` format), and `DECIMATE` (plotting decimation factor).
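For example, to point the dashboard at a remote IFCB dashboard instance and one month of scores (the URL, paths, and `DECIMATE` value are placeholders):

```shell
export DASHBOARD_BASE_URL=https://ifcb-data.example.org
export FILE_PATH=/data/scores.csv
export MONTH=202401
export DECIMATE=10
python dashboard.py
```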
The repository includes a `Dockerfile` for running the dashboard under Gunicorn:
docker build -t ifcb-flow-metric .
docker run -p 8050:8050 -v /path/to/scores.csv:/app/scores.csv ifcb-flow-metric
This exposes the dashboard on port 8050.
| Path | Description |
|---|---|
| `models/` | Feature extraction, training, and inference utilities |
| `utils/` | Helper functions and constants |
| `train.py` | Command line training script |
| `score.py` | Command line scoring script |
| `dashboard.py` | Dash dashboard for interactive exploration |
Default configuration values such as contamination rate and output paths are defined in `utils/constants.py`.
This project is licensed under the MIT License. See the LICENSE file for details.
Some of this code and most of this README were generated by AI.