Welcome to PyOD V2 documentation!

Read Me First

Welcome to PyOD, a comprehensive but easy-to-use Python library for detecting anomalies in multivariate data. Whether you are working with a small-scale project or large datasets, PyOD provides a range of algorithms to suit your needs.

PyOD Version 2 is now available (Paper) :cite:`a-chen2024pyod`, featuring:

  • Expanded Deep Learning Support: Integrates 12 modern neural models into a single PyTorch-based framework, bringing the total number of outlier detection methods to 45.
  • Enhanced Performance and Ease of Use: Models are optimized for efficiency and consistent performance across different datasets.
  • LLM-based Model Selection: Automated model selection guided by a large language model reduces manual tuning and assists users who may have limited experience with outlier detection.
  • Multi-Modal Detection via EmbeddingOD: Chain foundation model encoders (sentence-transformers, OpenAI, HuggingFace) with any PyOD detector for text and image anomaly detection :cite:`a-li2024nlp`.

PyOD Ecosystem & Resources: NLP-ADBench (NLP anomaly detection) :cite:`a-li2024nlp` | TODS (time-series) | PyGOD (graph) | ADBench (benchmark) | AD-LLM (LLM-based AD) :cite:`a-yang2024ad` | Resources


About PyOD

PyOD, established in 2017, has become a go-to Python library for detecting anomalous/outlying objects in multivariate data. This exciting yet challenging field is commonly referred to as Outlier Detection or Anomaly Detection.

PyOD includes more than 50 detection algorithms, from classical LOF (SIGMOD 2000) to the cutting-edge ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD has been successfully used in numerous academic research projects and commercial products with more than 26 million downloads. It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including Analytics Vidhya, KDnuggets, and Towards Data Science.

PyOD stands out for:

  • Unified, User-Friendly Interface across various algorithms.
  • Wide Range of Models, from classic techniques to the latest deep learning methods in PyTorch.
  • High Performance & Efficiency, leveraging numba and joblib for JIT compilation and parallel processing.
  • Fast Training & Prediction, achieved through the SUOD framework :cite:`a-zhao2021suod`.

Outlier Detection with 5 Lines of Code:

# Example: Training an ECOD detector
# X_train and X_test are assumed to be numeric arrays of shape (n_samples, n_features)
from pyod.models.ecod import ECOD
clf = ECOD()
clf.fit(X_train)
y_train_scores = clf.decision_scores_  # Outlier scores for training data
y_test_scores = clf.decision_function(X_test)  # Outlier scores for test data

Text Anomaly Detection with EmbeddingOD (pip install pyod sentence-transformers):

from pyod.models.embedding import EmbeddingOD
clf = EmbeddingOD(encoder='all-MiniLM-L6-v2', detector='KNN')
clf.fit(train_texts)                          # list of strings
scores = clf.decision_function(test_texts)    # anomaly scores
labels = clf.predict(test_texts)              # binary labels

# Or use a preset:
clf = EmbeddingOD.for_text(quality='fast')    # MiniLM + KNN

Image detection requires additional packages (pip install transformers torch). See EmbeddingOD example for details.
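
To make the encode-then-detect idea concrete, here is a minimal, dependency-free sketch of the pattern. ``toy_encoder`` (hashed character trigrams) and ``knn_scores`` are hypothetical stand-ins written for illustration. They are not EmbeddingOD's implementation, and in practice a real encoder such as a sentence-transformers model would take the place of ``toy_encoder``.

```python
import math
import zlib
from collections import Counter


def toy_encoder(texts, dim=64):
    """Hypothetical encoder: hashed character-trigram counts, L2-normalized."""
    vectors = []
    for text in texts:
        text = text.lower()
        grams = Counter(text[i:i + 3] for i in range(max(len(text) - 2, 1)))
        vec = [0.0] * dim
        for gram, count in grams.items():
            # crc32 is deterministic across runs, unlike the built-in hash()
            vec[zlib.crc32(gram.encode("utf-8")) % dim] += count
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def knn_scores(train_vecs, query_vecs, k=1):
    """Score each query by its distance to the k-th nearest training vector."""
    scores = []
    for q in query_vecs:
        dists = sorted(math.dist(q, t) for t in train_vecs)
        scores.append(dists[k - 1])
    return scores


train_texts = [
    "payment received for order 1001",
    "payment received for order 1002",
    "payment received for order 1003",
]
test_texts = ["payment received for order 1002", "zzzz qqqq xxxx !!!!"]

# Step 1: encode text into vectors. Step 2: run a detector on the vectors.
train_vecs = toy_encoder(train_texts)
test_vecs = toy_encoder(test_texts)
scores = knn_scores(train_vecs, test_vecs, k=1)
print(scores)  # the unfamiliar second text should receive the larger score
```

The detector only ever sees vectors, which is why the encoder and the detector can be swapped independently.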

Selecting the Right Algorithm: Start with ECOD or Isolation Forest for tabular data, EmbeddingOD for text/image, or MetaOD for data-driven selection.

Citing PyOD:

If you use PyOD in a scientific publication, we would appreciate citations to the following paper(s):

PyOD 2: A Python Library for Outlier Detection with LLM-powered Model Selection is available as a preprint:

@inproceedings{chen2025pyod,
  title={Pyod 2: A python library for outlier detection with llm-powered model selection},
  author={Chen, Sihan and Qian, Zhuangzhuang and Siu, Wingchun and Hu, Xingcan and Li, Jiaqi and Li, Shawn and Qin, Yuehan and Yang, Tiankai and Xiao, Zhuo and Ye, Wanghao and others},
  booktitle={Companion Proceedings of the ACM on Web Conference 2025},
  pages={2807--2810},
  year={2025}
}

The original PyOD paper is published in the Journal of Machine Learning Research (JMLR), MLOSS track:

@article{zhao2019pyod,
    author  = {Zhao, Yue and Nasrullah, Zain and Li, Zheng},
    title   = {PyOD: A Python Toolbox for Scalable Outlier Detection},
    journal = {Journal of Machine Learning Research},
    year    = {2019},
    volume  = {20},
    number  = {96},
    pages   = {1-7},
    url     = {http://jmlr.org/papers/v20/19-011.html}
}

or:

Zhao, Y., Nasrullah, Z. and Li, Z., 2019. PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of Machine Learning Research (JMLR), 20(96), pp.1-7.

For a broader perspective on anomaly detection, see our NeurIPS papers on ADBench :cite:`a-han2022adbench` and ADGym :cite:`a-jiang2023adgym`.


ADBench Benchmark and Datasets

We released ADBench: Anomaly Detection Benchmark :cite:`a-han2022adbench`, a comprehensive 45-page benchmark paper. The fully open-sourced ADBench compares 30 anomaly detection algorithms on 57 benchmark datasets.

The organization of ADBench is provided below:

benchmark-fig

For a simpler visualization, we compare selected models via compare_all_models.py.

Comparison_of_All

Implemented Algorithms

The PyOD toolkit consists of two major functional groups:

(i) Individual Detection Algorithms:

.. csv-table::
   :header: "Type", "Abbr", "Algorithm", "Year", "Class", "Ref"

   "Probabilistic", "ECOD", "Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions", "2022", ":class:`pyod.models.ecod.ECOD`", ":cite:`a-li2021ecod`"
   "Probabilistic", "COPOD", "COPOD: Copula-Based Outlier Detection", "2020", ":class:`pyod.models.copod.COPOD`", ":cite:`a-li2020copod`"
   "Probabilistic", "ABOD", "Angle-Based Outlier Detection", "2008", ":class:`pyod.models.abod.ABOD`", ":cite:`a-kriegel2008angle`"
   "Probabilistic", "FastABOD", "Fast Angle-Based Outlier Detection using approximation", "2008", ":class:`pyod.models.abod.ABOD`", ":cite:`a-kriegel2008angle`"
   "Probabilistic", "MAD", "Median Absolute Deviation (MAD)", "1993", ":class:`pyod.models.mad.MAD`", ":cite:`a-iglewicz1993detect`"
   "Probabilistic", "SOS", "Stochastic Outlier Selection", "2012", ":class:`pyod.models.sos.SOS`", ":cite:`a-janssens2012stochastic`"
   "Probabilistic", "QMCD", "Quasi-Monte Carlo Discrepancy outlier detection", "2001", ":class:`pyod.models.qmcd.QMCD`", ":cite:`a-fang2001wrap`"
   "Probabilistic", "KDE", "Outlier Detection with Kernel Density Functions", "2007", ":class:`pyod.models.kde.KDE`", ":cite:`a-latecki2007outlier`"
   "Probabilistic", "Sampling", "Rapid distance-based outlier detection via sampling", "2013", ":class:`pyod.models.sampling.Sampling`", ":cite:`a-sugiyama2013rapid`"
   "Probabilistic", "GMM", "Probabilistic Mixture Modeling for Outlier Analysis", "", ":class:`pyod.models.gmm.GMM`", ":cite:`a-aggarwal2015outlier` [Ch.2]"
   "Linear Model", "PCA", "Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes)", "2003", ":class:`pyod.models.pca.PCA`", ":cite:`a-shyu2003novel`"
   "Linear Model", "KPCA", "Kernel Principal Component Analysis", "2007", ":class:`pyod.models.kpca.KPCA`", ":cite:`a-hoffmann2007kernel`"
   "Linear Model", "MCD", "Minimum Covariance Determinant (use the Mahalanobis distances as the outlier scores)", "1999", ":class:`pyod.models.mcd.MCD`", ":cite:`a-rousseeuw1999fast,a-hardin2004outlier`"
   "Linear Model", "CD", "Use Cook's distance for outlier detection", "1977", ":class:`pyod.models.cd.CD`", ":cite:`a-cook1977detection`"
   "Linear Model", "OCSVM", "One-Class Support Vector Machines", "2001", ":class:`pyod.models.ocsvm.OCSVM`", ":cite:`a-scholkopf2001estimating`"
   "Linear Model", "LMDD", "Deviation-based Outlier Detection (LMDD)", "1996", ":class:`pyod.models.lmdd.LMDD`", ":cite:`a-arning1996linear`"
   "Proximity-Based", "LOF", "Local Outlier Factor", "2000", ":class:`pyod.models.lof.LOF`", ":cite:`a-breunig2000lof`"
   "Proximity-Based", "COF", "Connectivity-Based Outlier Factor", "2002", ":class:`pyod.models.cof.COF`", ":cite:`a-tang2002enhancing`"
   "Proximity-Based", "Incr. COF", "Memory Efficient Connectivity-Based Outlier Factor (slower but reduces storage complexity)", "2002", ":class:`pyod.models.cof.COF`", ":cite:`a-tang2002enhancing`"
   "Proximity-Based", "CBLOF", "Clustering-Based Local Outlier Factor", "2003", ":class:`pyod.models.cblof.CBLOF`", ":cite:`a-he2003discovering`"
   "Proximity-Based", "LOCI", "LOCI: Fast outlier detection using the local correlation integral", "2003", ":class:`pyod.models.loci.LOCI`", ":cite:`a-papadimitriou2003loci`"
   "Proximity-Based", "HBOS", "Histogram-based Outlier Score", "2012", ":class:`pyod.models.hbos.HBOS`", ":cite:`a-goldstein2012histogram`"
   "Proximity-Based", "HDBSCAN", "Density-based clustering based on hierarchical density estimates", "2013", ":class:`pyod.models.hdbscan.HDBSCAN`", ":cite:`a-campello2013density`"
   "Proximity-Based", "kNN", "k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score)", "2000", ":class:`pyod.models.knn.KNN`", ":cite:`a-ramaswamy2000efficient,a-angiulli2002fast`"
   "Proximity-Based", "AvgKNN", "Average kNN (use the average distance to k nearest neighbors as the outlier score)", "2002", ":class:`pyod.models.knn.KNN`", ":cite:`a-ramaswamy2000efficient,a-angiulli2002fast`"
   "Proximity-Based", "MedKNN", "Median kNN (use the median distance to k nearest neighbors as the outlier score)", "2002", ":class:`pyod.models.knn.KNN`", ":cite:`a-ramaswamy2000efficient,a-angiulli2002fast`"
   "Proximity-Based", "SOD", "Subspace Outlier Detection", "2009", ":class:`pyod.models.sod.SOD`", ":cite:`a-kriegel2009outlier`"
   "Proximity-Based", "ROD", "Rotation-based Outlier Detection", "2020", ":class:`pyod.models.rod.ROD`", ":cite:`a-almardeny2020novel`"
   "Outlier Ensembles", "IForest", "Isolation Forest", "2008", ":class:`pyod.models.iforest.IForest`", ":cite:`a-liu2008isolation,a-liu2012isolation`"
   "Outlier Ensembles", "INNE", "Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles", "2018", ":class:`pyod.models.inne.INNE`", ":cite:`a-bandaragoda2018isolation`"
   "Outlier Ensembles", "DIF", "Deep Isolation Forest for Anomaly Detection", "2023", ":class:`pyod.models.dif.DIF`", ":cite:`a-xu2023dif`"
   "Outlier Ensembles", "FB", "Feature Bagging", "2005", ":class:`pyod.models.feature_bagging.FeatureBagging`", ":cite:`a-lazarevic2005feature`"
   "Outlier Ensembles", "LSCP", "LSCP: Locally Selective Combination of Parallel Outlier Ensembles", "2019", ":class:`pyod.models.lscp.LSCP`", ":cite:`a-zhao2019lscp`"
   "Outlier Ensembles", "XGBOD", "Extreme Boosting Based Outlier Detection (Supervised)", "2018", ":class:`pyod.models.xgbod.XGBOD`", ":cite:`a-zhao2018xgbod`"
   "Outlier Ensembles", "LODA", "Lightweight On-line Detector of Anomalies", "2016", ":class:`pyod.models.loda.LODA`", ":cite:`a-pevny2016loda`"
   "Outlier Ensembles", "SUOD", "SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection (Acceleration)", "2021", ":class:`pyod.models.suod.SUOD`", ":cite:`a-zhao2021suod`"
   "Neural Networks", "AutoEncoder", "Fully connected AutoEncoder (use reconstruction error as the outlier score)", "2015", ":class:`pyod.models.auto_encoder.AutoEncoder`", ":cite:`a-aggarwal2015outlier` [Ch.3]"
   "Neural Networks", "VAE", "Variational AutoEncoder (use reconstruction error as the outlier score)", "2013", ":class:`pyod.models.vae.VAE`", ":cite:`a-kingma2013auto`"
   "Neural Networks", "Beta-VAE", "Variational AutoEncoder (with a customized loss term obtained by varying gamma and capacity)", "2018", ":class:`pyod.models.vae.VAE`", ":cite:`a-burgess2018understanding`"
   "Neural Networks", "SO_GAAL", "Single-Objective Generative Adversarial Active Learning", "2019", ":class:`pyod.models.so_gaal.SO_GAAL`", ":cite:`a-liu2019generative`"
   "Neural Networks", "MO_GAAL", "Multiple-Objective Generative Adversarial Active Learning", "2019", ":class:`pyod.models.mo_gaal.MO_GAAL`", ":cite:`a-liu2019generative`"
   "Neural Networks", "DeepSVDD", "Deep One-Class Classification", "2018", ":class:`pyod.models.deep_svdd.DeepSVDD`", ":cite:`a-ruff2018deepsvdd`"
   "Neural Networks", "AnoGAN", "Anomaly Detection with Generative Adversarial Networks", "2017", ":class:`pyod.models.anogan.AnoGAN`", ":cite:`a-schlegl2017unsupervised`"
   "Neural Networks", "ALAD", "Adversarially learned anomaly detection", "2018", ":class:`pyod.models.alad.ALAD`", ":cite:`a-zenati2018adversarially`"
   "Neural Networks", "DevNet", "Deep Anomaly Detection with Deviation Networks", "2019", ":class:`pyod.models.devnet.DevNet`", ":cite:`a-pang2019deep`"
   "Neural Networks", "AE1SVM", "Autoencoder-based One-class Support Vector Machine", "2019", ":class:`pyod.models.ae1svm.AE1SVM`", ":cite:`a-nguyen2019scalable`"
   "Graph-based", "R-Graph", "Outlier detection by R-graph", "2017", ":class:`pyod.models.rgraph.RGraph`", ":cite:`a-you2017provable`"
   "Graph-based", "LUNAR", "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks", "2022", ":class:`pyod.models.lunar.LUNAR`", ":cite:`a-goodge2022lunar`"
   "Embedding-based", "EmbeddingOD", "Multi-modal anomaly detection via foundation model embeddings (text, image)", "2025", ":class:`pyod.models.embedding.EmbeddingOD`", ":cite:`a-li2024nlp`"
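
The kNN, AvgKNN, and MedKNN rows above differ only in how the distances to the k nearest neighbors are aggregated into one score. The sketch below illustrates that difference in plain Python. ``knn_outlier_scores`` is a hypothetical helper, not PyOD's implementation. Its ``method`` values are chosen to mirror the ``method`` option of :class:`pyod.models.knn.KNN` (``'largest'``, ``'mean'``, ``'median'``).

```python
import math
import statistics


def knn_outlier_scores(X, k=2, method="largest"):
    """Score each point by an aggregate of its k nearest-neighbor distances."""
    scores = []
    for i, x in enumerate(X):
        # Distances to every other point, nearest first
        dists = sorted(math.dist(x, y) for j, y in enumerate(X) if j != i)
        nearest = dists[:k]
        if method == "largest":    # kNN: distance to the k-th neighbor
            score = nearest[-1]
        elif method == "mean":     # AvgKNN: average of the k distances
            score = statistics.fmean(nearest)
        elif method == "median":   # MedKNN: median of the k distances
            score = statistics.median(nearest)
        else:
            raise ValueError(f"unknown method: {method}")
        scores.append(score)
    return scores


# Four points on a unit square plus one far-away point
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
for method in ("largest", "mean", "median"):
    print(method, knn_outlier_scores(X, k=2, method=method))
```

In all three variants the far-away point receives the highest score. The variants diverge mainly when a point has a few close neighbors and several distant ones.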

Ensemble methods (IForest, INNE, DIF, FB, LSCP, LODA, SUOD, XGBOD) are included in the table above. Score combination functions (average, maximization, AOM, MOA, median, majority vote) are in :mod:`pyod.models.combination`.
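
As a rough illustration of what the score combination functions compute, the sketch below implements average, maximization, median, AOM (Average of Maximum), and MOA (Maximum of Average) in plain Python. It is not the code in :mod:`pyod.models.combination`. It assumes the scores are already on a comparable scale (for example, standardized per detector) and uses fixed contiguous buckets, which is a simplifying assumption.

```python
import statistics


def _buckets(row, n_buckets):
    """Split one sample's detector scores into contiguous, equal-sized buckets."""
    if len(row) % n_buckets != 0:
        raise ValueError("number of detectors must be divisible by n_buckets")
    size = len(row) // n_buckets
    return [row[i:i + size] for i in range(0, len(row), size)]


def average(scores):
    return [statistics.fmean(row) for row in scores]


def maximization(scores):
    return [max(row) for row in scores]


def median(scores):
    return [statistics.median(row) for row in scores]


def aom(scores, n_buckets):
    """Average of Maximum: max inside each bucket, then mean over buckets."""
    return [statistics.fmean([max(b) for b in _buckets(row, n_buckets)])
            for row in scores]


def moa(scores, n_buckets):
    """Maximum of Average: mean inside each bucket, then max over buckets."""
    return [max(statistics.fmean(b) for b in _buckets(row, n_buckets))
            for row in scores]


# Rows are samples, columns are detectors (2 samples, 4 detectors)
scores = [[1.0, 3.0, 2.0, 6.0],
          [0.5, 0.5, 0.5, 0.5]]

print(average(scores))        # [3.0, 0.5]
print(maximization(scores))   # [6.0, 0.5]
print(median(scores))         # [2.5, 0.5]
print(aom(scores, 2))         # [4.5, 0.5]
print(moa(scores, 2))         # [4.0, 0.5]
```

AOM and MOA sit between plain averaging and plain maximization, which is why they are often used when some detectors in the pool are noisy.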

(ii) Utility Functions:

.. csv-table::
   :header: "Type", "Name", "Function"

   "Data", ":func:`~pyod.utils.data.generate_data`", "Synthesized data generation; normal data from multivariate Gaussian, outliers from uniform distribution"
   "Data", ":func:`~pyod.utils.data.generate_data_clusters`", "Synthesized data generation in clusters for more complex patterns"
   "Evaluation", ":func:`~pyod.utils.data.evaluate_print`", "Print ROC-AUC and Precision @ Rank n for a detector"
   "Evaluation", ":func:`~pyod.utils.utility.precision_n_scores`", "Calculate Precision @ Rank n"
   "Utility", ":func:`~pyod.utils.utility.get_label_n`", "Turn raw outlier scores into binary labels by assigning 1 to the top n scores"
   "Stat", ":func:`~pyod.utils.stat_models.wpearsonr`", "Calculate the weighted Pearson correlation of two samples"
   "Encoding", ":func:`~pyod.utils.encoders.resolve_encoder`", "Resolve an encoder from a string, BaseEncoder instance, or callable"
   "Encoding", "SentenceTransformerEncoder", "Encode text via sentence-transformers models (see :doc:`pyod.utils <pyod.utils>`)"
   "Encoding", "OpenAIEncoder", "Encode text via OpenAI Embeddings API (see :doc:`pyod.utils <pyod.utils>`)"
   "Encoding", "HuggingFaceEncoder", "Encode text or images via HuggingFace transformers (see :doc:`pyod.utils <pyod.utils>`)"
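
To clarify what "assigning 1 to the top n scores" and "Precision @ Rank n" mean, here is a small plain-Python sketch. The helpers ``get_label_n_sketch`` and ``precision_at_n`` are hypothetical names with simplified signatures. They are not the PyOD functions listed above, whose tie handling may differ.

```python
def get_label_n_sketch(scores, n):
    """Label the n highest-scoring samples as outliers (1), the rest as inliers (0)."""
    # Sorting is stable, so ties keep their original order
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    labels = [0] * len(scores)
    for i in top:
        labels[i] = 1
    return labels


def precision_at_n(y_true, scores, n=None):
    """Fraction of true outliers among the n highest-scoring samples.

    By default n is the number of true outliers in y_true.
    """
    if n is None:
        n = sum(y_true)
    y_pred = get_label_n_sketch(scores, n)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / n


y_true = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.9, 0.2, 0.3, 0.8]

print(get_label_n_sketch(scores, 2))   # [0, 0, 1, 0, 0, 1]
print(precision_at_n(y_true, scores))  # 0.5
```

Here the two highest scores belong to samples 2 and 5, but only sample 2 is a true outlier, so one of the two top-ranked samples is correct.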

API Cheatsheet & Reference

The following APIs apply to all detector models.

Key Attributes of a fitted model:


.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Getting Started


   install
   model_persistence
   fast_train
   example
   benchmark


.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Documentation

   api_cc
   pyod


.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Additional Information

   issues
   relevant_knowledge
   pubs
   faq
   about



References

.. bibliography::
   :cited:
   :labelprefix: A
   :keyprefix: a-