An efficient, scikit-learn compatible implementation of the "DENMUNE: Density peak based clustering using mutual nearest neighbors" algorithm.
Disclaimer: This project is a clean-room rewrite of the DenMune algorithm. It is not derived from the original authors' code and makes significant implementation choices to improve performance and scikit-learn compatibility. It is not a strictly faithful reproduction of the paper's original pseudocode but adheres to its core principles.
The original DenMune paper presents a powerful algorithm for finding clusters of arbitrary shapes and densities. However, existing third-party implementations suffer from critical issues, including a lack of scikit-learn compatibility, data leakage bugs, and severe performance bottlenecks.
This project was created to provide a version of DenMune that is:
- Scikit-Learn Native: Fully compatible with the scikit-learn ecosystem (
Pipeline,GridSearchCV, etc.). - Correct & Robust: Fixes fundamental algorithmic flaws like data leakage and provides robust error handling.
- Performant: Replaces slow Python loops with optimized, vectorized NumPy operations and a sparse graph representation.
- Finds Complex Clusters: Excels at identifying non-convex clusters of varying densities.
- Automatic Noise Detection: Automatically identifies and labels noise points.
- Single Parameter Tuning: Requires only one primary parameter,
k_nearest, for straightforward tuning. - Efficient Implementation: Uses a sparse
csr_matrixfor k-NN graph representation and a Union-Find data structure for fast cluster merging. - Flexible Dimensionality Reduction: Integrates with scikit-learn's
TSNEandPCA, and allows for any user-provided reducer (e.g., UMAP).
pip install denmune-sklfrom sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from denmune_skl import DenMune
import matplotlib.pyplot as plt
import numpy as np
# 1. Generate and prepare data
X, y = make_moons(n_samples=250, noise=0.07, random_state=42)
X = StandardScaler().fit_transform(X)
# 2. Fit the model
model = DenMune(k_nearest=20, random_state=42)
model.fit(X)
# 3. Visualize the results
labels = model.labels_
n_clusters = model.n_clusters_
print(f"Estimated number of clusters: {n_clusters}")
plt.scatter(X[:, 0], X[:, 1], c=labels, s=50, cmap='viridis')
plt.title(f"DenMune Clustering (k={model.k_nearest})\nEstimated clusters: {n_clusters}")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()This algorithm is based on the following paper:
@article{Abbas2021DenMune,
title = {{DENMUNE}: {Density} peak based clustering using mutual nearest neighbors},
author = {Abbas, Mohamed and El-Zoghabi, Adel and Shoukry, Amin},
journal = {Pattern Recognition},
volume = {112},
pages = {107718},
year = {2021},
doi = {10.1016/j.patcog.2020.107718}
}
This project is licensed under the BSD 3-Clause License.