Versions
river version: 0.23.0
Python version: 3.12.8
Operating system: Ubuntu 24.04.3 LTS
Describe the bug
The original DBSTREAM paper defines noisy micro-clusters as follows:

However, the current River implementation of DBSTREAM does not handle noisy micro-clusters sufficiently. If noisy micro-clusters have not been removed by cleanup, they are included as-is in the list of clusters, which is not in line with the behaviour described in the original paper.
I propose the following fix:
Add
if self._micro_clusters[index].weight < self.minimum_weight:
    continue
after line 332 in the DBSTREAM implementation. This ensures that noisy micro-clusters are not labelled as clusters.
Steps/code to reproduce
The following code outputs 3 clusters, even though their respective weights are clearly below the minimum_weight threshold, qualifying them as noisy micro-clusters.
from river import cluster
from river import stream
X = [
    [0, 0], [50, 50], [100, 100]
]
dbstream = cluster.DBSTREAM(
    clustering_threshold=1.0,
    fading_factor=0.001,
    cleanup_interval=10,
    intersection_factor=0.3,
    minimum_weight=5
)
for x, _ in stream.iter_array(X):
    dbstream.learn_one(x)
for _, c in dbstream.clusters.items():
    print(f"center: {c.center}, weight: {c.weight}")
Output:
center: {0: 0, 1: 0}, weight: 1
center: {0: 50, 1: 50}, weight: 1
center: {0: 100, 1: 100}, weight: 1
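Until a fix is in place, the same weight check can be applied on the caller's side. The following is only a minimal sketch: it assumes that each reported cluster exposes a `weight` attribute (as the output above suggests), and `MicroCluster` and `drop_noisy` are hypothetical stand-ins written for illustration, not part of River's API.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical stand-in for a DBSTREAM micro-cluster (not River's class)."""
    center: dict
    weight: float


def drop_noisy(clusters, minimum_weight):
    """Keep only micro-clusters whose weight reaches minimum_weight.

    Mirrors the proposed check: entries with weight < minimum_weight
    are treated as noisy and skipped.
    """
    return {
        index: mc
        for index, mc in clusters.items()
        if mc.weight >= minimum_weight
    }


# Centers and weights copied from the reproduction output above.
clusters = {
    0: MicroCluster({0: 0, 1: 0}, 1),
    1: MicroCluster({0: 50, 1: 50}, 1),
    2: MicroCluster({0: 100, 1: 100}, 1),
}

kept = drop_noisy(clusters, minimum_weight=5)
print(len(kept))
```

With the reproduction data, every entry has a weight below the threshold of 5, so the filtered mapping is empty, which is the behaviour this issue expects from `dbstream.clusters`.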