Description
Imbalanced data, where the classes occur at very different rates, also shows up in large datasets.
There are many strategies for dealing with imbalanced data. imbalanced-learn (http://contrib.scikit-learn.org/imbalanced-learn/stable/api.html) implements a number of them, some of which could be scaled to large datasets with dask.
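Random undersampling is one of the simpler strategies and shows why some of them scale well. It needs only two embarrassingly parallel passes: count the classes across all partitions (a map-reduce), then sample each partition on its own using per-class keep fractions. Below is a minimal sketch in plain Python. It assumes the data arrive as a list of partitions, which stand in for the chunks of a dask collection. The helper names (`count_labels`, `keep_fractions`, `undersample_partition`, `random_undersample`) are hypothetical. This is not imbalanced-learn's `RandomUnderSampler` and not a dask API. It keeps each row with a fixed probability, so the majority class ends up only approximately the size of the minority class.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical data layout: a "partition" is a list of (features, label) pairs.
# With dask, each partition would be one chunk of a dask DataFrame.
Record = Tuple[Tuple[float, ...], Hashable]
Partition = List[Record]


def count_labels(partitions: Sequence[Partition]) -> Counter:
    """Pass 1: count labels in each partition and sum the counts (map-reduce)."""
    total: Counter = Counter()
    for part in partitions:
        total.update(label for _, label in part)
    return total


def keep_fractions(counts: Counter) -> Dict[Hashable, float]:
    """Keep all rows of the rarest class and a matching fraction of every other class."""
    target = min(counts.values())
    return {label: target / n for label, n in counts.items()}


def undersample_partition(part: Partition, fractions: Dict[Hashable, float],
                          seed: int) -> Partition:
    """Pass 2: Bernoulli-sample one partition without looking at any other partition."""
    rng = random.Random(seed)
    # random() is in [0, 1), so a fraction of 1.0 always keeps the row.
    return [rec for rec in part if rng.random() < fractions[rec[1]]]


def random_undersample(partitions: Sequence[Partition], seed: int = 0) -> List[Partition]:
    """Approximately balance the classes across all partitions."""
    fractions = keep_fractions(count_labels(partitions))
    # A distinct, derived seed per partition keeps the result reproducible.
    return [undersample_partition(p, fractions, seed + i)
            for i, p in enumerate(partitions)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Four partitions of synthetic data with roughly 5% positives.
    data = [[((rng.random(),), 1 if rng.random() < 0.05 else 0) for _ in range(2500)]
            for _ in range(4)]
    print("before:", count_labels(data))
    print("after: ", count_labels(random_undersample(data, seed=1)))
```

With a dask DataFrame, the first pass would roughly correspond to `value_counts` on the label column and the second to a `map_partitions` call. Oversampling methods such as SMOTE are harder to distribute because they need nearest neighbours that may live in other partitions.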