Samplers / pipelines for imbalanced datasets

Large datasets can be imbalanced too, with some classes occurring far more often than others.

There are many strategies for dealing with imbalanced data. imbalanced-learn (http://contrib.scikit-learn.org/imbalanced-learn/stable/api.html) implements a set of them, and some of these could be scaled to large datasets with dask.
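Random undersampling shows why some of these strategies scale well. It splits into a cheap global reduction (per-class counts) followed by an embarrassingly parallel per-partition filter. The following is a minimal sketch under stated assumptions, not a proposed API. Lists of NumPy `(X, y)` chunks stand in for dask partitions, and the function names `class_counts`, `undersample_chunk` and `random_undersample` are hypothetical, not part of dask or imbalanced-learn. It is also approximate. Each row of class `c` is kept with probability `n_min / n_c`, so class sizes are balanced only in expectation. imbalanced-learn's `RandomUnderSampler` draws exact counts instead.

```python
import numpy as np

# Hypothetical helpers: lists of NumPy (X, y) chunks stand in for dask partitions.


def class_counts(label_chunks):
    """Pass 1: a global reduction that sums per-chunk class counts."""
    counts = {}
    for y in label_chunks:
        classes, n = np.unique(y, return_counts=True)
        for c, k in zip(classes.tolist(), n.tolist()):
            counts[c] = counts.get(c, 0) + k
    return counts


def undersample_chunk(X, y, keep_prob, rng):
    """Pass 2: an independent per-chunk filter.

    Each row of class c is kept with probability keep_prob[c], so every
    class ends up near the minority-class count in expectation.
    """
    p = np.array([keep_prob[c] for c in y.tolist()], dtype=float)
    mask = rng.random(len(y)) < p  # p == 1.0 keeps every row of that class
    return X[mask], y[mask]


def random_undersample(chunks, seed=0):
    """Approximate random undersampling over a list of (X, y) chunks."""
    counts = class_counts(y for _, y in chunks)
    n_min = min(counts.values())
    keep_prob = {c: n_min / n for c, n in counts.items()}
    # One statistically independent generator per chunk, derived from one seed.
    seeds = np.random.SeedSequence(seed).spawn(len(chunks))
    rngs = [np.random.default_rng(s) for s in seeds]
    return [
        undersample_chunk(X, y, keep_prob, rng)
        for (X, y), rng in zip(chunks, rngs)
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    y = np.array([0] * 9_000 + [1] * 1_000)
    X = rng.normal(size=(len(y), 3))
    chunks = [(X[i:i + 2_500], y[i:i + 2_500]) for i in range(0, len(y), 2_500)]
    out = random_undersample(chunks)
    y_out = np.concatenate([yc for _, yc in out])
    # The minority class (1) is kept in full; class 0 is cut to roughly 1,000 rows.
    print(np.bincount(y_out))
```

In dask, the two passes would map onto something like `ddf[label].value_counts()` for the counts and a `map_partitions` call for the filter. The main detail to get right is giving each partition an independent random stream, which the sketch handles with `SeedSequence.spawn`.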