
Samplers / pipelines for imbalanced datasets #317

Open
@TomAugspurger

Imbalanced datasets, where the classes have very different occurrence rates, can show up in large datasets.

There are many strategies for dealing with imbalanced data. imbalanced-learn (http://contrib.scikit-learn.org/imbalanced-learn/stable/api.html) implements a set of them, some of which could be scaled to large datasets with dask.
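To make the idea concrete, here is a minimal sketch of one such strategy, random under-sampling, written so that each chunk of data is processed independently. It is only an illustration in plain Python, not imbalanced-learn's or dask-ml's implementation. The helper names (`sampling_fractions`, `undersample_chunk`) and the synthetic data are hypothetical. The approach assumes two passes over the data: one to count the classes, one to sample each chunk.

```python
import random
from collections import Counter


def sampling_fractions(class_counts):
    """Fraction of each class to keep so every class shrinks to roughly the minority size."""
    minority = min(class_counts.values())
    return {label: minority / count for label, count in class_counts.items()}


def undersample_chunk(rows, labels, fractions, rng):
    """Keep each row with probability equal to its class's sampling fraction."""
    return [
        (row, label)
        for row, label in zip(rows, labels)
        if rng.random() < fractions[label]
    ]


# Hypothetical imbalanced data, split into four chunks (about 5% positives).
rng = random.Random(0)
chunks = []
for _ in range(4):
    labels = [1 if rng.random() < 0.05 else 0 for _ in range(5000)]
    rows = [[rng.random()] for _ in labels]
    chunks.append((rows, labels))

# Pass 1: global class counts, accumulated chunk by chunk.
counts = Counter()
for _, labels in chunks:
    counts.update(labels)

fractions = sampling_fractions(counts)

# Pass 2: sample each chunk independently using the global fractions.
resampled = []
for rows, labels in chunks:
    resampled.extend(undersample_chunk(rows, labels, fractions, rng))

balanced_counts = Counter(label for _, label in resampled)
```

Because the second pass touches each chunk on its own, it is the kind of operation that could plausibly be expressed with `dask.array.map_blocks` or `DataFrame.map_partitions`. The resulting class sizes are only approximately equal, since rows are kept at random rather than by exact count.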

Labels: Roadmap (Larger, self-contained pieces of work.)
