[Feature Request] Implement early stopping for Task Runner API #1616

@kminhta

Description

API: Task Runner

Early stopping is a valuable technique for preventing overfitting and improving training efficiency. If the aggregated model's validation metric stops improving for a user-specified number of rounds, training stops. The feature should be off by default. This guards against overfitting (the model keeps improving on the training data but no longer improves on the validation data) and saves computational resources by skipping unnecessary rounds.

Suggestions (flexible)

  • The aggregator should monitor the aggregated model's validation metric across training rounds and stop training if the metric does not improve for a specified number of rounds.
  • The feature can be enabled in plan.yaml under the aggregator settings.
  • It is best implemented as a callback that runs at the end of each round.
  • Existing ML frameworks could serve as a good reference, e.g. the EarlyStopping handler in PyTorch Ignite: https://docs.pytorch.org/ignite/generated/ignite.handlers.early_stopping.EarlyStopping.html
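
To make the suggestions concrete, here is a minimal sketch of what a round-end callback could look like. Everything in it is an assumption for discussion: the class name `EarlyStopping`, the `on_round_end` hook, and the `patience`, `min_delta`, and `mode` parameters are hypothetical and do not correspond to an existing API in this project. `run_rounds` only simulates an aggregator loop over a list of precomputed validation metrics.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class EarlyStopping:
    """Hypothetical round-end callback (illustrative only, not an existing class)."""

    patience: Optional[int] = None  # None keeps early stopping off, the proposed default
    min_delta: float = 0.0          # smallest change that counts as an improvement
    mode: str = "max"               # "max" for accuracy-like metrics, "min" for loss-like
    best: Optional[float] = field(default=None, init=False)
    stale_rounds: int = field(default=0, init=False)

    def on_round_end(self, round_num: int, metric: float) -> bool:
        """Record this round's aggregated validation metric; return True to stop."""
        if self.patience is None:
            return False
        if self.best is None:
            improved = True
        elif self.mode == "max":
            improved = metric > self.best + self.min_delta
        else:
            improved = metric < self.best - self.min_delta
        if improved:
            self.best = metric
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
        return self.stale_rounds >= self.patience


def run_rounds(metrics: Sequence[float], callback: EarlyStopping) -> int:
    """Simulated aggregator loop; returns how many rounds actually ran."""
    for round_num, metric in enumerate(metrics):
        if callback.on_round_end(round_num, metric):
            return round_num + 1
    return len(metrics)


validation_accuracy = [0.60, 0.70, 0.72, 0.71, 0.72, 0.70, 0.75]
rounds_with_stopping = run_rounds(validation_accuracy, EarlyStopping(patience=3))
rounds_default = run_rounds(validation_accuracy, EarlyStopping())
```

With `patience=3`, the simulated run ends after the sixth round because rounds four through six do not beat the best value of 0.72. With the default `patience=None`, all seven rounds run. In a real implementation the `patience` value would presumably be read from the aggregator settings in plan.yaml, but the key name and its location are still open questions.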

Metadata

Labels: enhancement (New feature or request)