[Feature Request] Implement early stopping for Task Runner API

API: Task Runner

Early stopping is a valuable technique for preventing overfitting and improving training efficiency. If the validation metric of the aggregated model does not improve for a user-specified number of rounds, training stops. The feature should be off by default. This keeps the model from overfitting (where it continues to improve on the training data but has stopped improving on the validation data) and saves computational resources by skipping rounds that no longer help.

Suggestions (flexible)
- The aggregator should monitor the aggregated model's validation metric across training rounds and stop training if the metric does not improve for a specified number of rounds.
- The feature can be enabled in plan.yaml under the aggregator settings.
- It is best implemented as a callback that runs at the end of each round.
- Existing ML frameworks could serve as a reference, for example PyTorch Ignite's EarlyStopping handler: https://docs.pytorch.org/ignite/generated/ignite.handlers.early_stopping.EarlyStopping.html
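
To make the suggestions above concrete, here is a rough sketch of what a round-end early-stopping check might look like. Everything in it is hypothetical: the `EarlyStopping` class, its `patience`, `min_delta`, and `mode` parameters, the `on_round_end` hook, and the `run_rounds` helper are illustrative names, not existing Task Runner or aggregator APIs. `patience=None` stands in for "off by default".

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class EarlyStopping:
    """Hypothetical round-end callback; not an existing Task Runner API."""

    patience: Optional[int] = None  # None keeps early stopping disabled
    min_delta: float = 0.0          # minimum change that counts as improvement
    mode: str = "max"               # "max" for accuracy-like, "min" for loss-like
    best: Optional[float] = field(default=None, init=False)
    rounds_without_improvement: int = field(default=0, init=False)

    def _improved(self, metric: float) -> bool:
        # The first observed metric always counts as an improvement.
        if self.best is None:
            return True
        if self.mode == "max":
            return metric > self.best + self.min_delta
        return metric < self.best - self.min_delta

    def on_round_end(self, metric: float) -> bool:
        """Record this round's aggregated validation metric.

        Returns True when training should stop.
        """
        if self.patience is None:
            return False
        if self._improved(metric):
            self.best = metric
            self.rounds_without_improvement = 0
            return False
        self.rounds_without_improvement += 1
        return self.rounds_without_improvement >= self.patience


def run_rounds(metrics: Iterable[float], stopper: EarlyStopping) -> int:
    """Stand-in for the aggregator loop: returns how many rounds were run."""
    rounds_run = 0
    for metric in metrics:
        rounds_run += 1
        if stopper.on_round_end(metric):
            break
    return rounds_run


if __name__ == "__main__":
    accuracies = [0.70, 0.75, 0.74, 0.75, 0.73, 0.80]
    print(run_rounds(accuracies, EarlyStopping(patience=3)))
    print(run_rounds(accuracies, EarlyStopping()))
```

With `patience=3`, the simulated run stops after three consecutive rounds without improvement over the best metric; with the default `patience=None`, every round runs. In a real implementation the values would presumably be read from the aggregator settings in plan.yaml rather than passed in code.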

[Feature Request] Implement early stopping for Task Runner API #1616
