Add `best_k_metrics` parameter to the `ModelCheckpoint` #20457

gonzachiar · 2024-11-27T22:50:13Z

What does this PR do?

Adds a parameter to save all the metrics from the best model.

Fixes #20321

Before submitting

Was this discussed/agreed via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--20457.org.readthedocs.build/en/20457/

lantiga

Thanks for the contribution! Happy to merge after we ensure full backward compatibility and test coverage.

lantiga · 2024-12-05T21:01:35Z

src/lightning/pytorch/callbacks/model_checkpoint.py

@@ -241,9 +249,10 @@ def __init__(
        self._last_global_step_saved = 0  # no need to save when no steps were taken
        self._last_time_checked: Optional[float] = None
        self.current_score: Optional[Tensor] = None
-        self.best_k_models: Dict[str, Tensor] = {}
+        self.best_k_models: Dict[str, Dict[str, Tensor | Dict[str, Tensor]]] = {}


this may be easier to read
Dict[str, Dict[str, Tensor]] | Dict[str, Dict[str, Dict[str, Tensor]]]
but ultimately we'd be better off defining a type alias

more importantly, we need to avoid breaking backward compatibility here
so whatever code relies on best_k_models being Dict[str, Tensor] today needs to keep working

I suggest we just limit ourselves to track best_model_metrics and not mess with best_k_models, or use a separate private attribute
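
A minimal sketch of the suggestion above, assuming a simplified stand-in class rather than the real `ModelCheckpoint`. The names `TopKTracker`, `MetricsDict`, `update`, and `_best_k_metrics` are hypothetical, and plain floats stand in for `Tensor` so the snippet runs without PyTorch. It shows `best_k_models` keeping its original path-to-score shape while the extra metrics live in a separate private attribute behind a type alias:

```python
from typing import Dict, Optional

# Stand-in for torch.Tensor so the sketch runs without PyTorch (assumption).
Score = float
# Type alias instead of a nested inline annotation.
MetricsDict = Dict[str, Score]


class TopKTracker:
    """Hypothetical, simplified stand-in for the checkpoint bookkeeping."""

    def __init__(self, mode: str = "min") -> None:
        self.mode = mode
        # Public attribute keeps its original shape: path -> monitored score.
        self.best_k_models: Dict[str, Score] = {}
        # New data goes into a separate private attribute: path -> all metrics.
        self._best_k_metrics: Dict[str, MetricsDict] = {}

    def update(self, path: str, score: Score, metrics: MetricsDict) -> None:
        """Record a checkpoint's monitored score and a copy of its metrics."""
        self.best_k_models[path] = score
        self._best_k_metrics[path] = dict(metrics)

    @property
    def best_model_path(self) -> str:
        """Path with the best monitored score, or "" if nothing is tracked."""
        if not self.best_k_models:
            return ""
        pick = min if self.mode == "min" else max
        return pick(self.best_k_models, key=lambda p: self.best_k_models[p])

    @property
    def best_model_metrics(self) -> Optional[MetricsDict]:
        """All metrics recorded for the best checkpoint, or None."""
        return self._best_k_metrics.get(self.best_model_path)
```

Code that reads `best_k_models` as a mapping from path to score keeps working in this layout, because only the new private attribute carries the nested metrics.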

lantiga · 2024-12-05T21:01:59Z

src/lightning/pytorch/callbacks/model_checkpoint.py

@@ -523,7 +534,9 @@ def check_monitor_top_k(self, trainer: "pl.Trainer", current: Optional[Tensor] =
            return True

        monitor_op = {"min": torch.lt, "max": torch.gt}[self.mode]
-        should_update_best_and_save = monitor_op(current, self.best_k_models[self.kth_best_model_path])
+        should_update_best_and_save = monitor_op(
+            current, cast(Tensor, self.best_k_models[self.kth_best_model_path]["score"])


this will stay as in the original if we avoid changing best_k_models
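
For illustration, a simplified sketch of the comparison when `best_k_models` keeps mapping paths directly to scores. The function name `should_save` and its flat argument list are hypothetical, and `operator.lt` / `operator.gt` on floats stand in for `torch.lt` / `torch.gt` on tensors so the snippet runs without PyTorch. It is not the actual method body:

```python
import operator
from typing import Dict, Optional


def should_save(
    best_k_models: Dict[str, float],
    kth_best_model_path: str,
    current: Optional[float],
    mode: str,
    save_top_k: int,
) -> bool:
    """Decide whether `current` beats the k-th best tracked score."""
    if current is None:
        return False
    # Fewer than k checkpoints tracked so far: always save.
    if len(best_k_models) < save_top_k:
        return True
    monitor_op = {"min": operator.lt, "max": operator.gt}[mode]
    # The stored value is the score itself, so no ["score"] lookup or cast.
    return monitor_op(current, best_k_models[kth_best_model_path])
```

With the flat mapping, the stored value is passed to the comparison directly, which is why the original one-line call can stay unchanged.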

lantiga · 2024-12-05T21:02:34Z

tests/tests_pytorch/checkpointing/test_model_checkpoint.py

@@ -706,6 +706,7 @@ def test_model_checkpoint_save_last_none_monitor(tmp_path, caplog):
    assert checkpoint_callback.best_model_path == str(tmp_path / "epoch=1-step=20.ckpt")
    assert checkpoint_callback.last_model_path == str(tmp_path / "last.ckpt")
    assert checkpoint_callback.best_model_score is None
+    assert checkpoint_callback.best_model_metrics is None


we need to add tests that exercise the new code

lantiga · 2024-12-10T22:40:46Z

@gonzachiar I'm wrapping up the last few PRs for the release, do you have time to push this through in the next couple of days?

gonzachiar · 2024-12-13T14:34:07Z

> @gonzachiar I'm wrapping up the last few PRs for the release, do you have time to push this through in the next couple of days?

Hey, I will work on this during the weekend. Hope that helps!

stale · 2025-04-16T06:23:27Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://lightning.ai/docs/pytorch/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Discord. Thank you for your contributions.

add best_k_metrics parameter

8df37c2

gonzachiar requested review from lantiga, Borda, tchaton, justusschock and ethanwharris as code owners November 27, 2024 22:50

github-actions bot added the pl Generic label for PyTorch Lightning package label Nov 27, 2024

mergify bot added the has conflicts label Nov 27, 2024

lantiga reviewed Dec 5, 2024

lantiga added the waiting on author Waiting on user action, correction, or update label Dec 5, 2024

Merge branch 'master' into feature/best-k-metrics

32db7d4

mergify bot removed the has conflicts label Dec 10, 2024

[pre-commit.ci] auto fixes from pre-commit.com hooks

0b3322d

for more information, see https://pre-commit.ci

stale bot added the won't fix This will not be worked on label Apr 16, 2025

Merge branch 'master' into feature/best-k-metrics

74cf6a7

stale bot removed the won't fix This will not be worked on label Apr 16, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

a214154

for more information, see https://pre-commit.ci

Borda changed the title ~~Add best_k_metrics parameter to the ModelCheckpoint~~ Add best_k_metrics parameter to the ModelCheckpoint Apr 16, 2025

Borda and others added 3 commits April 16, 2025 10:55

Merge branch 'master' into feature/best-k-metrics

835538c

fix: modify and sort imports

80c4bb6

fix: revert changes on best_k_models

821bfef
