Description
I run a lot of my ML workloads in short-lived containers on a dedicated ML cluster. The typical workflow looks like this:
1. Prepare the experiment locally and run a single, smaller epoch for testing.
2. `git push && dvc push` to the repository.
3. Start a container and run `git pull && dvc pull` in the container.
4. Run either `dvc repro` or `dvc exp run`.
5. `git push && dvc push` in the container.
More often than I would like, I forget step 5 or run only the `git push` part of it. As a result, I am left with a corrupted cache and can't access the experiment's results using `dvc metrics` and similar commands.
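For illustration, the manual workaround for steps 4 and 5 could be sketched as a small shell function. The name `repro_and_push` is hypothetical and not part of DVC. The sketch assumes that any changes to `dvc.lock` are committed separately, and it pushes the data before the Git history (the reverse of the order in step 5) so that the Git remote never references outputs that are missing from the DVC remote.

```shell
# Hypothetical wrapper, not part of DVC: reproduce the pipeline, then push
# the data and the Git history together so that neither half is forgotten.
repro_and_push() {
    # Stop if the pipeline fails, so nothing is pushed after a failed run.
    dvc repro "$@" || return 1
    # Push cached outputs first; abort if the data did not reach the remote.
    dvc push || return 1
    # Only now publish the commits that reference those outputs.
    git push
}
```

This still depends on everyone remembering to call the wrapper instead of `dvc repro`, which is the gap a repository-level setting would close.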
I am aware that there are Git hooks that I can set up using `dvc install`. However, because the containers are typically rather short-lived, I tend not to install them, and there is also no guarantee that collaborators will remember to install the hooks. I would therefore appreciate a repository-level setting in `.dvc/config`. I know that there is such a setting for experiments (`exp.auto_push`), but it doesn't seem to apply when I run `dvc repro`.
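For reference, the existing experiment-level setting can be written to `.dvc/config` from the command line roughly as follows. The remote name `origin` is an assumption, and the `exp.git_remote` option should be checked against the DVC version in use.

```shell
# Enable automatic pushing for experiments started with `dvc exp run`.
dvc config exp.auto_push true
# Assumption: the Git remote that should receive the experiments is `origin`.
dvc config exp.git_remote origin
```

Because these values end up in `.dvc/config`, they can be committed and shared with collaborators, which is the behaviour I would like to have for `dvc repro` as well.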
Also, in a perfect world, this feature would be configurable on a per-host basis, so that I can specify host name patterns on which autostage/auto_push are active (e.g. `ml-container-.*`).
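To make the per-host idea concrete, here is a minimal sketch of the kind of matching I have in mind. The helper `host_matches` is hypothetical and only illustrates the behaviour; it is not an existing DVC feature or option.

```shell
# Hypothetical helper: succeeds when a host name matches an extended regex.
# $1: pattern, for example 'ml-container-.*'
# $2: host name to test (defaults to the name of the current machine)
host_matches() {
    # Anchor the pattern so it has to match the whole host name.
    printf '%s\n' "${2:-$(hostname)}" | grep -Eq "^($1)$"
}
```

With such a check, something like `host_matches 'ml-container-.*' && dvc push && git push` would only push from the cluster containers and leave local machines untouched.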