auto push on a repository basis #10681

Open
@igordertigor

Description

I run a lot of my ML workloads in short-lived containers on a dedicated ML cluster. My typical workflow looks like this:

  1. Prepare experiment locally, run a single, smaller epoch for testing
  2. git push && dvc push to repository
  3. Start container, git pull && dvc pull in the container
  4. Run either dvc repro or dvc exp run.
  5. git push && dvc push in the container

More often than I would like, I forget step 5, or I run only the git push part of it. As a result, I end up with a corrupted cache and can't access the experiment's results using dvc metrics and similar commands.
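
For illustration, steps 4 and 5 could be chained in a small wrapper so that the push cannot be forgotten once the run succeeds. This is only a sketch of a workaround, not an existing DVC feature: the function name run_and_push and the runner parameter are hypothetical, and only the dvc repro, git push and dvc push commands are taken from the workflow above. Committing the results is left out, as in the steps above.

```python
import subprocess

# Commands from steps 4 and 5, in the order given in the workflow.
# The wrapper around them is a hypothetical helper, not part of DVC.
DEFAULT_STEPS = (
    ("dvc", "repro"),
    ("git", "push"),
    ("dvc", "push"),
)


def run_and_push(steps=DEFAULT_STEPS, runner=None):
    """Run each command in order and stop at the first failure.

    Returns the list of commands that completed successfully, so the
    caller can see whether both pushes actually happened.
    """
    if runner is None:
        # Default runner: execute the command and report its exit code.
        def runner(cmd):
            return subprocess.run(list(cmd), check=False).returncode

    completed = []
    for cmd in steps:
        if runner(cmd) != 0:
            break  # do not push anything after a failed step
        completed.append(tuple(cmd))
    return completed
```

Calling run_and_push() as the container's entry point would run the pipeline and then both pushes. If the returned list is shorter than three entries, one of the steps failed and the results were not fully pushed.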

I am aware that there are Git hooks that I can set up using dvc install. However, because the containers are typically short-lived, I tend not to install those either, and there is also no guarantee that collaborators will remember to install the hooks. I would therefore appreciate a repository-level setting in .dvc/config. I know that there is such a setting for experiments (exp.auto_push), but it doesn't seem to apply when I run dvc repro.

Also, in a perfect world, this feature would be configurable on a per-host basis, so that I can specify hostname patterns for which autostage/auto_push are active (for example, ml-container-.*).
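
To make the per-host idea concrete, here is a rough sketch of how such a check might behave. Everything in it is an assumption for illustration: the function name auto_push_enabled is hypothetical, and the choice to match each pattern against the whole hostname is mine, not something DVC defines today.

```python
import re
import socket


def auto_push_enabled(patterns, hostname=None):
    """Return True if the hostname fully matches any of the patterns.

    `patterns` is an iterable of regular expressions such as
    "ml-container-.*". If `hostname` is not given, the name of the
    current machine is used.
    """
    host = socket.gethostname() if hostname is None else hostname
    # fullmatch requires the whole hostname to match, so the pattern
    # "ml-container-.*" does not match "my-ml-container-1".
    return any(re.fullmatch(pattern, host) for pattern in patterns)
```

With the pattern from above, auto_push_enabled(["ml-container-.*"], "ml-container-17") is true, while a hostname such as "laptop" does not match, so auto push would stay off on a local machine.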

Metadata

Labels: A: data-sync (Related to dvc get/fetch/import/pull/push), feature (is a feature), triage (Needs to be triaged)