Description
After running and producing output for ~20s, the "blog" GH action appears to hang and then times out at 6 hours (the hard limit on the lifetime of a GH Actions job). It's not a showstopper yet, but it probably will be as soon as we need to publish another blog article.
The step in the workflow that hangs is:
www-main/.github/workflows/blog.yaml, lines 20 to 25 at de1cd31
For a historical perspective, changes to this action have been:
- (~2y ago) use a micromamba reusable workflow (instead of miniconda) to speed up blog build time
  - Blogbuildtime #805
  - these runs took between 10 and 25 minutes
- (~1/2 yr ago) use a different micromamba reusable workflow because the old one was deprecated
  - Migrate micromamba setup action #1024
  - this had runs on the order of ~2h
  - subsequent runs (within a day or two) happened much faster because of caching
- (~1.5mo ago) bump version of micromamba reusable workflow (prompted by dependabot)
  - chore(deps): bump mamba-org/setup-micromamba from 1 to 2 #1034
  - this has runs that time out at 6h
My hunch is that the culprit is some dependency cycle or incompatibility in www-main/dev/environment.yml, perhaps a dependency that pulls in CUDA/nvidia-smi libraries that may not exist on the runner. It's also possible that this is related to mamba-org/setup-micromamba#225, but the resolution there is not satisfying.
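While the root cause is being tracked down, one way to keep each failed run from burning the full 6 hours would be an explicit `timeout-minutes` on the job, plus more verbose logging on the micromamba step. This is a rough sketch only, not the actual contents of blog.yaml: the job name, the runner, the 180-minute limit, and the `environment-file` path are all assumptions on my part.

```yaml
# Sketch only: job name, runner, limit, and paths are assumptions, not the real blog.yaml.
jobs:
  build-blog:                 # hypothetical job name
    runs-on: ubuntu-latest    # assumed runner
    timeout-minutes: 180      # fail after 3h instead of the 6h default; leaves headroom
                              # over the ~2h uncached runs seen after #1024
    steps:
      - uses: actions/checkout@v4
      - uses: mamba-org/setup-micromamba@v2
        with:
          environment-file: dev/environment.yml   # assumed path inside the repo
          log-level: debug    # more verbose output, to help see where the step stalls
```

This would not fix the hang itself, but it should shorten the feedback loop and might make the last log lines before the timeout more informative.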
Also see:
- the list of these job histories
- a single instance of a job log that shows the last output before the timeout