Skip to content

Wait for docker build #6013

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 6, 2024
Merged

Wait for docker build #6013

merged 10 commits into from
Dec 6, 2024

Conversation

huydhn
Copy link
Contributor

@huydhn huydhn commented Dec 5, 2024

This is a short-term mitigation for pytorch/pytorch#141885 in which any changes touching .ci/docker would cause all the builds to fail until docker build workflow finishes building the images.

At the moment, we don't have a good way to tell the build workflow to wait for the new docker image, so my fix here attempts to inject a delay when the action is called by _linux_build. It will wait up to 90 minutes for the Docker build to finish

Testing

pytorch/pytorch#142177

@huydhn huydhn requested review from malfet, atalman and a team December 5, 2024 03:23
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 5, 2024
Copy link

vercel bot commented Dec 5, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Updated (UTC)
torchci ⬜️ Ignored (Inspect) Visit Preview Dec 5, 2024 10:35pm

@huydhn huydhn marked this pull request as draft December 5, 2024 04:22
@huydhn
Copy link
Contributor Author

huydhn commented Dec 5, 2024

Close in favor of pytorch/pytorch#142109

@huydhn huydhn closed this Dec 5, 2024
@huydhn huydhn reopened this Dec 5, 2024
@huydhn
Copy link
Contributor Author

huydhn commented Dec 5, 2024

After chatting with @malfet, let try this one instead because pytorch/pytorch#142109 (review) adds few more minutes to the workflow TTS

@huydhn huydhn marked this pull request as ready for review December 5, 2024 22:32
Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for the fix

@malfet malfet merged commit 81ebba3 into main Dec 6, 2024
4 checks passed
@malfet malfet deleted the wait-for-docker-build branch December 6, 2024 01:33
@chuanqi129
Copy link
Contributor

Hi @huydhn, I noticed that there are some failures in calculate-docker-image step in xpu ci test jobs, for example https://github.com/pytorch/pytorch/actions/runs/12198235184/job/34036392093?pr=140664#step:6:160. I suspect those failure related to this PR changes. Could you please help to double check it?
And another issue is that seems the build job spent more time than before, https://github.com/pytorch/pytorch/actions/runs/12198235184/job/34029552956?pr=140664#step:7:1. Is it expected?

@huydhn
Copy link
Contributor Author

huydhn commented Dec 7, 2024

@chuanqi129 Thank you for the fix in pytorch/pytorch#142298! It's the correct fix. The failure you see actually highlight a problem that was hidden before. Without adding the new Docker image into the docker build workflow, the image will be rebuilt in every build and tests jobs that depend on it, which is a huge waste of time.

Let me take an action item to write a linter check for this to make sure that adding a new Docker images requires a corresponding update to the docker build workflow.

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Dec 8, 2024
Add missed new xpu docker image name to adapt the new mechanism introduced by pytorch/test-infra#6013
Works for #114850
Pull Request resolved: #142298
Approved by: https://github.com/huydhn
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
…ch#142298)

Add missed new xpu docker image name to adapt the new mechanism introduced by pytorch/test-infra#6013
Works for pytorch#114850
Pull Request resolved: pytorch#142298
Approved by: https://github.com/huydhn
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Dec 10, 2024
Some lint jobs are using the default 30 minutes timeout, but the jobs could wait up to 90 minutes now for the Docker image to become available after pytorch/test-infra#6013
Pull Request resolved: #142444
Approved by: https://github.com/wdvr
mori360 pushed a commit to mori360/pytorch that referenced this pull request Dec 11, 2024
Some lint jobs are using the default 30 minutes timeout, but the jobs could wait up to 90 minutes now for the Docker image to become available after pytorch/test-infra#6013
Pull Request resolved: pytorch#142444
Approved by: https://github.com/wdvr
bluenote10 pushed a commit to bluenote10/pytorch that referenced this pull request Dec 14, 2024
Some lint jobs are using the default 30 minutes timeout, but the jobs could wait up to 90 minutes now for the Docker image to become available after pytorch/test-infra#6013
Pull Request resolved: pytorch#142444
Approved by: https://github.com/wdvr
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants