Skip to content

Conversation

@ZainRizvi
Copy link
Contributor

@ZainRizvi ZainRizvi commented Jun 25, 2025

We originally tried to simplify the number of parameters users had to pass in to linux_job_v2 by using the existence of the docker build script as a heuristic.

That worked for a while, but that heuristic is now starting to break. If a repo has an unrelated docker build script located in .ci/docker/build.sh, we assume it wants to pull from our custom ECR path instead of docker hub.

Our ideal end state would now look like:

  1. Have an explicit parameter that needs to be set if a repo wants to use our ECR registries instead of assuming the desire based on the existence of a specific file. We default to docker hub
  2. Migration to this end state would include fix all existing domain repos that have that docker build file defined to pass in that parameter.

That migration requires many repos to be edited with a fix however, so for now we're having the following non-breaking change:

  1. Have an explicit parameter on linux_job_v2 that needs to be set if a repo wants to use our ECR registries. Set it to TRUE by default, with the calculate-docker-image logic being "If param is set to True AND special docker file exists" then we use the ECR registry. If either is false, use docker hub

  2. No migrations needed in the short term. We can later migrate domain repos to explicitly set this setting and then change the default to FALSE, but that's P2

Testing: Ran this job against it, and it only failed due to an error in the script inputted to the workflow, which is ignorable

@vercel
Copy link

vercel bot commented Jun 25, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Updated (UTC)
torchci ⬜️ Ignored (Inspect) Visit Preview Jun 25, 2025 7:01pm

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 25, 2025
# gracefully return the docker image name as it is. Pulling docker image in Linux
# job could then download the pre-built image as usual
if [[ ! -d "${DOCKER_BUILD_DIR}" ]] || [[ ! -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]]; then
if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: Inverted the if statement for better readability

@ZainRizvi ZainRizvi requested review from a team and seemethere June 25, 2025 18:03
echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"
echo "There is no Docker build script in ${REPO_NAME} repo, skipping..."
echo "There is no Docker build script in ${REPO_NAME} repo, or not using custom registry, skipping..."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was tripping me up. I tried to make it more clear and concise.

Suggested change
echo "There is no Docker build script in ${REPO_NAME} repo, or not using custom registry, skipping..."
echo "No Docker build script in ${REPO_NAME} repo or custom registry not used, skipping..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I changed the wording even more.

default: ""
type: string
use-custom-docker-registry:
description: "Use the custom ECR registry hosted on AWS. Only takes effect if there's a build.sh script in the docker-build-dir."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: "Use the custom ECR registry hosted on AWS. Only takes effect if there's a build.sh script in the docker-build-dir."
description: "Use the custom ECR registry. Applies only if build.sh exists in docker-build-dir."

- name: Calculate docker image
id: calculate-docker-image
uses: ./test-infra/.github/actions/calculate-docker-image
uses: pytorch/test-infra/.github/actions/calculate-docker-image@zainr/nova-gpu
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: revert this before merging

- name: Calculate docker image
id: calculate-docker-image
uses: ./test-infra/.github/actions/calculate-docker-image
uses: pytorch/test-infra/.github/actions/calculate-docker-image@zainr/nova-gpu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this mostly for testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I left a "todo: revert this" comment on it just a couple mins before you posted this :)

Removed now

@ZainRizvi ZainRizvi marked this pull request as ready for review June 25, 2025 19:02
@ZainRizvi ZainRizvi merged commit 9f1e08d into main Jun 25, 2025
21 checks passed
@ZainRizvi ZainRizvi deleted the zainr/nova-gpu branch June 25, 2025 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants