Skip to content

Conversation

@huydhn
Copy link
Contributor

@huydhn huydhn commented Apr 24, 2024

Fixes pytorch/executorch#1759

Before this, a job can only be seen as unstable when it has the unstable keyword in its name. The trick of renaming the job can be done with PyTorch test jobs by manipulating the test matrix. However, that doesn't work for PyTorch build job or Nova workflow without moving the filter script to a separate job to run before them. That approach doesn't scale.

So, this PR implements a different approach to query the list of open unstable issues from Rockset from HUD, then group all jobs with an open unstable issues to the unstable group.

The PR is big because the following top level pages need to query the list and pass it around to different React components. The main update is in the function isUnstableJob which calls hasOpenUnstableIssue. The important call paths are as follows:

  • HUD main page → getGroupingDataclassifyGrouphasOpenUnstableIssue
  • HUD commit/PR pages → CommitStatusFilteredJobListFailedJobInfoJobSummaryisUnstableJob
  • HUD commit/PR pages → CommitStatusWorkflowsContainer -> WorkflowBox -> WorkflowJobSummary -> JobSummaryisUnstableJob

There are some misc pages like HUD flaky test pages that also needs to be updated, but they don't need to have the list of unstable issues, so I will just pass an empty array there.

The second part of this change is on Dr.CI.

Testing

trunk / test-coreml-delegate / macos-job from ET is marked as unstable by pytorch/executorch#3264, and it seems to show up correctly https://torchci-git-support-unstable-job-other-repos-fbopensource.vercel.app/hud/pytorch/executorch/main.

An UX caveat is that all the invocations of the job will appear in the unstable group because there is no unstable keyword to distinguish them.

@vercel
Copy link

vercel bot commented Apr 24, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
torchci ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 25, 2024 1:51am

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 24, 2024
@huydhn huydhn requested a review from clee2000 April 24, 2024 18:25
@huydhn huydhn marked this pull request as ready for review April 24, 2024 18:25
@huydhn huydhn requested a review from ZainRizvi April 24, 2024 19:01
Copy link
Contributor

@clee2000 clee2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preview for pytorch/pytorch hud page has a bunch of builds in the unstable category but I can't find unstable issues for them

@huydhn
Copy link
Contributor Author

huydhn commented Apr 24, 2024

The preview for pytorch/pytorch hud page has a bunch of builds in the unstable category but I can't find unstable issues for them

Oh, @ZainRizvi explicitly added them into https://github.com/pytorch/pytorch/blob/main/.github/workflows/unstable.yml as part of ARC rollout in pytorch/pytorch#124753. So, they should be there. That's the old way to mark a job as unstable, it's working in the context of ARC rollout.

@clee2000
Copy link
Contributor

clee2000 commented Apr 24, 2024

I can't remember if this is a thing, but do unstable jobs get grouped together on PR/commit pages?

https://torchci-git-support-unstable-job-other-repos-fbopensource.vercel.app/pytorch/executorch/commit/66a350b1bbf7a199b33778be17fef6847ffc37c0

@huydhn
Copy link
Contributor Author

huydhn commented Apr 24, 2024

I can't remember if this is a thing, but do unstable jobs get grouped together on PR/commit pages?

Good point! They should be listed under their own section, i.e. https://torchci-git-support-unstable-job-other-repos-fbopensource.vercel.app/pytorch/pytorch/commit/977105466fac8caec5f3b8ae9eaf3b977f59fbfb. It looks like I miss a thing or two here. At least, there is no regression of existing functionality from what I see

@huydhn
Copy link
Contributor Author

huydhn commented Apr 25, 2024

https://torchci-git-support-unstable-job-other-repos-fbopensource.vercel.app/pytorch/executorch/commit/66a350b1bbf7a199b33778be17fef6847ffc37c0 is fixed now and correctly shows the failed unstable jobs section. The miss is that I forgot to pass the list of unstable issues to isUnstableJob in several places. To ensure backward compatibility and no funny issue with HUD, I make this param optional and default to an empty list.

@huydhn huydhn merged commit 2887fdb into main Apr 25, 2024
huydhn added a commit that referenced this pull request Apr 26, 2024
This is the next part of #5122
where Dr.CI also uses the list of unstable issues to check if a job is
an unstable job. The same function `isUnstableJob` that powers HUD from
#5122 is used.

### Testing

Mark a job on ET as unstable
pytorch/executorch#3344 as shown on HUD
https://hud.pytorch.org/hud/pytorch/executorch/main and Dr.CI applies
the same logic on a PR pytorch/executorch#3318
to show the job as unstable.

```
curl --request POST \
--url "http://localhost:3000/api/drci/drci?prNumber=3318" \
--header "Authorization: TOKEN" \
--data 'repo=executorch'
```

<!-- drci-comment-start -->

## 🔗 Helpful Links
### 🧪 See artifacts and rendered test results at
[hud.pytorch.org/pr/pytorch/executorch/3318](https://hud.pytorch.org/pr/pytorch/executorch/3318)
* 📄 Preview [Python docs built from this
PR](https://docs-preview.pytorch.org/pytorch/executorch/3318/index.html)

Note: Links to docs will display an error until the docs builds have
been completed.


## ✅ You can merge normally! (1 Unrelated Failure)
As of commit f712e381c161901b733baa6b1fe7d85dc25404d3 with merge base
b669056c1cff5f7fe3786df9e68a14447cd5410b (<sub><sub><img alt="image"
width=70
src="https://img.shields.io/date/1713981882?label=&color=FFFFFF&style=flat-square"></sub></sub>):
<details ><summary><b>UNSTABLE</b> - The following job failed but was
likely due to flakiness present on trunk and has been marked as
unstable:</summary><p>

* [Android / test-llama-app / mobile-job
(android)](https://hud.pytorch.org/pr/pytorch/executorch/3318#24225451316)
([gh](https://github.com/pytorch/executorch/actions/runs/8821313480/job/24225451316))
`Credentials could not be loaded, please check your action inputs: Could
not load credentials from any providers`
</p></details>


This comment was automatically generated by Dr. CI and updates every 15
minutes.
<!-- drci-comment-end -->
huydhn added a commit that referenced this pull request Jun 15, 2024
…R pages (#5343)

This is a miss from #5122
where I failed to check that the repo field in JobData is optional and
wasn't set in HUD commit and PR pages. So, marking an unstable job link
works correctly in main HUD page like
https://hud.pytorch.org/hud/pytorch/executorch/main where the data comes
from `fetchHud`, but it uses the default link pointing to
`pytorch/pytorch` in:

* Commit page
https://hud.pytorch.org/pytorch/executorch/commit/7c0f4c2b6f87369ee20544aada33dc55078c6bc6
* and PR page https://hud.pytorch.org/pr/pytorch/executorch/3972

The fix here is to populate the repo field if it's not there

### Testing

Verify this locally (need to login to HUD)

*
http://localhost:3000/pytorch/executorch/commit/7c0f4c2b6f87369ee20544aada33dc55078c6bc6
* http://localhost:3000/pr/pytorch/executorch/3972
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support to ExecuTorch CI to mark a job as unstable and ignore it from CI

4 participants