
UNSTABLE pull / linux-jammy-py3-clang12-executorch / test (executorch) #144480


Closed
huydhn opened this issue Jan 9, 2025 · 13 comments
Assignees
Labels
module: ci (Related to continuous integration) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) · unstable

Comments


huydhn commented Jan 9, 2025

The test started failing flakily, possibly after #143787 landed, and needs to be updated.

cc @seemethere @malfet @pytorch/pytorch-dev-infra @mergennachin

@pytorch-bot pytorch-bot bot added module: ci Related to continuous integration unstable labels Jan 9, 2025

pytorch-bot bot commented Jan 9, 2025

Hello there! From the UNSTABLE prefix in this issue title, it looks like you are attempting to mark a job as unstable in PyTorch CI. The information I have parsed is below:
  • Job name: pull / linux-jammy-py3-clang12-executorch / test (executorch)
  • Credential: huydhn

Within ~15 minutes, pull / linux-jammy-py3-clang12-executorch / test (executorch) and all of its dependents will be marked as unstable in PyTorch CI. Please verify that the job name looks correct. With great power comes great responsibility.


huydhn commented Jan 9, 2025

#144466 (comment)

@malfet malfet added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jan 9, 2025

malfet commented Jan 14, 2025

If the job has been unstable for a while, shouldn't it be disabled to save resources? (Users aren't getting any healthy signal from it anyway.)


huydhn commented Jan 14, 2025

Let me try to bump the pinned commit, and escalate this to the ET team if that doesn't fix the issue. IIRC the failure also happened on their CI, but it looks fixed there now.
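For readers unfamiliar with the workflow, here is a hedged sketch of what a "pin bump" does mechanically. The `ci_commit_pins/` directory name mirrors PyTorch's usual layout, but treat the exact path and the commit hashes below as illustrative placeholders, not values confirmed in this thread:

```shell
# Sketch only: bump a pinned-commit file that CI reads to decide which
# ExecuTorch revision to build against. Paths and hashes are placeholders.
set -eu

workdir=$(mktemp -d)
mkdir -p "$workdir/ci_commit_pins"

# The currently pinned ExecuTorch commit (placeholder hash)
echo "0000000000000000000000000000000000000000" > "$workdir/ci_commit_pins/executorch.txt"

# In a real bump this hash would be resolved from the remote, e.g.:
#   git ls-remote https://github.com/pytorch/executorch.git refs/heads/viable/strict
new_pin="bc86b6c0000000000000000000000000000000aa"

# Overwrite the pin file; CI rebuilds against the new commit on the next run
echo "$new_pin" > "$workdir/ci_commit_pins/executorch.txt"
cat "$workdir/ci_commit_pins/executorch.txt"
```

The bump itself is just a one-line file change; the actual PR that lands it then has to pass the very job being fixed.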


huydhn commented Jan 14, 2025

It looks fixed now after #140769 landed. I'm going to close this soon.


huydhn commented Jan 15, 2025

A different issue with the torchtune pin is showing up. It's fixed by pytorch/executorch#7670, so I'm trying to bring that commit into PyTorch.

@huydhn huydhn reopened this Jan 15, 2025

huydhn commented Jan 17, 2025

Fixed by #144813

@huydhn huydhn closed this as completed Jan 17, 2025
@malfet malfet reopened this Mar 17, 2025

malfet commented Mar 17, 2025

pytorchmergebot pushed a commit that referenced this issue Mar 19, 2025

Update ExecuTorch pin update (#149539)

Latest commit in https://hud.pytorch.org/hud/pytorch/executorch/viable%2Fstrict/1?per_page=50

Follow-up to #144480 (comment)

Also, need to incorporate change from pytorch/executorch#8817

Test Plan: Monitor linux-jammy-py3-clang12-executorch test

Pull Request resolved: #149539
Approved by: https://github.com/larryliu0820
@mergennachin
Contributor

Fixed by #149539

mergennachin added a commit that referenced this issue Mar 20, 2025 (same pin-update message as the Mar 19 commit; cherry picked from commit bc86b6c)
svekars pushed a commit that referenced this issue Mar 21, 2025 (same pin-update message)
@malfet malfet reopened this Mar 24, 2025
ZainRizvi pushed a commit that referenced this issue Mar 25, 2025: Update ExecuTorch pin update (#149539) (cherry picked from commit bc86b6c)

mergennachin commented Apr 1, 2025

Yeah, still flaky, but not as bad as before... and the root cause is different now. Before, tests were flaky due to an XNNPACK issue (fixed with the latest pin update in #150308), but now the flakiness looks due to the test_llama3_2_text_decoder_aoti and test_tiled_token_positional_embedding_et tests.

Based on the error message, they're running out of memory(?)

In the ExecuTorch repo, we run pretty much the same unit test suite on a linux.2xlarge runner, which does not run out of memory, but on pytorch/pytorch it looks like we're running on ephemeral.linux.2xlarge.

Are there differences between these two runners?
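As a debugging aid for the OOM suspicion above (this is a sketch, not something anyone in the thread ran): one way to confirm memory pressure is to log the process's peak resident set size around the suspect tests. The test names come from the comment above; the wrapper itself, including the `log_peak_rss` helper name, is hypothetical:

```python
import resource
import sys

def log_peak_rss(label: str) -> int:
    """Print and return this process's peak RSS so far.

    On Linux, ru_maxrss is reported in kilobytes (on macOS it is bytes).
    """
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"{label}: peak RSS ~{peak_kb / 1024:.1f} MiB", file=sys.stderr)
    return peak_kb

# Example: allocate ~50 MiB (zero-filled, so the pages are committed),
# then report the peak. In CI, you would call log_peak_rss() before and
# after each suspect test to see which one spikes.
buf = bytearray(50 * 1024 * 1024)
peak_kb = log_peak_rss("after allocation")
```

If the peak approaches the runner's total memory right before a flaky failure, that strongly suggests the OOM theory.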

@ZainRizvi
Contributor

There should be no difference that affects the job. The ephemeral variant gives you stronger guarantees that your job runs in a fresh environment (the disk is swapped out after each job). Other than that, it's the exact same hardware as the non-ephemeral instance.

amathewc pushed a commit to amathewc/pytorch that referenced this issue Apr 17, 2025 (same pin-update message)

huydhn commented Apr 29, 2025

This is not unstable anymore.

@huydhn huydhn closed this as completed Apr 29, 2025