UNSTABLE pull / linux-jammy-py3-clang12-executorch / test (executorch) #144480
Comments
Hello there! From the UNSTABLE prefix in this issue title, it looks like you are attempting to unstable a job in PyTorch CI. The information I have parsed is below:
Within ~15 minutes, …
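For context, here is a minimal, purely illustrative sketch (not the actual pytorch-bot code, which is not shown in this thread) of how a title in the `UNSTABLE <workflow> / <job> / <step> (<config>)` format could be parsed; the function name and regex are assumptions for illustration only:

```python
import re

# Illustrative only: the real pytorch-bot parsing logic may differ.
# This just shows how a title of the form
# "UNSTABLE <workflow> / <job> / <step> (<config>)" could be split into parts.
def parse_unstable_title(title: str):
    match = re.match(
        r"UNSTABLE\s+(?P<workflow>.+?)\s+/\s+(?P<job>.+?)\s+/\s+(?P<step>\S+)\s+\((?P<config>[^)]+)\)",
        title,
    )
    return match.groupdict() if match else None

print(parse_unstable_title(
    "UNSTABLE pull / linux-jammy-py3-clang12-executorch / test (executorch)"
))
# -> {'workflow': 'pull', 'job': 'linux-jammy-py3-clang12-executorch',
#     'step': 'test', 'config': 'executorch'}
```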
If a job is unstable for a while, shouldn't it be disabled to save resources? (Users are not getting any healthy signal from it anyway.)
Let me try to bump the pinned commit, and I'll escalate to the ET team if that doesn't fix the issue. IIRC, the failure also happened on their CI, but it looks fixed there now.
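For readers unfamiliar with the pin mechanism, below is a rough sketch of what "bumping the pinned commit" amounts to. The file path and the commit hash are assumptions for illustration, not taken from this thread; check the repo for the actual pin location.

```python
from pathlib import Path

# Sketch only: the path below is an assumption about where the ExecuTorch
# commit pin lives in pytorch/pytorch; verify it in the repo before relying
# on it. The example commit hash is hypothetical.
PIN_FILE = Path(".ci/docker/ci_commit_pins/executorch.txt")  # assumed location

def bump_executorch_pin(new_commit: str) -> None:
    """Replace the pinned ExecuTorch commit with a newer viable/strict commit."""
    old_commit = PIN_FILE.read_text().strip()
    PIN_FILE.write_text(new_commit + "\n")
    print(f"ExecuTorch pin: {old_commit[:12]} -> {new_commit[:12]}")

if __name__ == "__main__":
    bump_executorch_pin("0123456789abcdef0123456789abcdef01234567")  # hypothetical hash
```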
It looks fixed now after #140769 landed. I'm going to close this soon.
A different issue with the torchtune pin is showing up. It's fixed by pytorch/executorch#7670, so I'm trying to bring that commit to PyTorch.
Fixed by #144813
It's unstable again due to a missing dependency; see https://hud.pytorch.org/hud/pytorch/pytorch/1157367c786c2bdbd25fbe73ddc35d265f924bb0/1?per_page=50&name_filter=executorch&mergeLF=true
Latest commit in https://hud.pytorch.org/hud/pytorch/executorch/viable%2Fstrict/1?per_page=50. Follow-up to #144480 (comment). Also needs to incorporate the change from pytorch/executorch#8817.
Test Plan: monitor the linux-jammy-py3-clang12-executorch test.
Pull Request resolved: #149539. Approved by: https://github.com/larryliu0820
Fixed by #149539
Yeah, still flaky, but not as bad as before, and the root cause is different now. Before, tests were flaky due to an XNNPACK issue (which was fixed with the latest pin update in #150308), but now the flakiness seems to come from the test_llama3_2_text_decoder_aoti and test_tiled_token_positional_embedding_et tests. Based on the error message, they're running out of memory(?)
In the ExecuTorch repo, we run pretty much the same unit test suite on a linux.2xlarge runner and it does not run out of memory, but on pytorch/pytorch it looks like we're running on ephemeral.linux.2xlarge. Are there differences between these two runners?
There should be no difference that affects the job. The ephemeral variant gives you stronger guarantees that your job runs in a fresh environment (it swaps out the disk after each job runs). Other than that, you have exactly the same hardware as the non-ephemeral instance.
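As a side note, one possible way to guard OOM-prone tests (a sketch only, not the fix that was actually applied) is to skip them when the runner reports too little free memory instead of letting them fail intermittently. The psutil dependency, the threshold, and the test class below are assumptions for illustration:

```python
import unittest

import psutil  # assumption: psutil is available in the CI environment

# Sketch of one possible mitigation for the OOM-looking flakiness discussed
# above. The threshold and the test class are placeholders, not the actual
# ExecuTorch test code.
MIN_FREE_BYTES = 16 * 1024**3  # placeholder threshold, not a measured requirement

def requires_free_memory(min_bytes: int = MIN_FREE_BYTES):
    available = psutil.virtual_memory().available
    return unittest.skipIf(
        available < min_bytes,
        f"needs ~{min_bytes / 1024**3:.0f} GiB free, runner reports "
        f"{available / 1024**3:.1f} GiB",
    )

class TestTextDecoderMemoryGuard(unittest.TestCase):
    @requires_free_memory()
    def test_llama3_2_text_decoder_aoti(self):
        ...  # the real test body lives in the ExecuTorch test suite
```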
From latest viable/strict: https://hud.pytorch.org/hud/pytorch/executorch/viable%2Fstrict/1?per_page=50. Fixes pytorch#144480. This commit has important CI stability fixes, such as pytorch/executorch#9561 and pytorch/executorch#9634.
Pull Request resolved: pytorch#150308. Approved by: https://github.com/jathu, https://github.com/malfet
This is not unstable anymore
The test started failing flakily, possibly after #143787 landed, and needs to be updated.
cc @seemethere @malfet @pytorch/pytorch-dev-infra @mergennachin