Use unittest XML files to parse PyTorch test results #3633
Conversation
437cb39 to 6f85df0
@Flamefire Do you expect to have more to add to this one?

Likely not. I'm still checking a couple of GPU builds, but the CPU builds have been verified, so I don't expect anything new.

Ok, I'll review and merge later today then.
self.has_xml_test_reports = False
if pytorch_version >= '1.10.0':
    out, ec = run_cmd(self.python_cmd + " -c 'import xmlrunner'", log_ok=False)
Hmm, why not just check that unittest-xml-reporting is a builddep?
Because I wanted to anticipate that we might have it in Python or Python-bundle-PyPI
Ok, makes sense, but that's likely only ever going to be a builddep and thus doesn't belong in either Python or Python-bundle-* in my opinion.
Anyway, if needed it can be changed later :-)
I can imagine others need that too, even users; we provide e.g. pytest too. And the one additional call doesn't hurt in the testing of PyTorch ;-)
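For illustration, the reviewer's alternative might look roughly like the sketch below. The call `self.cfg.builddependencies()` and the dependency dict structure are assumptions based on common EasyBuild framework conventions, not code from this PR:

    # Hypothetical alternative to the import check: look for
    # unittest-xml-reporting among the declared build dependencies.
    # Assumes self.cfg.builddependencies() yields dicts with a 'name' key.
    build_dep_names = set(dep['name'] for dep in self.cfg.builddependencies())
    self.has_xml_test_reports = 'unittest-xml-reporting' in build_dep_names

As the discussion notes, this would miss the package when it is provided via Python or Python-bundle-PyPI, which is why the PR probes the actual import instead.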
easybuild/easyblocks/p/pytorch.py
Outdated

if new_suites:
    diffs.append(f'Found {len(new_suites)} new suites in XML files: {", ".join(sorted(new_suites))}')
if missing_suites:
    diffs.append(f'Did not found {len(missing_suites)} suites in XML files: ' +
Do you mean "Did not find" or "Did find"?
Yes, just a typo: "not find". Fixed.
easybuild/easyblocks/p/pytorch.py
Outdated

if new_tests:
    diffs.append(f'Found {len(new_tests)} new tests with errors in XML files: {", ".join(sorted(new_tests))}')
if missing_tests:
    diffs.append(f'Did not found {len(missing_tests)} tests with errors in XML files: ' +
Same here: "Did not find" or something.
easybuild/easyblocks/p/pytorch.py
Outdated

if new_tests:
    diffs.append(f'Found {len(new_tests)} new failed tests in XML files: {", ".join(sorted(new_tests))}')
if missing_tests:
    diffs.append(f'Did not found {len(missing_tests)} failed tests in XML files: ' +
And here
akesandgren left a comment:
LGTM
Going in, thanks @Flamefire!
(created using `eb --new-pr`)

Requires `apply_regex_substitutions` to support use of multi-line patterns, requiring matching all patterns in each file, and use of pre-compiled regular expressions: easybuild-framework#4758

Some explanations:
- `run_test.py` has an option that is supposed to enable the XML output, but it isn't passed to subprocess and hence not reliable
- Generating the XML files requires the `xmlrunner` Python package, directly or transitively. We have `unittest-xml-reporting` ECs for that
- The output of `run_tests.py` contains a list of failed test suites. We match against that as before to detect when we missed something.
- Test suites are started via `run_test.py`, but some of the test files are missing the code required to start the test and hence show up in that list but produce no output at all
- We keep matching the output of `run_test` as found in the log to test the old parser. This exists already
- The new parser reads the `test-results` folder containing the XML reports

I prepared PRs for new and old PyTorch ECs to include the dependency required for the XML reporting. Those can be used to test this PR:
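As an aside, not code from this PR: unittest-xml-reporting writes JUnit-style XML files, and collecting failed tests from such a `test-results` folder can be sketched with the standard library alone. Tag names follow the common JUnit schema; treat this as an illustration of the approach, not the easyblock's actual parser:

    import os
    import xml.etree.ElementTree as ET

    def collect_failed_tests(results_dir):
        """Return sorted IDs of test cases that recorded a failure or error."""
        failed = set()
        for root_dir, _, files in os.walk(results_dir):
            for fname in files:
                if not fname.endswith('.xml'):
                    continue
                tree = ET.parse(os.path.join(root_dir, fname))
                for case in tree.iter('testcase'):
                    # JUnit schema: a <testcase> with a <failure> or <error>
                    # child element did not pass.
                    if case.find('failure') is not None or case.find('error') is not None:
                        failed.add('%s.%s' % (case.get('classname'), case.get('name')))
        return sorted(failed)

The real easyblock additionally has to handle PyTorch specifics such as suite prefixes and reruns, discussed below.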
All look fine. A couple show a warning (as designed):
I traced that down to tests being rerun in 2 dimensions: with different backends (e.g. NCCL and Gloo), which I accounted for by taking the `dist-nccl`/`dist-gloo` prefix into account to count the tests as different. For the 2nd dimension (init method: `file`, `env` or none, i.e. serial) there is no such "tag", and my code merges them, which makes them count as a single test. I created a PR which is merged for the next PyTorch version; adding the patch to existing ECs is IMO not worth it. This is an issue if the test fails with one backend but succeeds in another, so it looks like a successful rerun and we miss a failure. I didn't observe that though, and I'd assume it fails in all backends or none.

FTR in 2.1.2 I collected:
So for 2 backends, 2 duplicate init methods get merged, which is 2*6=12 "missed" tests. Those appear as `"=== 1 passed in 4.45s ===="` in the log, so the stdout-parsing logic collects them. --> All fine

For one such test the XML files look like this, which shows that there is no way to reasonably deduplicate them:
They could be added to the "rerun" statistic though.
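To make the merging concrete, here is a hedged sketch of the keying described above; the path layout and helper name are illustrative, not the easyblock's exact code. Backend reruns stay distinct via the prefix, while init-method reruns carry no such tag and collapse to one key:

    # Illustrative only: key a test case by its backend prefix plus test ID.
    def test_key(report_path, classname, name):
        # First path component below test-results/ is the backend prefix,
        # e.g. 'dist-nccl' or 'dist-gloo'; it keeps backend reruns distinct.
        prefix = report_path.split('/')[0]
        return (prefix, classname, name)

    # Different backends -> different keys (counted as separate tests):
    assert test_key('dist-nccl/a.xml', 'TestX', 't') != test_key('dist-gloo/a.xml', 'TestX', 't')
    # Different init methods (file/env/serial) share the prefix -> merged:
    assert test_key('dist-nccl/a.xml', 'TestX', 't') == test_key('dist-nccl/b.xml', 'TestX', 't')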