
Conversation

@Flamefire
Contributor

@Flamefire Flamefire commented Feb 21, 2025

(created using eb --new-pr)

Requires

Some explanations:

  • In a discussion in a PyTorch issue it turned out that the only machine-readable output is the test XML files, which are only generated on (their) CI
  • The easyblock applies patches to allow enabling test reports by setting an EasyBuild-specific environment variable.
    • There is an option to pass to run_test.py that is supposed to enable that, but it isn't passed on to the subprocesses and hence is not reliable
    • Another bug results in no test reports being generated outside CI, even with the option passed
    • As the patched file gets installed we shouldn't change its (default) behavior in case users use it, hence the env variable (a rough sketch of this pattern follows this list)
  • The PyTorch test suite uses Python unittest, pytest and, since 2.3, custom logic to rerun failed tests. That generates XML result files in different formats and potentially with duplicates
    • The successful reruns might be reported alongside their previous failures, but in different files, so "merging" is required to keep only the successful ones
    • The implemented parser collects all results and attributes them to their "test suite" (usually the Python file executed, which might include or run other files)
    • Some tests are run multiple times in different configurations, i.e. the same test file is executed multiple times with an environment variable set to choose e.g. the distribution backend. Those need to be treated as separate tests
    • Afterwards all results are combined/merged. Each test from the same test suite that is found multiple times is considered successful if at least one of the duplicates was successful (a sketch of this merge step also follows this list)
  • I used Python type hints to make it a bit easier to follow
  • In many places assumptions are verified by raising a descriptive error. This should make it possible to detect changes in PyTorch that affect the logic
  • The "old" (current) parsing of the stdout is still used
    • The new logic is only enabled when the PyTorch easyconfig has the required xmlrunner Python package directly or transitively. We have unittest-xml-reporting ECs for that
    • For PyTorch < 2.3 (since 2.3 that parser isn't really useful anymore) the found results are compared and differences are shown in the logfile. They should match of course, but in the end the result from the XML files is used
    • The final output of PyTorch's run_test.py contains a list of failed test suites. We match against that as before to detect when we missed something.
    • That also detects test suites that failed to start, e.g. due to syntax errors introduced by our patches. In that case no XML file is generated and we'd miss it, but we should handle all those cases by fixing the issue or skipping the test
  • I considered verifying the found suites against the list of suites to run as printed by run_test.py, but some of the test files are missing the code required to start the test and hence show up in that list but produce no output at all
  • The easyblock file can be run directly and accepts:
    • An EasyBuild log file: parses the stdout of run_test as found in the log to test the old parser. This exists already
    • A directory: runs the new (XML) parser on a test-results folder containing the XML reports
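
As a rough illustration of the env-variable opt-in mentioned above (the variable name is made up here and this is not the actual patch, just the idea that the installed file only changes behavior when EasyBuild explicitly asks for it):

import os

# Hypothetical opt-in via an EasyBuild-specific variable (name invented for this sketch):
# XML test reports are only enabled when EasyBuild explicitly sets it, so the
# installed script keeps its upstream default behavior for everyone else.
SAVE_XML_REPORTS = os.environ.get('EASYBUILD_PYTORCH_SAVE_XML', '') == '1'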

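A minimal sketch of the collect-and-merge step from the list above, with hypothetical names (not the actual easyblock code), assuming every XML <testcase> has already been reduced to a suite name, a test id and a pass/fail flag:

from collections import defaultdict
from typing import Dict, List, Tuple

# (suite, test id) -> all recorded outcomes, including reruns and duplicates
# coming from different XML files and configurations
outcomes: Dict[Tuple[str, str], List[bool]] = defaultdict(list)

def record(suite: str, test_id: str, passed: bool) -> None:
    """Collect one result; the same test may legitimately appear several times."""
    outcomes[(suite, test_id)].append(passed)

def merged_failures() -> List[Tuple[str, str]]:
    """After merging, a test counts as failed only if none of its duplicates passed."""
    return sorted(key for key, results in outcomes.items() if not any(results))
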
I prepared PRs for new and old PyTorch ECs to include the dependency required for the XML reporting. Those can be used to test this PR:

All look fine. A couple show a warning (as designed):

WARNING Found differences when parsing stdout and XML files:
Different number of tests in XML files: 211197 != 211209

I traced that down to tests being rerun in two dimensions: with different backends (e.g. NCCL and Gloo), which I accounted for by taking the dist-nccl/dist-gloo prefix into account so those count as different tests. For the second dimension (init method: file, env or none, i.e. serial) there is no such "tag", so my code merges them and they count as a single test. I created a PR which is merged for the next PyTorch version; adding the patch to existing ECs is IMO not worth it. This is only an issue if a test fails with one init method but succeeds with another, because that looks like a successful rerun and we'd miss the failure. I didn't observe that though, and I'd assume a test fails with all of them or none.
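
To make the first dimension concrete, a small sketch (hypothetical helper, not the actual easyblock code) of deriving a per-backend key from the path of the XML report; results under test-reports/dist-nccl/... and test-reports/dist-gloo/... then count as different tests, while the init-method duplicates within one backend still collapse because neither the path nor the XML distinguishes them:

import os

def suite_key(xml_path: str, suite_name: str) -> str:
    """Prefix the suite with the dist-<backend> directory, if present,
    so the same test run with NCCL and Gloo is counted separately."""
    backend = next((part for part in xml_path.split(os.sep) if part.startswith('dist-')), None)
    return f'{backend}/{suite_name}' if backend else suite_name

# suite_key('test-reports/dist-gloo/.../report.xml', 'distributed.algorithms.quantization.test_quantization')
# -> 'dist-gloo/distributed.algorithms.quantization.test_quantization'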

FTR in 2.1.2 I collected:

Running distributed tests for the test backend with env init_method
  Skipped
Running distributed tests for the test backend with file init_method
  Skipped
Running distributed tests for the nccl backend with env init_method
  6 tests:
    test/distributed/algorithms/quantization/test_quantization.py::DistQuantizationTests::test_all_gather_bfp16
    test/distributed/algorithms/quantization/test_quantization.py::DistQuantizationTests::test_all_gather_fp16
    test/distributed/algorithms/quantization/test_quantization.py::DistQuantizationTests::test_all_to_all_bfp16
    test/distributed/algorithms/quantization/test_quantization.py::DistQuantizationTests::test_all_to_all_fp16
    test/distributed/algorithms/quantization/test_quantization.py::DistQuantizationTests::test_all_to_all_single_bfp16
    test/distributed/algorithms/quantization/test_quantization.py::DistQuantizationTests::test_all_to_all_single_fp16
Running distributed tests for the nccl backend with file init_method
  6 tests, same as above
Running distributed tests for the gloo backend with env init_method
  same
Running distributed tests for the gloo backend with file init_method
  same

So for each of the 2 backends the 2 duplicate init methods get merged, which gives 2*6 = 12 "missed" tests, matching the difference of 12 in the warning above. Those appear as "=== 1 passed in 4.45s ====" in the log, so the stdout-parsing logic collects them. --> All fine

For one such test the XML files look like this, which shows that there is no way to reasonably deduplicate them:

$ xmllint --format test-reports/dist-gloo/distributed.algorithms.quantization.test_quantization/distributed.algorithms.quantization.test_quantization-c089f7b2dd0b9af4.xml
<?xml version="1.0"?>
<testsuites>
  <testsuite name="pytest" errors="0" failures="0" skipped="0" tests="1" time="4.467" timestamp="2025-03-06T11:59:56.386921" hostname="i8021">
    <testcase classname="DistQuantizationTests" name="test_all_gather_bfp16" time="4.412" file="distributed/algorithms/quantization/test_quantization.py"/>
  </testsuite>
</testsuites>
$ xmllint --format test-reports/dist-gloo/distributed.algorithms.quantization.test_quantization/distributed.algorithms.quantization.test_quantization-ef746dd9698fd528.xml
<?xml version="1.0"?>
<testsuites>
  <testsuite name="pytest" errors="0" failures="0" skipped="0" tests="1" time="4.649" timestamp="2025-03-06T12:00:52.179397" hostname="i8021">
    <testcase classname="DistQuantizationTests" name="test_all_gather_bfp16" time="4.612" file="distributed/algorithms/quantization/test_quantization.py"/>
  </testsuite>
</testsuites>

They could be added to the "rerun" statistic though.
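
If that were followed up on, something along these lines (purely illustrative, reusing the outcomes mapping from the sketch further up) could fold such duplicates into the rerun statistic instead of counting them as extra tests:

def count_duplicate_runs(outcomes: Dict[Tuple[str, str], List[bool]]) -> int:
    """Every occurrence beyond the first for the same (suite, test id) key
    is counted as a rerun rather than as a separate test."""
    return sum(len(results) - 1 for results in outcomes.values())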

@akesandgren
Contributor

@Flamefire Do you expect to have more to add to this one?

@Flamefire
Contributor Author

Likely not. I'm still checking a couple GPU builds but the CPU builds have been verified so I don't expect anything new

@akesandgren
Contributor

Ok, I'll review and merge later today then.


self.has_xml_test_reports = False
if pytorch_version >= '1.10.0':
    out, ec = run_cmd(self.python_cmd + " -c 'import xmlrunner'", log_ok=False)
Contributor

Hmm, why not just check that unittest-xml-reporting is a builddep?

Contributor Author

Because I wanted to anticipate that we might have it in Python or Python-bundle-PyPI

Contributor

Ok, makes sense, but that's likely only ever going to be a builddep and thus doesn't belong in either python or python-bundle-* in my opinion.

Anyway if needed it can be changed later :-)

Contributor Author

I can imagine others need that too, even users, and we provide e.g. pytest too. And the one additional call doesn't hurt in the testing of PyTorch ;-)
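
For reference, a minimal sketch of the alternative discussed here, checking the dependency names instead of importing xmlrunner; it assumes the EasyBuild EasyConfig API where self.cfg.dependencies() returns dicts with a 'name' key, and it would not catch the case where the package is provided transitively via e.g. Python-bundle-PyPI, which is why the import check was kept:

# Hypothetical alternative: look for unittest-xml-reporting among the declared
# (build) dependencies instead of trying to import xmlrunner.
dep_names = {dep['name'] for dep in self.cfg.dependencies()}
self.has_xml_test_reports = 'unittest-xml-reporting' in dep_names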

if new_suites:
    diffs.append(f'Found {len(new_suites)} new suites in XML files: {", ".join(sorted(new_suites))}')
if missing_suites:
    diffs.append(f'Did not found {len(missing_suites)} suites in XML files: ' +
Contributor

Do you mean "Did not find" or "Did find"

Contributor Author

Yes, just a typo. "not find". Fixed

if new_tests:
    diffs.append(f'Found {len(new_tests)} new tests with errors in XML files: {", ".join(sorted(new_tests))}')
if missing_tests:
    diffs.append(f'Did not found {len(missing_tests)} tests with errors in XML files: ' +
Contributor

Same here, "Did not find" or something

if new_tests:
    diffs.append(f'Found {len(new_tests)} new failed tests in XML files: {", ".join(sorted(new_tests))}')
if missing_tests:
    diffs.append(f'Did not found {len(missing_tests)} failed tests in XML files: ' +
Contributor

And here

Contributor

@akesandgren akesandgren left a comment

LGTM

@akesandgren akesandgren added this to the release after 4.9.4 milestone Mar 10, 2025
@akesandgren
Contributor

Going in, thanks @Flamefire!

@akesandgren akesandgren merged commit 0d1bf10 into easybuilders:develop Mar 10, 2025
41 checks passed
@Flamefire Flamefire deleted the 20250221130009_new_pr_pytorch branch March 10, 2025 15:40
@boegel boegel modified the milestones: release after 4.9.4, 5.0.0 Mar 18, 2025