Partition community build for faster CI #10030
Conversation
Hello, and thank you for opening this PR! 🎉
All contributors have signed the CLA, thank you! ❤️
Have an awesome day! ☀️
Force-pushed from ab3dcf1 to d0a0348
Force-pushed from d0a0348 to a93a65a
EDIT: oops, wrong PR
Will put this on hold while the caching issues are investigated in #10197.
Force-pushed from a93a65a to 5b29736
Now that #10197 has been merged, this has been rebased and review comments addressed.
Partition member selection is driven by the goal of minimizing the running time of the longest-running partition.
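That selection goal is essentially makespan minimization over two partitions. A minimal sketch of one common heuristic for it, greedy longest-processing-time-first assignment, is below; the project names and per-project timings are illustrative placeholders, not the actual community-build data or the method this PR necessarily uses.

```python
# Hypothetical sketch: assign each project (longest first) to whichever
# partition currently has the smaller running total. This greedy LPT
# heuristic tends to keep the longest-running partition short.
# Project names and timings are made up for illustration.

def partition(projects):
    """projects: dict of project name -> estimated test time in minutes.
    Returns (two lists of names, their total times)."""
    parts = [[], []]
    totals = [0.0, 0.0]
    # Longest first: large items placed early are easier to balance around.
    for name, minutes in sorted(projects.items(), key=lambda kv: -kv[1]):
        i = 0 if totals[0] <= totals[1] else 1  # pick the lighter partition
        parts[i].append(name)
        totals[i] += minutes
    return parts, totals

if __name__ == "__main__":
    times = {"projA": 9.0, "projB": 7.5, "projC": 6.0,
             "projD": 4.0, "projE": 3.5}
    parts, totals = partition(times)
    print(parts, totals)
```

With the sample timings above, the two partitions come out at 16.5 and 13.5 minutes, so the makespan is driven by the heavier one.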
If this gets merged, the required status checks for pull requests will need updating, as the job named `community_build` will no longer exist.

Results of some test runs:
The projects in the community build can be partitioned nicely such that the testing of each partition has approximately the same running time (23-27 minutes). Moving some short-running projects between the two partitions is mostly a noise-level change.
The real issue preventing big wins here is the wide variance in the running time of the GitHub `actions/[email protected]` steps (e.g. `Cache Ivy`, `Cache SBT`, etc.) that are embedded in each job. These steps can run for as little as a few seconds or for 20 minutes or more, and their running time seems to be unpredictable. This issue is not particular to this PR; I have noticed it happening on other PRs and merges. Occasionally these long-running cache restoration steps will display a warning/error message, but they do not actually set a failure status or abort the job.
Curiously, `actions/[email protected]` is not supported for scheduled events (the log says: `Warning: Event Validation Error: The event type schedule is not supported. Only push, pull_request events are supported at this time.`), and those steps do not perform any action during the nightly scheduled CI. These cache management steps are, counterintuitively, causing a big performance hit, and the fact that they are not used at all in the nightly run (nor in the new Windows CI jobs) suggests they may not be necessary. In my last few test runs I modified `ci.yaml` to remove them, and saw much better results: a full PR/merge CI run in under 30 minutes (see runs 5516, 5517, 5518).

The table below summarizes the running time of each test run. The `Test A` and `Test B` columns are the timings for only the `Test` step of the corresponding community_build_x job, and the `Cache A` and `Cache B` columns indicate the additional time that job spent on `Cache` steps. The `Workflow` column is the running time for the entire workflow, and is only given for those runs where all jobs were enabled (for the first seven rows of the table, the workflow only ran the new community build jobs, with varying choices for the partition members).

* Spurious failure
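Since the cache action rejects the `schedule` event anyway, an alternative to removing the steps outright would be to skip them on scheduled runs via an `if` condition. A hypothetical workflow fragment, for illustration only (the step name, paths, keys, and action version are assumptions, not taken from the actual `ci.yaml`):

```yaml
# Illustrative sketch: skip cache restoration on nightly scheduled runs,
# where the action reports "event type schedule is not supported".
- name: Cache Ivy
  if: github.event_name != 'schedule'
  uses: actions/cache@v2
  with:
    path: ~/.ivy2/cache
    key: ${{ runner.os }}-ivy-${{ hashFiles('**/build.sbt') }}
    restore-keys: ${{ runner.os }}-ivy-
```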
I am cautiously reluctant to recommend removing all the `actions/[email protected]` steps without at least a better understanding of what's going on and some idea of what benefits, if any, they bring. I also notice that the most current available version of the action is `v2.1.2`, which may fix some of these issues.

EDIT: the cache issues have been fixed by #10197.
Fixes #9599