Skip to content

Conversation

@samypr100
Copy link
Collaborator

@samypr100 samypr100 commented May 11, 2024

Summary

Switch to using Virtual HD + ReFS to speed up windows jobs.

Closes #3508

@zanieb
Copy link
Member

zanieb commented May 11, 2024

So promising, thanks for exploring! You can push an empty commit to test the cached test time.

@samypr100 samypr100 force-pushed the windows-fs-speedup branch from 44a3c63 to 25be387 Compare May 11, 2024 13:46
@samypr100
Copy link
Collaborator Author

samypr100 commented May 11, 2024

Stats from first run...

Step Before After
Install Toolchain 1m ~ 11s~
Install nextest 15s ~ 8s~
Cargo Tests 5m 25s ~ (cached) 5m 44s ~ (uncached) -- TODO check cached speeds

Follow up updated run, will do a few more to try get the cached versions of some of these

Steps Before After
Clippy 30s~ (cached) 2m ~ (uncached)
Cache Restoration 1.5m ~ 12s ~
Installing Toolchain 1m ~ 11s~
Install nextest 15s ~ 8s~
Cargo Tests 5m 25s ~ (cached) 3m 59s ~ (cached)
Trampoline 24s ~ (cached) 25s ~ (uncached)
build binary 2m ~ (cached) 2m 30s ~ (uncached)

@samypr100
Copy link
Collaborator Author

More Step Totals

Steps Before After
Clippy step 30s~ (cached) 30s ~ (cached)
Cache Restoration 1.5m ~ 12s ~
Installing Toolchain 1m ~ 11s~
Install nextest 15s ~ 8s~
Cargo Tests 5m 25s ~ (cached) 3m 59s ~ (cached)
trampoline build step 24s ~ (cached) 24s ~ (cached)
binary build step 2m ~ (cached) 50s ~ (cached)

Job Totals (looking at a few runs)

Job Before After
Clippy 1m-1.5m ~ 1m-1.5m ~
Cargo Tests 8m 4m-5m ~
Trampoline build 4m ~ 2m ~
build binary 4m 1m-2m ~

Other notes might be worth adjusting the vhdx default size to speed up it's creation as it can take between 10s to 30s. I'll reduce to 5GB next.

@samypr100
Copy link
Collaborator Author

samypr100 commented May 11, 2024

Reduced to 5GB clippy & trampoline, left test & build binary 10GB since it ran out of space otherwise. Made no substantial difference, left at 10GB.

@samypr100 samypr100 force-pushed the windows-fs-speedup branch from cc5d807 to 28a1cde Compare May 11, 2024 15:07
@samypr100 samypr100 changed the title feat: speed up windows tests using ReFS feat: speed up windows jobs using ReFS May 11, 2024
@samypr100
Copy link
Collaborator Author

samypr100 commented May 11, 2024

In summary, migrated windows jobs got 1.5x-2.x ~ speed up except clippy which stayed at 1x. Clippy job can be a bit faster or slower depending on how fast the VHDX gets created. Maybe switching to cargo-xwin as described in #3507 along with VHDX from this PR can unlock faster speeds.

Note, I did not get data for uncached before. Only cached before and cached/uncached after. So it's possible when comparing clippy before and after uncached it might be faster.

@samypr100 samypr100 marked this pull request as ready for review May 11, 2024 15:22
@charliermarsh
Copy link
Member

I'll defer to @konstin and @zanieb on the review but based on the numbers, this seems really good! You're awesome!

@charliermarsh charliermarsh requested review from konstin and zanieb May 11, 2024 16:34
@samypr100 samypr100 force-pushed the windows-fs-speedup branch from 1142ae7 to 02894e3 Compare May 13, 2024 03:02
@samypr100 samypr100 force-pushed the windows-fs-speedup branch from 02894e3 to 6cbf357 Compare May 13, 2024 03:03
@zanieb
Copy link
Member

zanieb commented May 13, 2024

Should we skip it on Clippy for now then? I don't know if it merits the complexity there if it's both slower/faster.

@charliermarsh
Copy link
Member

I don't mind keeping it for consistency but defer to you.

@zanieb
Copy link
Member

zanieb commented May 13, 2024

I don't know, we need to split the job out of the matrix and maybe we're going to switch to xwin later? 🤷‍♀️ I don't have strong opinions though.

@charliermarsh
Copy link
Member

Sadly the decision falls upon you...

@samypr100
Copy link
Collaborator Author

samypr100 commented May 13, 2024

Note clippy might still be faster in purely uncached times, I haven't compared though. I'd say worth trying it out for a bit and we can change later easily.

Agreed, changing to xwin will require a split from the matrix regardless.

@zanieb zanieb added testing Internal testing of behavior internal A refactor or improvement that is not user-facing labels May 13, 2024
@zanieb zanieb merged commit 790206b into astral-sh:main May 13, 2024
@samypr100 samypr100 deleted the windows-fs-speedup branch May 14, 2024 02:11
@zooba
Copy link

zooba commented May 14, 2024

This is awesome to see!

You might need some error handling in case you land on an older system, but there's a -DevDrive option to Format-Volume that ought to improve things further. It may not be hugely significant since this is already a somewhat stripped back CI system rather than an OS install.

Putting the VHDX in $env:GITHUB_WORKSPACE might also help (I don't know exactly how the GHA setup looks, but it's based on Azure Pipelines which puts the OS on a slower disk than the workspace), and passing -Fixed to New-VHD may also help (but I'm less confident about that one).

@samypr100
Copy link
Collaborator Author

samypr100 commented May 15, 2024

Thanks! @zooba

there's a -DevDrive option to Format-Volume that ought to improve things further

Great point! I did try out -DevDrive, but got a Format-Volume: A parameter cannot be found that matches parameter name 'DevDrive'. since it seems it works on windows versions 10.0.22621 or greater and runners seem to be a bit below that ~10.0.20348.

I did add that check to the new action I created out of this PR that uses -DevDrive when it meets the minimum os version (e.g. self hosted runners) and also uses the same drive ${{ github.workspace }} uses. Feedback welcome of course :-)

mkeeter added a commit to mkeeter/fidget that referenced this pull request Jun 8, 2024
mkeeter added a commit to mkeeter/fidget that referenced this pull request Jun 8, 2024
mkeeter added a commit to mkeeter/fidget that referenced this pull request Jun 8, 2024
zanieb added a commit that referenced this pull request Jan 16, 2025
Previously, we couldn't use a DevDrive
(#3522 (comment))
because our Windows version was not sufficient.

Recently, I upgraded our larger runners to Windows 2025 preview
(#10298) which I presume has support
for this.

I removed ReFS in
953c353
which didn't seem to do anything to performance.

I also found some notes on "trusted" DevDrives and "disabling anti-virus
filtering" which I simply have to try.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal A refactor or improvement that is not user-facing testing Internal testing of behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Explore using a dev drive for windows CI

4 participants