x/build/windows-arm64: recover from unresponsive VM #47018
Change https://golang.org/cl/332492 mentions this issue:
Add barebones instructions for creating a macmini instance that runs a Windows ARM64 buildlet in a loop. The instruction templates are from our other macstadium builders.

See golang/go#47018 for improvements.

Updates golang/go#47018
Fixes golang/go#42604

Change-Id: I0bb092aaf99afb12a0e563a69bcb711333dda743
Reviewed-on: https://go-review.googlesource.com/c/build/+/332492
Trust: Alexander Rakoczy <[email protected]>
Run-TryBot: Alexander Rakoczy <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Carlos Amedee <[email protected]>
Change https://golang.org/cl/334372 mentions this issue:
Change https://golang.org/cl/334373 mentions this issue:
runqemubuildlet runs a qemu-based buildlet in a loop. This will allow us to add better monitoring to the command than with the current bash script.

WaitOrStop was originally implemented for x/playground in golang.org/cl/228438. It provides a safe way to terminate programs after a timeout, or to forcibly terminate them after a grace period.

For golang/go#47018

Change-Id: I205c53554bdf287997d567d530581a93febea648
Reviewed-on: https://go-review.googlesource.com/c/build/+/334372
Trust: Alexander Rakoczy <[email protected]>
Run-TryBot: Alexander Rakoczy <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
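To illustrate the timeout-then-grace-period behavior described for WaitOrStop, here is a minimal sketch. It is not the code from the CL: `waitOrStop` and `runWithTimeout` are hypothetical, simplified helpers, the `sleep` command merely stands in for a hung VM process, and a Unix-like host is assumed (sending `os.Interrupt` to another process is not supported on Windows).

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// waitOrStop waits for the already-started cmd to exit. If ctx is done first,
// it sends the interrupt signal to the process; if the process is still
// running after killDelay, it is killed. Hypothetical, simplified helper.
func waitOrStop(ctx context.Context, cmd *exec.Cmd, interrupt os.Signal, killDelay time.Duration) error {
	errc := make(chan error, 1)
	go func() { errc <- cmd.Wait() }()

	select {
	case err := <-errc:
		return err // the process exited on its own
	case <-ctx.Done():
	}

	// Timeout or cancellation: ask the process to stop gracefully.
	_ = cmd.Process.Signal(interrupt)

	select {
	case <-errc:
	case <-time.After(killDelay):
		// Grace period expired: terminate forcibly, then reap the process.
		_ = cmd.Process.Kill()
		<-errc
	}
	return ctx.Err()
}

// runWithTimeout starts the named program and stops it once timeout elapses.
func runWithTimeout(timeout, killDelay time.Duration, name string, args ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return err
	}
	return waitOrStop(ctx, cmd, os.Interrupt, killDelay)
}

func main() {
	// "sleep 30" stands in for a hung VM process (assumes a Unix-like host).
	err := runWithTimeout(200*time.Millisecond, time.Second, "sleep", "30")
	fmt.Println("stopped:", err)
}
```

The interrupt gives the child a chance to shut down cleanly, and the kill after the grace period guarantees the loop is never blocked on a process that ignores the signal.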
This adds a healthz endpoint to buildlets. For reverse buildlets, it also listens for healthz requests on a private port for a monitoring process.

For golang/go#47018

Change-Id: I100a8939c5752664afb80472e567ab05a80649d7
Reviewed-on: https://go-review.googlesource.com/c/build/+/334373
Trust: Alexander Rakoczy <[email protected]>
Run-TryBot: Alexander Rakoczy <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Heschi Kreinick <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
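For readers unfamiliar with the pattern, a health endpoint of this kind can be sketched as below. This is an illustration only, not the buildlet's code: `handleHealthz`, `newMux`, and `probe` are hypothetical names, and the "ok" response body is an assumption. The probe uses an in-memory recorder so the example needs no network access.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// handleHealthz reports that the process is up and able to serve HTTP.
// The "ok" body is an assumption; a monitor only needs the status code.
func handleHealthz(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprintln(w, "ok")
}

// newMux registers the health endpoint on its own mux, so that it could be
// served on a separate listener reserved for a monitoring process.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", handleHealthz)
	return mux
}

// probe sends a GET request to the mux in memory and returns the status code
// and response body.
func probe(path string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	newMux().ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := probe("/healthz")
	fmt.Printf("%d %q\n", status, body)
}
```

Keeping the endpoint on a dedicated mux means a monitoring listener exposes nothing except the health check.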
Change https://golang.org/cl/334953 mentions this issue:
The spaces are not necessary, as each argument is passed correctly to the command. Add Stdout/Stderr output from qemu.

For golang/go#47018

Change-Id: Ia908bf2cc639cc7d2a60bff137bc2e714a3ec6ef
Reviewed-on: https://go-review.googlesource.com/c/build/+/334953
Trust: Alexander Rakoczy <[email protected]>
Run-TryBot: Alexander Rakoczy <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
Change https://golang.org/cl/336109 mentions this issue:
Expose the healthz port from the buildlet running under QEMU, and periodically check it for a successful response. If it has been failing for longer than ten minutes, try to restart the VM.

This should successfully restart VMs that failed to boot, failed to shut down, or are otherwise unresponsive.

For golang/go#47018

Change-Id: I9218f94ee24de6e0a56ad60a18e075ce48893938
Reviewed-on: https://go-review.googlesource.com/c/build/+/336109
Trust: Alexander Rakoczy <[email protected]>
Run-TryBot: Alexander Rakoczy <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
Reviewed-by: Carlos Amedee <[email protected]>
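The restart rule described here (restart once checks have been failing for longer than ten minutes) can be sketched as a small piece of bookkeeping. This is a hedged illustration rather than the CL's implementation: `healthTracker`, its methods, and `simulatedBoot` are hypothetical names, and times are passed in explicitly so the logic can be exercised without waiting.

```go
package main

import (
	"fmt"
	"time"
)

// simulatedBoot is an arbitrary fixed instant used only by the demo below.
var simulatedBoot = time.Date(2021, time.July, 1, 0, 0, 0, 0, time.UTC)

// healthTracker remembers when the VM last answered a health check.
type healthTracker struct {
	lastHealthy time.Time     // last success, or the VM start time
	maxFailing  time.Duration // tolerated span without a success
}

// newHealthTracker seeds lastHealthy with the start time, so a VM that never
// finishes booting is also restarted once maxFailing has elapsed.
func newHealthTracker(start time.Time, maxFailing time.Duration) *healthTracker {
	return &healthTracker{lastHealthy: start, maxFailing: maxFailing}
}

// observe records one check result and reports whether the VM has been
// failing for longer than maxFailing and should be restarted.
func (h *healthTracker) observe(now time.Time, healthy bool) bool {
	if healthy {
		h.lastHealthy = now
		return false
	}
	return now.Sub(h.lastHealthy) > h.maxFailing
}

// restarted resets the window so a fresh boot gets the full allowance.
func (h *healthTracker) restarted(now time.Time) { h.lastHealthy = now }

func main() {
	h := newHealthTracker(simulatedBoot, 10*time.Minute)
	at := func(m int) time.Time { return simulatedBoot.Add(time.Duration(m) * time.Minute) }

	fmt.Println(h.observe(at(5), false))  // false: failing for 5 minutes
	fmt.Println(h.observe(at(11), false)) // true: failing for 11 minutes
	h.restarted(at(11))
	fmt.Println(h.observe(at(12), true))  // false: healthy again
	fmt.Println(h.observe(at(23), false)) // true: 11 minutes since last success
}
```

In a real monitor, `observe` would be driven by a ticker that issues an HTTP GET against the forwarded healthz port, and a `true` result would trigger stopping and relaunching the QEMU process.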
Tested and verified.
Change https://golang.org/cl/336590 mentions this issue:
Reverse buildlets now listen publicly, which allows the QEMU host forwarding to route to the buildlet. Also, print a newline at the end of the healthz response for legibility.

For golang/go#47018

Change-Id: I71ae1bf4d7cbee4867c42e863cb9f8c2569e1b69
Reviewed-on: https://go-review.googlesource.com/c/build/+/336590
Trust: Alexander Rakoczy <[email protected]>
Run-TryBot: Alexander Rakoczy <[email protected]>
Reviewed-by: Heschi Kreinick <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
TryBot-Result: Go Bot <[email protected]>
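A rough sketch of why the listen address matters: with QEMU user-mode networking, a forwarded connection reaches the guest on its network interface rather than on loopback, so a listener bound only to localhost inside the guest would not see it. The helper names, the netdev id, and the port numbers below are hypothetical; the `hostfwd=tcp:hostaddr:hostport-guestaddr:guestport` form is QEMU's user-mode forwarding option.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// healthzListenAddr returns the address the health listener binds to.
// An empty host means all interfaces, which forwarded connections can reach;
// "localhost" only accepts connections that originate inside the guest.
func healthzListenAddr(public bool, port int) string {
	host := "localhost"
	if public {
		host = ""
	}
	return net.JoinHostPort(host, strconv.Itoa(port))
}

// hostFwdNetdev builds a QEMU user-mode -netdev value that forwards a
// loopback-only port on the host to a port in the guest.
func hostFwdNetdev(id string, hostPort, guestPort int) string {
	return fmt.Sprintf("user,id=%s,hostfwd=tcp:127.0.0.1:%d-:%d", id, hostPort, guestPort)
}

func main() {
	// The id and port numbers are placeholders chosen for the example.
	fmt.Println(healthzListenAddr(false, 8080))
	fmt.Println(healthzListenAddr(true, 8080))
	fmt.Println("-netdev", hostFwdNetdev("net0", 8080, 8080))
}
```

Binding the host side of the forward to 127.0.0.1 keeps the health port reachable only from the machine that runs the monitor.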
What version of Go are you using (go version)?
Go tip: 4711bf3
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
windows/arm64
What did you do?
Caused a fatal OS error in the Windows ARM64 buildlet, after which the VM failed to reboot (see #47017 for the cause).
What did you expect to see?
The builder to exit successfully after a crash, and process a new build.
What did you see instead?
The Windows VM was stuck in the EFI booting stage, failing to boot Windows after a fatal error.
The script that loops the VM is very naive, and will wait indefinitely for the VM to exit. We should kill the VM if it is unresponsive for some time, perhaps by exposing a /healthz endpoint on the Windows buildlet, and exposing it to the host.
It probably makes sense to either extend buildlet to take on the responsibilities of something like rundockerbuildlet and makemac, or to add a new runqemubuildlet command.