vz: lima managed vm hangs with high CPU usage intermittently. #1609

Open
Tracked by #1610
vsiravar opened this issue Jun 5, 2023 · 21 comments

Comments

@vsiravar

vsiravar commented Jun 5, 2023

Problem

Virtualization Framework intermittently starts consuming 100%-220% CPU (as reported by Activity Monitor) and becomes unresponsive. All limactl commands then hang or fail.
This happens intermittently after the Lima VM has been started and left alone for a while.

Behaviour observed

  • limactl commands such as limactl shell <vm name> hang.
  • Sometimes the command fails with RC 255.

Once the VM gets into this state, all limactl commands fail.

Workaround

The workaround is to recreate the VM.

Related issue

docker/for-mac#6655

Expected behaviour

The VM should not hang when the computer wakes up from sleep.

Host info

macOS version: 13.4
cpu brand: Apple M1 Pro
lima version: 0.16.0
@balajiv113
Member

@vsiravar
Is this consistently reproducible?

Was any high-intensity task running in the VM before sleep?

@vsiravar
Author

vsiravar commented Jun 5, 2023

@vsiravar Is this consistently reproducible?

No, it's quite intermittent.

Was any high-intensity task running in the VM before sleep?

Not really, I just have a hello-world container running in the vm. I have not experienced this behaviour with qemu.

@balajiv113
Member

@vsiravar
With current master we now have support for video display. If possible, could you enable the display and try to reproduce this?

When it hangs, you can check from the UI whether the VM is still accessible. This will give an idea of whether the issue is with the network or with the VM itself.

@vsiravar
Author

vsiravar commented Jun 6, 2023

With current master we now have support for video display. If possible, could you enable the display and try to reproduce this?

Sure, will try this out. Thanks!

@ningziwen
Contributor

ningziwen commented Jun 8, 2023

I think this doesn't only happen when computer wakes up from sleep...

I successfully initialized the VM and ran some commands normally. But after I replied to several messages in Slack and came back (around 10 minutes later), it started to hang and returned FATA[0928] exit status 255.

The VM service shows 300%+ CPU usage.

[Screenshot, 2023-06-08 at 3:33:14 PM]
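
For anyone who wants to record the CPU figure from a terminal instead of Activity Monitor, here is a minimal sketch. It assumes the vz process appears in ps output under a command name containing com.apple.Virtualization.VirtualMachine (the XPC service name visible in the system logs quoted further down this thread). The helper name sum_vz_cpu is made up for illustration and is not part of Lima.

```shell
# sum_vz_cpu: read "%CPU command" lines on stdin and print the total %CPU
# of every line that mentions the vz XPC service.
# Assumption: the service name below is how the process shows up in ps.
sum_vz_cpu() {
  awk '/com\.apple\.Virtualization\.VirtualMachine/ { sum += $1 }
       END { printf "%.1f\n", sum + 0 }'
}

# Usage on the host (not run here):
#   ps -A -o pcpu= -o comm= | sum_vz_cpu
```

Reading from stdin keeps the helper independent of how the process list is collected, so the same function can also be fed a saved ps snapshot taken while the VM was hung.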

@balajiv113
Member

@ningziwen Could you also try enabling the display as mentioned above and see what happens?

Also, please share the template you used.

@ningziwen
Contributor

@balajiv113 Sorry, I didn't understand what that means. Would you like me to make a screen recording and upload the video? Or use some GUI? If it is a GUI, could you point me to the instructions?

@balajiv113
Member

@ningziwen
Steps to enable display

  • Build and Install lima from master
  • Edit your template with
   video:
     display: "vz"
  • Start your template
  • Do the normal work and try to reproduce the freeze.
  • When it hangs, check whether the display is still usable and whether it shows any logs.

This will give us an idea of whether the issue is with the network or with the whole VM itself.

@balajiv113
Member

I tried the above steps myself. I haven't seen the high CPU usage, but the freeze does happen.

On checking the GUI during the freeze, even that was not responsive, so I think the freeze happens at the virtualization.framework level and not in the network.

I have also raised a support ticket with Apple with the same info.

Note: This happens to me on M1 only. My Intel machine runs smoothly for weeks through sleep and wake cycles.

@ningziwen
Contributor

@balajiv113 Hey. Did you get any reply from Apple? Is the support ticket link sharable?

vsiravar changed the title from "vz: lima managed vm hangs with high CPU usage when computer wakes up from sleep." to "vz: lima managed vm hangs with high CPU usage intermittently." on Jun 27, 2023
@vsiravar
Author

Updated ticket description and title based on new behaviour observed.

@ryancurrah
Contributor

ryancurrah commented Jul 10, 2023

Maybe once #1659 is resolved you can look at the serial.log to see if there are any related log messages.

@bsideup

bsideup commented Sep 2, 2023

Confirming that this is still happening (HEAD as of today, M1)

@outcoldman

I am also experiencing the same issue. I just started using limactl instead of other VM providers. First I had to deal with the time shift, so I added the following:

timedatectl set-ntp no
apt update
apt install -y ntp

Now, every morning I find high CPU usage and cannot access my VMs.

@kj-creater

I started a Lima virtual machine with the following commands, and logged in to the virtual machine's console through the video display as root:

limactl create --name=default template://docker \
--cpus=2 --memory=4 --vm-type=vz --mount-writable=true \
--disk=5 --network=lima:user-v2 --rosetta --video

limactl start

How can I confirm whether it is a problem with the virtual machine network or the m1 virtualization service?

I have encountered both of the following situations:

  1. lima date -R freezes in the terminal, but the video display shows that the virtual machine is still running and CPU usage is not high;
  2. lima date -R freezes in the terminal, and the video display shows that the virtual machine has stopped while the Virtualization process uses 200% of the CPU.

How can I help identify the problem in the above two situations?

lima version 0.18.0
macOS version 14.0 (23A344)
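
One way to keep the two situations apart when reporting them is to write down the same two observations each time and classify them. The sketch below follows the reasoning given earlier in this thread (a responsive display points at the network path, a frozen display points at the VM itself). The function name classify_hang and the 100% threshold are assumptions made for illustration; they are not Lima features or defaults.

```shell
# classify_hang CPU_PERCENT DISPLAY_RESPONSIVE
#   CPU_PERCENT:        integer %CPU of the Virtualization process (e.g. 200)
#   DISPLAY_RESPONSIVE: "yes" if the VM's video display still reacts, else "no"
# The 100% threshold is an illustrative assumption, not a documented limit.
classify_hang() {
  cpu=$1
  display=$2
  if [ "$display" = "yes" ] && [ "$cpu" -lt 100 ]; then
    echo "guest-alive: suspect the network/SSH path"
  elif [ "$display" = "no" ] && [ "$cpu" -ge 100 ]; then
    echo "guest-frozen: suspect the VM or Virtualization.framework"
  else
    echo "inconclusive: collect logs such as ha.stderr.log before restarting"
  fi
}

classify_hang 5 yes    # situation 1 above
classify_hang 200 no   # situation 2 above
```

This does not diagnose anything by itself; it only makes reports comparable, which may help when several people hit different variants of the hang.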

@kj-creater

While I was writing the above, the second scenario happened:

  1. There is no error message in the ha.stderr.log file
  2. Virtualization process CPU usage is 200%
  3. The video display is frozen
[image]

@terev
Contributor

terev commented Nov 12, 2023

@balajiv113 Was able to catch the following in the network log when this occurs:

time="2023-11-12T19:22:25-05:00" level=info msg="new connection from  to "
2023/11/12 19:22:28 tcpproxy: for incoming conn 127.0.0.1:56720, error dialing "192.168.104.1:22": connect tcp 192.168.104.1:22: connection was refused
time="2023-11-12T19:22:44-05:00" level=error msg="r.CreateEndpoint() = connection was refused"

Unsure if this is relevant. The network process seems to remain alive.
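
To check whether these refusals line up with the start of a hang, it may help to count them in whichever network log is being inspected. A minimal sketch follows; the helper name count_refused is hypothetical, and the match string is simply the error text from the excerpt above.

```shell
# count_refused: read log lines on stdin and print how many of them
# report a refused connection. grep -c exits non-zero when the count is 0,
# so "|| true" keeps the function usable under "set -e".
count_refused() {
  grep -c 'connection was refused' || true
}

# Usage (not run here): pipe the network log into the helper, e.g.
#   count_refused < path/to/network.log
```

The path in the usage comment is a placeholder, since the thread does not name the log file.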

@terev
Contributor

terev commented Nov 13, 2023

I tried disabling Rosetta but that did not help. Something interesting I noticed, though, is that after disabling Rosetta, when the VM hangs, CPU usage is pinned at half of what is allocated: 100% when 2 CPUs are allocated. With Rosetta enabled it is usually pinned at 200%.

@cdfmlr
Contributor

cdfmlr commented Nov 14, 2023

After upgrading my M2 Mac mini to Sonoma, I've been encountering this issue frequently. Yesterday, I noticed that one of my Lima VMs and a UTM VM (both using virtualization.framework) froze simultaneously.

The UTM VM worked again after I killed and restarted it. However, the Lima VM failed to restart after a lima stop -f. When I ran lima start, that VM hit errors similar to issue #1915 (as far as I remember; the logs are lost). Recreating the VM solved the problem.

In addition, my Colima VM, also running on vz, has been experiencing frequent hangs as well. I can always resolve it by using the lima stop -f command and then restarting it.

@terev
Contributor

terev commented Nov 14, 2023

I'm able to reproduce this issue almost every time when starting a large docker compose project (which I'm unable to share unfortunately). Today I noticed something new. I opened the system log utility to view any logs related to virtualization during one of these events. Doing so I was able to get some logs that seem interesting:

default	16:56:39.933077-0500	symptomsd	Received CPU usage trigger: 
  com.apple.Virtualization.Virtual[72861] () used 90.01s of CPU over 177.06 seconds (averaging 50%), violating a CPU usage limit of 90.00s over 180 seconds.
default	16:56:40.028006-0500	symptomsd	RESOURCE_NOTIFY trigger for com.apple.Virtualization.Virtual [72861] (90009971208 nanoseconds of CPU usage over 177.00s seconds, violating limit of 90000000000 nanoseconds of CPU usage over 180.00s seconds)
default	17:18:27.814709-0500	runningboardd	Periodic Run States <RBProcessState| identity:xpcservice<com.apple.Virtualization.VirtualMachine([anon<limactl>(502):72856])(502)>:72861 role:UserInteractive gpuRole:None explicitJetsamBand:0 memoryLimit:Inactive(Default) flags:60 guaranteedRunning:NO legacyFinishTaskReason:0 inheritances:<RBMutableInheritanceCollection| inheritancesByEnvironment:{
	
	}> primitiveAssertions:[
	<RBSProcessAssertionInfo| type:2 reason:20246 name:"Domain" domain:"com.apple.launchservicesd:RoleUserInteractive" expl:"uielement:72861">
	]>

These logs occur very close to when the VM begins to hang. From my naive perspective, it seems the OS may be killing the virtualization process or severely throttling it for using too much CPU. Does that seem possible? I tried setting the VM's CPU limit to the number of cores my machine has, but I am still able to reproduce this. Side note: strangely, I'm able to set the number of CPUs to a value larger than my machine has.

The final log occurs some time after the vm begins to hang.
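
The figures in the symptomsd lines can be checked with a little arithmetic. The sketch below only recomputes the averages from the numbers in the log excerpt; the helper name cpu_pct is made up for illustration, and nothing here confirms whether macOS actually throttles or kills the process when the trigger fires.

```shell
# Figures copied from the symptomsd lines in the log excerpt.
used_s=90.01        # CPU seconds consumed
window_s=177.06     # observation window in seconds
limit_s=90.00       # CPU-seconds limit
limit_window_s=180  # limit window in seconds

# cpu_pct CPU_SECONDS WALL_SECONDS: print the average as a percentage
# of one CPU, with one decimal place.
cpu_pct() {
  awk -v used="$1" -v window="$2" 'BEGIN { printf "%.1f\n", 100 * used / window }'
}

cpu_pct "$used_s" "$window_s"          # observed average: prints 50.8
cpu_pct "$limit_s" "$limit_window_s"   # limit as an average: prints 50.0
```

In other words, the logged limit corresponds to a sustained average of half of one CPU over three minutes, and the trigger fired because 90.01 s of CPU time had already been used before the 180 s window ended.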

@n-io

n-io commented Apr 19, 2024

I tried the above steps myself. Haven't got high cpu usage but the freeze happens.

On checking the GUI during the freeze even that was not responsive so i think the freeze happens on virtualization.framework level not on network.

I have also raised a support ticket with Apple with the same info.

Note: This happens to me on M1 only. My intel runs smooth for weeks with sleep and wake cases

I have the same issue with qemu. Running the same command will sometimes work and sometimes freeze the VM, requiring a stop --force, with CPU usage somewhere around 400%. However, the 400% CPU usage occurs in the qemu-system-x86_64 process. I'm on an M2 Mac with cpuType set to x86_64: "max" in my config, using qemu v8.2.1.

You have already raised a ticket with Apple, but would it be possible to double-check and confirm if in your scenario the behaviour is reproducible using qemu instead of vz?
