socket_vmnet failing on M1 (`start(): vmnet_return_t VMNET_FAILURE`)
jandubois opened this issue · 7 comments
I've now observed the error from lima-vm/lima#1049 two more times (qemu failing to start up because fd_connect throws an error). Both times have been on an M1 mini; I cannot remember if the bug report on the lima repo was also based on a failure on M1, or if it was Intel.
Unfortunately I've been running with lima 0.12.0, which doesn't have the error reporting fix. However, I can see errors in the daemon logs (after qemu failed):
jan@zilicon _networks % cat rancher-desktop-shared_socket_vmnet.stderr.log
start(): vmnet_return_t VMNET_FAILURE
start: Undefined error: 0
jan@zilicon _networks % cat rancher-desktop-shared_socket_vmnet.stdout.log
Initializing vmnet.framework (mode 1001)
jan@zilicon _networks % cat rancher-desktop-bridged_en0_socket_vmnet.stderr.log
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
The bridged network was running, but the shared network was not.
The only way I found to get things working again was by rebooting the machine.
Does this error happen with vde_vmnet too?
The vmnet code is almost unchanged from vde_vmnet.
> Does this error happen with vde_vmnet too?
It is possible, but I haven't seen it. One difference is that with socket_vmnet the failure is catastrophic: qemu will not start the VM. With vde_vmnet you would just not get an IP address on the interface, so you might not notice unless you use the external IP address for ingress.
We have seen on Rancher Desktop that some users don't get an IP address in specific environments, but have never been able to determine the reason for it. Maybe it is related, but I don't know. We detect this and configure flannel with the SLIRP interface when that happens, so things are still working with reduced functionality in that case.
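The fallback described above could be sketched roughly as follows. This is only an illustration, not Rancher Desktop's actual code: the function name, the interface names `vmnet0` and `slirp0`, and the rule of ignoring link-local addresses are all assumptions made for the example.

```python
import ipaddress
from typing import Mapping, Sequence


def pick_flannel_interface(
    addresses: Mapping[str, Sequence[str]],
    preferred: str,
    fallback: str,
) -> str:
    """Return `preferred` if it has a usable IPv4 address, else `fallback`.

    `addresses` maps an interface name to the addresses found on it.
    Treating link-local (169.254.0.0/16) addresses as unusable is an
    assumption of this sketch.
    """
    for addr in addresses.get(preferred, ()):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # skip anything that is not a valid IP address
        if ip.version == 4 and not ip.is_link_local:
            return preferred
    return fallback
```

For example, `pick_flannel_interface({"vmnet0": [], "slirp0": ["192.168.5.15"]}, "vmnet0", "slirp0")` picks the hypothetical SLIRP interface because the vmnet interface never received an address.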
> It is possible, but I haven't seen it.
All the failures I've seen last week were on a remote M1 mini that is running inside the Vancouver office, so it is a different environment from what I regularly use. However, the failures were not immediate, or frequent, but just once a day after restarting VMs (and daemons) multiple times. The machine was running Big Sur, whereas my regular Intel machine is running Catalina.
I have also noticed this. Changing my location (and connecting to a different Wi-Fi network) has caused problems that I was able to fix only by uninstalling, rebooting, and reinstalling.
A bit more information: I can confirm that my DHCP 'server' is allocating an address to socket_vmnet, as I receive a 'new device detected' alert from my firewall.
Tailing the stderr shows the same errors as reported by @jandubois.
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Undefined error: 0
I'm running macOS Ventura 13.1 (22C65).
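Because the stderr logs mostly repeat the same pair of lines, a short script can summarise them. The sketch below is a hypothetical helper (`count_vmnet_errors` is not part of socket_vmnet or lima) and assumes the log lines have exactly the `function(): vmnet_return_t STATUS` shape quoted in this thread.

```python
import re
from collections import Counter

# Matches lines such as "on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT".
# The pattern is inferred from the log excerpts in this thread only.
VMNET_ERROR = re.compile(r"^(?P<func>\w+)\(\): vmnet_return_t (?P<status>VMNET_\w+)$")


def count_vmnet_errors(log_text: str) -> Counter:
    """Count (function, status) pairs reported in a socket_vmnet stderr log."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = VMNET_ERROR.match(line.strip())
        if match:
            counts[(match.group("func"), match.group("status"))] += 1
    return counts
```

Feeding it the contents of a stderr log returns a `Counter` keyed by `(function, status)`, which makes it easier to compare logs from different machines.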
I ran into a similar issue, with socket mode instead of shared mode, because socket_vmnet is "unmanaged", meaning it's started or stopped by brew services. The first time I started VMs for the day, everything worked fine. After a couple of minutes, the VM network became unreachable. I was not able to start the VM again after it was stopped.
ha.stderr.log
{"level":"debug","msg":"QEMU version 8.0.2 detected","time":"2023-07-18T13:28:12-04:00"}
{"level":"debug","msg":"firmware candidates = [/Users/jylee/.local/share/qemu/edk2-aarch64-code.fd /opt/homebrew/share/qemu/edk2-aarch64-code.fd /usr/share/AAVMF/AAVMF_CODE.fd /usr/share/qemu-efi-aarch64/QEMU_EFI.fd]","time":"2023-07-18T13:28:12-04:00"}
{"level":"fatal","msg":"template: :1:21: executing \"\" at \u003cfd_connect \"/opt/homebrew/var/run/socket_vmnet\"\u003e: error calling fd_connect: fd_connect: dial unix /opt/homebrew/var/run/socket_vmnet: connect: connection refused","time":"2023-07-18T13:28:12-04:00"}
The socket_vmnet service itself shows
% sudo brew services list
Name         Status    User File
socket_vmnet error 256 root /Library/LaunchDaemons/homebrew.mxcl.socket_vmnet.plist
unbound      none
and /opt/homebrew/var/log/socket_vmnet/stderr shows some iterations of these logs
vmnet_write: Bad file descriptor
writev: Bad file descriptor
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
writev: Broken pipe
on_accept(): vmnet_return_t VMNET_INVALID_ARGUMENT
vmnet_write: Broken pipe
To restore the network, I had to restart the socket_vmnet service and all the VMs. After a while, the problem repeats. Is there any other workaround for this?
By the way, this doesn't just happen in "socket" mode, in case you're wondering. It also happened in "shared" mode, where socket_vmnet is managed by lima.
I have an M1 MacBook Pro on macOS Ventura 13.4.1, with socket_vmnet 1.1.2.
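The `connection refused` state can be checked before starting a VM with a small probe. This is a minimal sketch, not part of lima or socket_vmnet: the function name is hypothetical, and it assumes the daemon listens on a stream-type Unix socket, which is what the `dial unix` error above suggests. Note that a successful probe briefly connects to the daemon as a client.

```python
import socket


def socket_accepts_connections(path: str, timeout: float = 1.0) -> bool:
    """Return True if something is listening on the Unix socket at `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing socket file as well as a stale one that
        # nothing is listening on (connection refused).
        return False
    finally:
        sock.close()
```

For the setup described above, the call would be `socket_accepts_connections("/opt/homebrew/var/run/socket_vmnet")`; a `False` result means the daemon needs attention before qemu is started.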
I'm seeing the same behaviour on macOS 13.4.1 (c) on an M2, with socket_vmnet 1.1.2 and lima 0.16.0.
I can build and start a new VM, but as soon as I stop it, I see the same behaviour as described here, with the same stderr output. The difference is that I don't see an error on the service; since it's not running, I can't restart it.
sudo brew services
Name         Status User File
socket_vmnet none
The only fix I've found so far is to uninstall socket_vmnet and reinstall it.