mirage/qubes-mirage-firewall

Appvm start failed

Opened this issue · 11 comments

When start qube with netvm set to mirage-firewall, this error message showed up:
Start failed: internel error: libxenlight failed to create new domain 'test', see /var/log/libvirt/libxl/libxl-driver.log for details

/var/log/libvirt/libxl/libxl-driver.log:

2019-05-17 08:45:01.432+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to add device with path /local/domain/22/backend/vif/24/0
2019-05-17 08:45:01.432+0000: libxl: libxl_create.c:1512:domcreate_attach_devices: unable to add nic devices
2019-05-17 08:45:11.461+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to remove device with path /local/domain/22/backend/vif/24/0
2019-05-17 08:45:11.466+0000: libxl: libxl.c:1669:devices_destroy_cb: libxl__devices_destroy failed for 24

Could you also post the firewall log output generated while the VM is starting.

The logs are stored in dom0 at /var/log/xen/console/guest-mirage-firewall.log and are also available through Qubes Manager; right-click over mirage-firewall and use the Logs submenu.

/var/log/xen/console/guest-mirage-firewall.log:

.[32;1mMirageOS booting....[0m
Initialising timer interface
Initialising console ... done.
gnttab_stubs.c: initialised mini-os gntmap
2019-05-17 08:40:00 -00:00: INF [qubes.rexec] waiting for client...
2019-05-17 08:40:00 -00:00: INF [qubes.gui] waiting for client...
2019-05-17 08:40:00 -00:00: INF [qubes.db] connecting to server...
2019-05-17 08:40:00 -00:00: INF [qubes.db] connected
2019-05-17 08:40:00 -00:00: INF [qubes.rexec] client connected, using protocol version 2
2019-05-17 08:40:00 -00:00: INF [unikernel] QubesDB and qrexec agents connected in 0.020 s
2019-05-17 08:40:00 -00:00: INF [dao] Got network configuration from QubesDB:
            NetVM IP on uplink network: 10.138.6.204
            Our IP on uplink network:   10.137.0.28
            Our IP on client networks:  10.137.0.28
2019-05-17 08:40:00 -00:00: INF [net-xen:frontend] connect 0
2019-05-17 08:40:00 -00:00: INF [memory_pressure] Writing meminfo: free 16776 / 29256 kB (57.34 %)
2019-05-17 08:40:00 -00:00: INF [net-xen:frontend] create: id=0 domid=3
2019-05-17 08:40:00 -00:00: INF [net-xen:frontend]  sg:true gso_tcpv4:true rx_copy:true rx_flip:false smart_poll:false
2019-05-17 08:40:00 -00:00: INF [net-xen:frontend] MAC: 00:16:3e:5e:6c:00
2019-05-17 08:40:00 -00:00: INF [qubes.db] got update: "/qubes-keyboard" = "xkb_keymap {\n\txkb_keycodes  { include \"evdev+aliases(qwerty)\"\t};\n\txkb_types     { include \"complete\"\t};\n\txkb_compat    { include \"complete\"\t};\n\txkb_symbols   { include \"pc+us+inet(evdev)\"\t};\n\txkb_geometry  { include \"pc(pc105)\"\t};\n};"
2019-05-17 08:40:00 -00:00: INF [qubes.gui] client connected (screen size: 1920x1080 depth: 24 mem: 8101x)
2019-05-17 08:40:00 -00:00: INF [unikernel] GUI agent connected
2019-05-17 08:40:00 -00:00: WRN [command] << Unknown command "QUBESRPC qubes.SetMonitorLayout dom0"
port 4 still bound!
port 5 still bound!
port 6 still bound!
port 7 still bound!

Was the firewall still running? Maybe it crashed? I'd expect to see logging information about the new domain here, and still bound messages normally only appear at shutdown. It's odd that it didn't say why it was shutting down, though.

You can run xl list in dom0 to check if it's running. If it is, do sudo xl console mirage-firewall and then try starting the new qube again and see what gets printed.

Firewall is running and all settings are configured correctly. I tried to re create mirage-firewall but still got that error message. If I change the netvm to sys-firewall or none, 'test' domain start and work fine.

This this the output of sudo xl console mirage-firewall

.[32;1mMirageOS booting....[0m
Initialising timer interface
Initialising console ... done.
gnttab_stubs.c: initialised mini-os gntmap
2019-05-18 10:00:41 -00:00: INF [qubes.rexec] waiting for client...
2019-05-18 10:00:41 -00:00: INF [qubes.gui] waiting for client...
2019-05-18 10:00:41 -00:00: INF [qubes.db] connecting to server...
2019-05-18 10:00:41 -00:00: INF [qubes.db] connected
2019-05-18 10:00:41 -00:00: INF [qubes.rexec] client connected, using protocol version 2
2019-05-18 10:00:41 -00:00: INF [unikernel] QubesDB and qrexec agents connected in 0.027 s
2019-05-18 10:00:41 -00:00: INF [dao] Got network configuration from QubesDB:
            NetVM IP on uplink network: 10.138.6.204
            Our IP on uplink network:   10.137.0.28
            Our IP on client networks:  10.137.0.28
2019-05-18 10:00:41 -00:00: INF [net-xen:frontend] connect 0
2019-05-18 10:00:41 -00:00: INF [memory_pressure] Writing meminfo: free 16776 / 29256 kB (57.34 %)
2019-05-18 10:00:41 -00:00: INF [net-xen:frontend] create: id=0 domid=3
2019-05-18 10:00:41 -00:00: INF [net-xen:frontend]  sg:true gso_tcpv4:true rx_copy:true rx_flip:false smart_poll:false
2019-05-18 10:00:41 -00:00: INF [net-xen:frontend] MAC: 00:16:3e:5e:6c:00
2019-05-18 10:00:42 -00:00: INF [qubes.db] got update: "/qubes-keyboard" = "xkb_keymap {\n\txkb_keycodes  { include \"evdev+aliases(qwerty)\"\t};\n\txkb_types     { include \"complete\"\t};\n\txkb_compat    { include \"complete\"\t};\n\txkb_symbols   { include \"pc+us+inet(evdev)\"\t};\n\txkb_geometry  { include \"pc(pc105)\"\t};\n};"
2019-05-18 10:00:42 -00:00: INF [qubes.gui] client connected (screen size: 1920x1080 depth: 24 mem: 8101x)
2019-05-18 10:00:42 -00:00: INF [unikernel] GUI agent connected
2019-05-18 10:00:42 -00:00: WRN [command] << Unknown command "QUBESRPC qubes.SetMonitorLayout dom0"

OK, have a look at the xl list output to get the domain ID of mirage-firewall. Then look at the XenStore tree under /local/domain/ID/backend/vif, where ID is the firewall's domain ID. Do this while Qubes is still trying to start the client qube.

e.g. on my machine I get:

[tal@dom0 ~]$ xl list
Name                                        ID   Mem VCPUs	State	Time(s)
Domain-0                                     0  4077     2     r-----     105.3
sys-net                                      1   384     2     -b----      14.8
sys-net-dm                                   2   144     1     -b----      10.4
mirage-firewall                              3    60     1     -b----       0.5
com                                          4  3926     2     -b----      69.1
dev                                          5  5240     4     -b----      47.9

[tal@dom0 ~]$ xenstore-ls /local/domain/3/backend/vif
4 = ""
 0 = ""
  frontend = "/local/domain/4/device/vif/0"
  frontend-id = "4"
  online = "1"
  state = "4"
  script = "/etc/xen/scripts/vif-route-qubes"
  mac = "00:16:3e:5e:6c:00"
  ip = "10.137.0.15"
  bridge = "xenbr0"
  handle = "0"
  type = "vif"
  feature-sg = "1"
  feature-gso-tcpv4 = "0"
  feature-rx-copy = "1"
  feature-rx-flip = "0"
  feature-rx-notify = "1"
  feature-smart-poll = "0"
5 = ""
 0 = ""
  frontend = "/local/domain/5/device/vif/0"
  frontend-id = "5"
  online = "1"
  state = "4"
  script = "/etc/xen/scripts/vif-route-qubes"
  mac = "00:16:3e:5e:6c:00"
  ip = "10.137.0.12"
  bridge = "xenbr0"
  handle = "0"
  type = "vif"
  feature-sg = "1"
  feature-gso-tcpv4 = "0"
  feature-rx-copy = "1"
  feature-rx-flip = "0"
  feature-rx-notify = "1"
  feature-smart-poll = "0"

There should be one entry for each client domain. In my case, I have entries for domains 4 (com) and 5 (dev). Qubes creates these entries when you start a client VM, and the firewall should notice when they appear and log something (which I don't see in your log output above).

Also, do qvm-prefs $CLIENT to check the setting. e.g. I see:

[tal@dom0 ~]$ qvm-prefs com netvm
mirage-firewall

Actually, I think it's just waiting to connect to sys-net. We should see something like this in the logs even before trying to start a client:

2019-05-16 18:30:47 -00:00: INF [dao] Got network configuration from QubesDB:
            NetVM IP on uplink network: 10.137.0.5
            Our IP on uplink network:   10.137.0.19
            Our IP on client networks:  10.137.0.19
2019-05-16 18:30:47 -00:00: INF [net-xen:frontend] connect 0
2019-05-16 18:30:47 -00:00: INF [memory_pressure] Writing meminfo: free 45280 / 57816 kB (78.32 %)
2019-05-16 18:30:47 -00:00: INF [net-xen:frontend] create: id=0 domid=1
2019-05-16 18:30:47 -00:00: INF [net-xen:frontend]  sg:true gso_tcpv4:true rx_copy:true rx_flip:false smart_poll:false
2019-05-16 18:30:47 -00:00: INF [net-xen:frontend] MAC: 00:16:3e:5e:6c:00
2019-05-16 18:30:47 -00:00: INF [ethernet] Connected Ethernet interface 00:16:3e:5e:6c:00
2019-05-16 18:30:47 -00:00: INF [ARP] Sending gratuitous ARP for 10.137.0.19 (00:16:3e:5e:6c:00)
2019-05-16 18:30:47 -00:00: INF [dao] Watching backend/vif
[Janitha@dom0 ~]$ xenstore-ls /local/domain/18/backend/vif
19 = ""
 0 = ""
  frontend = "/local/domain/19/device/vif/0"
  frontend-id = "19"
  online = "0"
  state = "5"
  script = "/etc/xen/scripts/vif-route-qubes"
  mac = "00:16:3e:5e:6c:00"
  ip = "10.137.0.29"
  bridge = "xenbr0"
  handle = "0"
  type = "vif"

To check what's happening with sys-net, look in XenStore for the firewall's uplink device, like this:

  1. Get the firewall's domain ID with xl list:
[tal@dom0 ~]$ xl list
Name                                        ID   Mem VCPUs	State	Time(s)
Domain-0                                     0  4072     2     r-----     211.6
sys-net                                      1   384     2     -b----      26.0
sys-net-dm                                   2   144     1     -b----      26.7
mirage-firewall                              3    60     1     -b----       0.9
com                                          4  3926     2     -b----     103.0
dev                                          5  5629     4     -b----     272.6

Get the frontend device's status for that domain (3 in my case):

[tal@dom0 ~]$ xenstore-ls /local/domain/3/device/vif
0 = ""
 backend = "/local/domain/1/backend/vif/3/0"
 backend-id = "1"
 state = "4"
 handle = "0"
 mac = "00:16:3e:5e:6c:00"
 tx-ring-ref = "11"
 rx-ring-ref = "10"
 event-channel = "7"
 feature-sg = "1"
 feature-gso-tcpv4 = "0"
 request-rx-copy = "1"
 feature-rx-flip = "0"
 feature-rx-notify = "1"
 feature-smart-poll = "0"

Check the backend state (the path is printed in the output above as backend = "/local/domain/1/backend/vif/3/0", which says that this virtual device is provided by domain 1 (sys-net) in my case).

[tal@dom0 ~]$ xenstore-ls /local/domain/1/backend/vif/3/0
frontend = "/local/domain/3/device/vif/0"
frontend-id = "3"
online = "1"
state = "4"
script = "/etc/xen/scripts/vif-route-qubes"
mac = "00:16:3e:5e:6c:00"
ip = "10.137.0.19"
bridge = "xenbr0"
handle = "0"
type = "vif"
feature-sg = "1"
feature-gso-tcpv4 = "1"
feature-gso-tcpv6 = "1"
feature-ipv6-csum-offload = "1"
feature-rx-copy = "1"
feature-rx-flip = "0"
feature-multicast-control = "1"
feature-dynamic-multicast-control = "1"
feature-split-event-channels = "1"
multi-queue-max-queues = "2"
feature-ctrl-ring = "1"
hotplug-status = "connected"

The front-end and back-end should both have state = 4 (connected).

I found the issue. I create sys-net with fedora-29-minimal and domain not start when mirage-firewall use it. when I change the sys-net to the fedora-29, domain start and work fine.

Janitha@dom0 ~]$ xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  4060     4     r-----     177.7
disp-sys-usb                                 1   284     2     -b----      16.7
disp-sys-usb-dm                              2   144     1     -b----      20.8
disp-sys-net                                 3   384     2     -b----      24.6
disp-sys-net-dm                              4   144     1     -b----      10.3
disp-sys-firewall                            5   583     2     -b----      19.0
personal                                     6  2983     2     -b----      75.6
mirage-firewall                             10    32     1     -b----       0.0


[Janitha@dom0 ~]$ xenstore-ls /local/domain/10/device/vif
0 = ""
 backend = "/local/domain/3/backend/vif/10/0"
 backend-id = "3"
 state = "4"
 handle = "0"
 mac = "00:16:3e:5e:6c:00"
 tx-ring-ref = "11"
 rx-ring-ref = "10"
 event-channel = "7"
 feature-sg = "1"
 feature-gso-tcpv4 = "0"
 request-rx-copy = "1"
 feature-rx-flip = "0"
 feature-rx-notify = "1"
 feature-smart-poll = "0"

[Janitha@dom0 ~]$ xenstore-ls /local/domain/3/backend/vif/10/0
frontend = "/local/domain/10/device/vif/0"
frontend-id = "10"
online = "1"
state = "2"
script = "/etc/xen/scripts/vif-route-qubes"
mac = "00:16:3e:5e:6c:00"
ip = "10.137.0.28"
bridge = "xenbr0"
handle = "0"
type = "vif"
feature-sg = "1"
feature-gso-tcpv4 = "1"
feature-gso-tcpv6 = "1"
feature-ipv6-csum-offload = "1"
feature-rx-copy = "1"
feature-rx-flip = "0"
feature-multicast-control = "1"
feature-dynamic-multicast-control = "1"
feature-split-event-channels = "1"
multi-queue-max-queues = "2"
feature-ctrl-ring = "1"
hotplug-error = "/etc/xen/scripts/vif-route-qubes failed; error detected."
hotplug-status = "error

OK, great! I guess the fedora-minimal sys-net must have crashed soon after booting or something, as Qubes wouldn't start the firewall until it thought sys-net was up.