Can't resume a VM that runs Docker inside it
BasToTheMax opened this issue · 3 comments
Hello.
I am having trouble resuming my VM.
What I am doing:
- Create vmA
- Make a Full snapshot of vmA
- Destroy vmA
- Start vmB by loading the snapshot (diff snapshots enabled)
- Create a Diff snapshot
- Destroy vmB
- Merge the Full and Diff snapshots (command below)
- Start vmC by loading the newly created (merged) snapshot.
- Firecracker crashes with the error below:
Please note: I am running Docker inside the VM.
Merge command:
./snapshot-editor edit-memory rebase \
--memory-path ./snap/mem1 \
--diff-path ./snap/mem2
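The steps above can be sketched as calls to the Firecracker API. This is a minimal sketch, not the reporter's actual script: the socket path, the helper function names, and the file names (snap1/mem1 for the Full snapshot, snap2/mem2 for the Diff snapshot) are assumptions. `load_snapshot` passes the memory file through `mem_backend`, since the Firecracker log reports `mem_file_path` as deprecated for `/snapshot/load`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the reproduction steps.
# Assumption: Firecracker was started with --api-sock "$SOCK".
SOCK="${SOCK:-/tmp/firecracker.socket}"

# PUT a JSON body ($2) to a Firecracker API endpoint ($1) over the Unix socket.
fc_put() {
  curl --fail --silent --show-error --unix-socket "$SOCK" \
    -X PUT "http://localhost$1" \
    -H 'Content-Type: application/json' \
    -d "$2"
}

# Snapshots can only be created while the guest is paused.
pause_vm() {
  fc_put /vm '{"state": "Paused"}'
}

# $1 = Full|Diff, $2 = microVM state file, $3 = guest memory file.
create_snapshot() {
  fc_put /snapshot/create \
    "{\"snapshot_type\": \"$1\", \"snapshot_path\": \"$2\", \"mem_file_path\": \"$3\"}"
}

# $1 = microVM state file, $2 = guest memory file (via mem_backend).
load_snapshot() {
  fc_put /snapshot/load \
    "{\"snapshot_path\": \"$1\", \"mem_backend\": {\"backend_type\": \"File\", \"backend_path\": \"$2\"}, \"enable_diff_snapshots\": true, \"resume_vm\": true}"
}
```

On vmA, run `pause_vm` and then `create_snapshot Full ./snap/snap1 ./snap/mem1`. On vmB (a fresh Firecracker process), run `load_snapshot ./snap/snap1 ./snap/mem1`, and later `pause_vm` and `create_snapshot Diff ./snap/snap2 ./snap/mem2`. After the rebase command above, vmC would call `load_snapshot ./snap/snap2 ./snap/mem1`. That last call assumes the merged memory should be paired with the state file written by the Diff snapshot, because the restored device state has to describe the same moment as the memory contents; the request in the Firecracker log loads `./snap/snap1` instead, which may be worth checking.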
Firecracker logs:
2023-12-25T17:24:28.369578016 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:163] The request was executed successfully. Status code: 204 No Content.
2023-12-25T17:24:28.387220678 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:70] The API server received a Put request on "/snapshot/load" with body "{\n \"snapshot_path\": \"./snap/snap1\",\n \"mem_file_path\": \"./snap/mem1\",\n \"enable_diff_snapshots\": true,\n \"resume_vm\": true\n }".
2023-12-25T17:24:28.387596935 [anonymous-instance:main:WARN:src/vmm/src/logger/mod.rs:33] [DevPreview] Virtual machine snapshots is in development preview.
2023-12-25T17:24:28.387873552 [anonymous-instance:main:INFO:src/vmm/src/persist.rs:314] Host CPU vendor ID: [71, 101, 110, 117, 105, 110, 101, 73, 110, 116, 101, 108]
2023-12-25T17:24:28.387891803 [anonymous-instance:main:INFO:src/vmm/src/persist.rs:315] Snapshot CPU vendor ID: [71, 101, 110, 117, 105, 110, 101, 73, 110, 116, 101, 108]
2023-12-25T17:24:28.413620267 [anonymous-instance:main:ERROR:src/vmm/src/devices/virtio/queue.rs:296] virtio queue number of available descriptors 4097 is greater than queue max size 256
2023-12-25T17:24:28.413716156 [anonymous-instance:main:INFO:src/vmm/src/lib.rs:818] Vmm is stopping.
2023-12-25T17:24:28.481140691 [anonymous-instance:fc_api:ERROR:src/api_server/src/parsed_request.rs:190] Received Error. Status code: 400 Bad Request. Message: Load snapshot error: Failed to restore from snapshot: Failed to build microVM from snapshot: Failed to restore MMIO device: Cannot restore devices: VirtioBlock(Persist(InvalidInput))
2023-12-25T17:24:28.481173674 [anonymous-instance:fc_api:WARN:src/api_server/src/lib.rs:139] PUT /snapshot/load: mem_file_path field is deprecated.
2023-12-25T17:24:28.481367990 [anonymous-instance:main:ERROR:src/firecracker/src/main.rs:94] RunWithApiError error: Failed to build MicroVM: Loading snapshot failed..
2023-12-25T17:24:28.481410903 [anonymous-instance:main:ERROR:src/firecracker/src/main.rs:97] Firecracker exiting with error. exit_code=1
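The two numbers in the `queue.rs` error can be related with a little arithmetic. The sketch below assumes the check follows the virtio split-queue convention: the number of available descriptors is the avail ring index that the guest wrote into memory minus the next-available index that the device saved, both 16-bit counters that wrap. The function name and the sample indices are hypothetical; only 4097 and 256 come from the log. A count larger than the queue size cannot occur in a consistent queue, so a value like 4097 suggests that the memory contents and the saved device state do not match, in line with the device-layout explanation given later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper, assuming virtio split-queue semantics:
#   available = (avail ring idx in guest memory - device next_avail) mod 2^16
queue_len() {
  avail_idx=$1   # idx field of the avail ring, read from restored guest memory
  next_avail=$2  # next descriptor index the device expects, from saved device state
  echo $(( (avail_idx - next_avail) & 0xFFFF ))
}

MAX_SIZE=256   # queue max size reported in the log

# Hypothetical indices chosen to reproduce the logged count of 4097
len=$(queue_len 4200 103)
if [ "$len" -gt "$MAX_SIZE" ]; then
  echo "invalid queue: $len available descriptors > max size $MAX_SIZE"
fi

# Wrap-around case: 5 - 65534 wraps to 7, which is still a valid count
queue_len 5 65534
```

The masking with `0xFFFF` is what makes the wrap-around case come out right: a guest that has just wrapped its index past 65535 is not an error, while a gap of thousands of entries is.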
Host kernel: Linux bttm 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux (uname -a)
Guest kernel: vmlinux-5.10.bin
Download script used:
# Variables
ARCH="$(uname -m)"
release_url="https://github.com/firecracker-microvm/firecracker/releases"
# Resolve the latest release tag by following the /latest redirect
latest=$(basename "$(curl -fsSLI -o /dev/null -w '%{url_effective}' "${release_url}/latest")")
curl -L "${release_url}/download/${latest}/firecracker-${latest}-${ARCH}.tgz" \
| tar -xz
mv "release-${latest}-${ARCH}/firecracker-${latest}-${ARCH}" firecracker
mv "release-${latest}-${ARCH}/snapshot-editor-${latest}-${ARCH}" snapshot-editor
rm -r "release-${latest}-${ARCH}"
wget "https://s3.amazonaws.com/spec.ccfc.min/img/quickstart_guide/${ARCH}/kernels/vmlinux-5.10.bin"
# wget saves the file as vmlinux-5.10.bin
mv vmlinux-5.10.bin kernel
chmod +x ./firecracker
chmod +x ./snapshot-editor
To give more context:
- The guest is running Debian 12
- The rootfs is built using Docker
- The guest is running Docker (so Docker inside the microVM)
- Docker in the guest runs a container (a Minecraft server, to be exact)
If you need more details, feel free to ask 😉.
I hope someone can help me fix the issue. I will probably also ask on the Slack server.
Originally posted by @BasToTheMax in #2888 (comment)
I'm currently on vacation and won't be able to do tests.
Hi @BasToTheMax! Thanks for reporting the issue.
From our initial analysis, it looks like the block device fails to restore because the device layout in memory is not correct.
Could you provide reproduction steps that demonstrate the issue, including the following if possible:
- (a link to) the rootfs that is used
- which API calls (or JSON config) are used to configure and boot the VM
- actions performed inside the VM before taking snapshots
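For the second item, a configuration file is often the easiest way to share how the VM is set up. The snippet below writes a minimal example; the file name, kernel and rootfs paths, boot arguments, vCPU count, and memory size are placeholders rather than the reporter's values, and a real report would list the actual drives and network interfaces.

```shell
#!/usr/bin/env bash
# Hypothetical minimal Firecracker config; every value here is a placeholder.
cat > vm_config.json <<'EOF'
{
  "boot-source": {
    "kernel_image_path": "./kernel",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "./rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {
    "vcpu_count": 2,
    "mem_size_mib": 2048
  }
}
EOF
```

The VM could then be booted with `./firecracker --api-sock /tmp/firecracker.socket --config-file vm_config.json`, where the socket path is again an assumption.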
Alternatively, we have a test that exercises differential snapshots. You could modify it to be closer to your setup and see whether it starts failing (see the testing README).
Additionally, is running Docker inside the VM an essential part of the reproduction steps? Does the same sequence also fail without Docker inside the VM?
Hi @BasToTheMax, were you able to solve your issue? If not, can you provide a series of commands as mentioned in @kalyazin's comment?