image builder: kata-containers-clearlinux-*-agent-*.img too short
pohly opened this issue · 12 comments
Description of problem
I tried to use the generated files (specifically: http://jenkins.katacontainers.io/job/image-nightly-x86_64/lastSuccessfulBuild/artifact/artifacts/kata-containers-clearlinux-31890-osbuilder-dbbf160-agent-6182fa3.img.tar.xz) as an nvdimm device and then mount its pmem0p1 inside QEMU.
Expected result
Should work.
Actual result
Mount fails:
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/pmem0p1, missing codepage or helper program, or other error.
...
[ 8.183347] EXT4-fs (pmem0p1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 8.184236] EXT4-fs (pmem0p1): bad geometry: block count 31744 exceeds size of device (31488 blocks)
Steps to reproduce:
- download and unpack http://jenkins.katacontainers.io/job/image-nightly-x86_64/lastSuccessfulBuild/artifact/artifacts/kata-containers-clearlinux-31890-osbuilder-dbbf160-agent-6182fa3.img.tar.xz
- check out pohly/pmem-CSI@baa2759
- run a cluster (only its artifacts are needed, not the cluster itself):
TEST_DISTRO=fedora make start
PATH=_work/bin/:$PATH EXISTING_VM_FILE=kata-containers-clearlinux-31890-osbuilder-dbbf160-agent-6182fa3.img VM_IMAGE=_work/resources/Fedora-Cloud-Base-30-1.2.x86_64.raw ./test/run-vm-test.sh
It also fails with Clear Linux inside the QEMU guest.
Making the agent file larger by 256 4K blocks fixes the problem:
truncate -s $(((31744 - 31488) * 4096 + $(stat -c %s kata-containers-clearlinux-31890-osbuilder-dbbf160-agent-6182fa3.img))) kata-containers-clearlinux-31890-osbuilder-dbbf160-agent-6182fa3.img
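The arithmetic behind that truncate call can be sketched on a scratch file. This is only an illustration: the block counts come from the kernel message above, the 4096-byte block size is the one assumed in the command, and the 128 MiB scratch file is a hypothetical stand-in for the real image. It relies on GNU coreutils (`truncate`, `stat -c`).

```shell
#!/bin/sh
# Block counts taken from the "bad geometry" kernel message.
fs_blocks=31744     # block count recorded in the ext4 superblock
dev_blocks=31488    # blocks the kernel actually sees on /dev/pmem0p1
block_size=4096     # assumed 4 KiB ext4 block size

# Number of bytes the device is short of what the filesystem expects.
pad_bytes=$(( (fs_blocks - dev_blocks) * block_size ))
echo "missing bytes: $pad_bytes"

# Hypothetical stand-in for the image: a sparse 128 MiB scratch file.
img=$(mktemp)
truncate -s $((128 * 1024 * 1024)) "$img"
old_size=$(stat -c %s "$img")

# Extend the file by the shortfall; truncate adds a hole, not real data.
truncate -s $((old_size + pad_bytes)) "$img"
new_size=$(stat -c %s "$img")
echo "size before: $old_size, after: $new_size"

rm -f "$img"
```

The shortfall is 256 blocks of 4 KiB, that is 1 MiB, which is what the one-liner above appends to the image.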
/cc @jcvenegas @chavafg
I previously discussed this with @devimc as part of kata-containers/runtime#2262; he wanted to look into it, but as I now have a script to reproduce it, I thought that filing a proper bug report might help.
On x86, memory pages must be aligned to 128MB, otherwise memory hotplug will not work (acpi PNP0C80:01: Enumeration failure), so increasing the image size by anything between 4K and 1M will break memory hotplug in Kata. The only solution I can see is to grow the image to the next valid size, which is 256MB, but this can increase the memory footprint per container (something that I think we don't want 😄).
@pohly not sure if the feature/fix that you're raising here is valid since kata is working with the current image and I think you want to use it for something different.
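A minimal sketch of the alignment constraint described above, assuming the 128MB figure from this comment is the only rule in play. The `round_up` helper is hypothetical and not part of the Kata scripts.

```shell
#!/bin/sh
# Alignment required for memory hotplug on x86, as stated in this thread.
align=$((128 * 1024 * 1024))

# Hypothetical helper: round a byte count up to the next multiple of $align.
round_up() {
    echo $(( ($1 + align - 1) / align * align ))
}

current=$((128 * 1024 * 1024))        # current image size: already aligned
padded=$((current + 1024 * 1024))     # the same image grown by 1 MiB

aligned_current=$(round_up "$current")
aligned_padded=$(round_up "$padded")

echo "128 MiB image stays at:   $aligned_current bytes"
echo "129 MiB image rounds to:  $aligned_padded bytes"
```

Growing the image by even 1 MiB pushes it past the 128MB boundary, so the next size that keeps hotplug working is 256MB.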
But why does it work for Kata Containers - is the file really too short, as the kernel says?
I'll reimplement this anyway (license, programming language, etc.) and can then simply add padding to make it work for my use cases, so it's up to you whether you want to investigate this further or close it.
@pohly it works because the whole image (including MBRs and DAX metadata) is considered an nvdimm device whose size is 128M. When you use this image as an NVDIMM device, the first MB (where the first MBR and DAX metadata are) is not counted, so the actual size of your nvdimm device is 127MB, which is misaligned (bad geometry).
I think you can easily work around this issue by adding this extra MB in your command line [1] or to VM_FILE_SIZE
[1] - pohly/pmem-CSI@baa2759#diff-f446327a9405611f42562b66e3af367cR34
@devimc so the file is okay, just the QEMU invocation is wrong?
> the first MB (where the first MBR and DAX metadata are)
You mean 2MB, right? dax_header_sz in the script.
I've implemented image creation in PMEM-CSI, including a test which boots with the generated file. When I add a hole of 2MB at the end (similar to the truncate command above, but exactly 2MB), then the partition is indeed usable.
> You mean 2MB, right? dax_header_sz in the script.
Sorry, yes, I meant 2MB.
> I've implemented image creation in PMEM-CSI, including a test which boots with the generated file. When I add a hole of 2MB at the end (similar to the truncate command above, but exactly 2MB), then the partition is indeed usable.
So truncating the image is no longer needed?
> So truncating the image is no longer needed?
That depends on how the image is used. I find it cleaner when the nominal size of the image file can be used directly as the size parameter of the nvdimm object. That's what I am doing in PMEM-CSI.
But if Kata Containers solves this by bumping up the parameter by those additional 2MB, then that also works.
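The two options can be compared with a short sketch. Assumptions: the 2MB figure is the dax_header_sz value mentioned earlier in this thread, the 126 MiB scratch file is a hypothetical stand-in for a generated image, and the printed options follow the `memory-backend-file` / `nvdimm` syntax from the QEMU NVDIMM documentation. Nothing here launches QEMU, and the ids `mem0` and `nvdimm0` are placeholders.

```shell
#!/bin/sh
header_bytes=$((2 * 1024 * 1024))   # 2MB header size discussed in this thread

# Hypothetical stand-in for a generated image (sparse scratch file).
img=$(mktemp)
truncate -s $((126 * 1024 * 1024)) "$img"
file_size=$(stat -c %s "$img")

# Option B: leave the file alone and bump the size parameter by 2MB.
object_size_b=$((file_size + header_bytes))

# Option A: append a 2MB hole so the file size itself is the parameter.
truncate -s $((file_size + header_bytes)) "$img"
object_size_a=$(stat -c %s "$img")

echo "option A size: $object_size_a"
echo "option B size: $object_size_b"

# Print (do not run) the QEMU options that would use the padded file.
printf '%s\n' "-object memory-backend-file,id=mem0,share=on,mem-path=$img,size=$object_size_a"
printf '%s\n' "-device nvdimm,id=nvdimm0,memdev=mem0"

rm -f "$img"
```

Both options end up passing the same size to QEMU. They differ only in whether the extra 2MB is recorded in the file itself or has to be known by whoever builds the command line.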