kata-containers/osbuilder

image builder: kata-containers-clearlinux-*-agent-*.img too short

pohly opened this issue · 12 comments

pohly commented

Description of problem

I tried to use the generated files (specifically: http://jenkins.katacontainers.io/job/image-nightly-x86_64/lastSuccessfulBuild/artifact/artifacts/kata-containers-clearlinux-31890-osbuilder-dbbf160-agent-6182fa3.img.tar.xz) as nvdimm device and then mount its pmem0p1 inside QEMU.

Expected result

Should work.

Actual result

Mount fails:

mount: /mnt: wrong fs type, bad option, bad superblock on /dev/pmem0p1, missing codepage or helper program, or other error.
...
[    8.183347] EXT4-fs (pmem0p1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[    8.184236] EXT4-fs (pmem0p1): bad geometry: block count 31744 exceeds size of device (31488 blocks)

Steps to reproduce:

pohly commented

It also fails with Clear Linux inside the QEMU guest.

pohly commented

Making the agent file larger by 256 4K blocks fixes the problem:

truncate -s $(((31744 - 31488) * 4096 + $(stat -c %s kata-containers-clearlinux-31890-osbuilder-dbbf160-agent-6182fa3.img))) kata-containers-clearlinux-31890-osbuilder-dbbf160-agent-6182fa3.img
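
The arithmetic behind this command can be sketched as follows. The two block counts come from the kernel message above, and the 4096-byte block size is the one the command assumes. The helper names `pad_bytes` and `pad_image` are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Bytes missing from the device, given the block count the filesystem
# expects and the block count the device actually provides.
pad_bytes() {
    expected_blocks=$1
    device_blocks=$2
    block_size=${3:-4096}   # assumed ext4 block size, as in the command above
    echo $(( (expected_blocks - device_blocks) * block_size ))
}

# Grow an image file by the missing amount; the added tail is a sparse hole.
pad_image() {
    image=$1
    missing=$(pad_bytes "$2" "$3")
    truncate -s $(( $(stat -c %s "$image") + missing )) "$image"
}

pad_bytes 31744 31488   # prints 1048576 (256 blocks of 4 KiB = 1 MiB)
```

Calling `pad_image <image> 31744 31488` would then have the same effect as the one-liner above (GNU `stat` and `truncate` assumed).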

pohly commented

I previously discussed this with @devimc as part of kata-containers/runtime#2262; he wanted to look into it, but as I now have a script to reproduce it, I thought that filing a proper bug report might help.

On x86, memory must be aligned to 128MB, otherwise memory hotplug will not work (acpi PNP0C80:01: Enumeration failure). Increasing the image size by anything from 4K to 1M will therefore break memory hotplug in Kata. The only solution I can see is to grow the image to the next valid size, 256MB, but that can increase the memory footprint per container (something I think we don't want 😄).
@pohly I'm not sure the feature/fix you're raising here is valid, since Kata works with the current image and I think you want to use it for something different.
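
To illustrate the alignment argument: a minimal sketch of rounding a size up to the next 128 MiB boundary, assuming the 128MB alignment requirement stated above. `align_up` is a hypothetical helper, not part of osbuilder.

```shell
#!/bin/sh
# Round a size in bytes up to the next multiple of an alignment.
align_up() {
    size=$1
    align=$2
    echo $(( (size + align - 1) / align * align ))
}

MiB=$((1024 * 1024))
align_up $((128 * MiB)) $((128 * MiB))          # prints 134217728: already aligned
align_up $((128 * MiB + 4096)) $((128 * MiB))   # prints 268435456: jumps to 256 MiB
```

Even a single extra 4K block pushes a 128 MiB image to the next aligned size, 256 MiB, which is the footprint concern raised above.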

pohly commented

But why does it work for Kata Containers - is the file really too short, as the kernel says?

I'll reimplement this anyway (license, programming language, etc.) and can then simply add padding to make it work for my use cases, so it's up to you whether you want to investigate this further or close it.

@pohly it works because the whole image (including the MBRs and DAX metadata) is treated as an nvdimm device whose size is 128M. When you use this image as an NVDIMM device, the first MB (where the first MBR and DAX metadata are) is not considered, so the actual size of your nvdimm device is 127MB, which is misaligned (bad geometry).
I think you can easily work around this issue by adding this extra MB in your command line [1] or to VM_FILE_SIZE.

[1] - pohly/pmem-CSI@baa2759#diff-f446327a9405611f42562b66e3af367cR34

pohly commented

@devimc so the file is okay, just the QEMU invocation is wrong?

the first MB (where the first MBR and DAX metadata are)

You mean 2MB, right? dax_header_sz in the script.

pohly commented

I've implemented image creation in PMEM-CSI, including a test which boots with the generated file. When I add a hole of 2MB at the end (similar to the truncate command above, but exactly 2MB), then the partition is indeed usable.
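
Appending such a hole could look roughly like this. It is a sketch, not the PMEM-CSI implementation: `add_tail_hole` is a hypothetical helper, and the 2 MiB default is taken from the header size mentioned in this thread (`dax_header_sz`).

```shell
#!/bin/sh
# Extend an image by a sparse hole at its end (default: 2 MiB).
# GNU truncate treats a leading "+" in the size as "grow by this many bytes".
add_tail_hole() {
    image=$1
    hole=${2:-$((2 * 1024 * 1024))}
    truncate -s "+${hole}" "$image"
}

# Usage (the file name is a placeholder):
#   add_tail_hole my-image.img
```

Because the tail is a hole, the file's nominal size grows while its on-disk usage stays the same on filesystems that support sparse files.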

@pohly

You mean 2MB, right? dax_header_sz in the script.

sorry, yes I meant 2MB

I've implemented image creation in PMEM-CSI, including a test which boots with the generated file. When I add a hole of 2MB at the end (similar to the truncate command above, but exactly 2MB), then the partition is indeed usable.

so, truncating the image is no longer needed?

pohly commented

so, truncating the image is no longer needed?

That depends on how the image is used. I find it cleaner when the nominal size of the image file is such that it can be used directly as the size parameter of the nvdimm object. That's what I am doing in PMEM-CSI.

But if Kata Containers solves this by bumping up the parameter by those additional 2MB, then that also works.
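
The two options can be compared in a small sketch that derives the QEMU `size=` parameter from the image file. `nvdimm_args` is a hypothetical helper, and the object and device ids are arbitrary. The other options QEMU needs for NVDIMM (such as `nvdimm=on` for the machine and `slots`/`maxmem` for `-m`) are omitted, and nothing here is taken from the Kata Containers or PMEM-CSI code.

```shell
#!/bin/sh
# Print QEMU arguments that expose an image file as an NVDIMM.
# extra_bytes is added to the file size: 0 when the file itself is already
# padded, 2 MiB (2097152) when the size parameter is bumped instead.
nvdimm_args() {
    image=$1
    extra_bytes=${2:-0}
    size=$(( $(stat -c %s "$image") + extra_bytes ))
    echo "-object memory-backend-file,id=mem0,share=on,mem-path=${image},size=${size}" \
         "-device nvdimm,id=nvdimm0,memdev=mem0"
}

# Usage (the file name is a placeholder):
#   nvdimm_args padded.img            # size = nominal file size
#   nvdimm_args unpadded.img 2097152  # size = file size + 2 MiB
```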

@pohly good to hear you found a solution. I'm going to close this issue; feel free to re-open it if you want.