kata-containers/runtime

split qemu support in stable and experimental

jcvenegas opened this issue · 10 comments

Which feature do you think can be improved?

Update: After some talk on this issue: No more runtime configs will be created for qemu-virtiofs experimental, instead some job that modify the configuration toml that to point to modify anything we need to test.

Today virtiofs config uses cache=auto and no dax, both are supported in stable kernel releases and qemu.

The config could change in the future looking for the DAX bennefits, but because more testing and stablization is needed there, specially kata keep testing and give feedback to the virtiofs project. It requiered to split the testing in two:

  • Move kata-qemu-virtiofs to qemu-vanilla(the one that we track in versions.yaml) and to our lts kernel.

Keep testing dev branch updates from virtiofs

  • Create a job that qemu-viritiofs modifies the default qemu-viritofs-config to
    • qemu-vanilla + virtiofs newer patches
    • Any kernel recommended by virtiofs
    • DAX
    • Unless wanted by users there is not runtimeclass or kata-runtime wrapper for this config.

Additional Information

Anything else to add?

Before raising this feature request

Have you looked at the limitations document?

@jodh-intel @jcvenegas ,

I always understood QEMU "experimental" as synonymous of qemu-virtiofs. Also I don't know how to call the non-experimental QEMU so usually I say QEMU "standard". Now with this change we have the experimental of QEMU experimental ("qemu-viritiofs experimental" and, using my nomenclature standard of QEMU experimental. Don't you think this is too confusing?

I know that giving name is hard, I am not good at that at all. But let me give a few suggestions:

  • experimental stable vs unstable
  • experimental stable vs master (often used in Git, but "master" nowadays is avoided)
  • experimental stable vs next (I like that nomenclature. In the end it seems what we try to achieve is the "next" stable version of QEMU experimental)

Disregard if I misunderstood this completely. :)

@wainersm

I always understood QEMU "experimental" as synonymous of qemu-virtiofs.

Yes, experimental all this time has been just for virtiofs usecase, as part of this CI refactor I was looking to have a more meaningful name.

  • It points to virtiofs out-of-tree dev repository from virtiofs dev team.
  • This help us to easily use the tests new patches that virtiofs team recommend.

Also I don't know how to call the non-experimental QEMU so usually I say QEMU "standard"

That is fine, I think we usually name it qemu-vanilla (even if it has one or two small patches ;) )

Now with this change we have the experimental of QEMU experimental ("qemu-viritiofs experimental" and, using my nomenclature standard of QEMU experimental. Don't you think this is too confusing?

Yeah, it is confusing, thanks for comment about it I was looking to make more clear that that build is just for virtiofs experimental features, (perf or functional)

experimental stable vs next (I like that nomenclature. In the end it seems what we try to achieve is the "next" stable version of QEMU experimental)

If next/experimetal is not confusing about that today it is for virtiofs new features (not even in master \cc @dagrh, correct ? ),I think is fine to keep the name generic then.

One more comment, I don't follow what experimental stable means culd you elaborate a bit more.

One more comment, I don't follow what experimental stable means culd you elaborate a bit more.

With experimental stable I meant your virtiofs stable.

IUUC those are the current variables used to control build and install of QEMU + Kernel on CI scripts:

experimental_qemu:

  • True (qemu-vanilla + virtiofs 'stable')
  • False (qemu-vanila + 9p)

experimental_kernel:

  • True (use with experimental_qemu=True)
  • False (likewise)

Now we want to express the differentiation of virtiofs experimental (e.g. DAX) vs stable (e.g. released virtiofs).

Then you are creating a new configuration-qemu-virtiofs-experimental.toml which should point to new built artifacts (virtiofsd-experimental, qemu-virtiofs-system-x86_64 , etc).

Are you going to use another variable on the CI scripts to recognize that scenario? (e.g. experimental_virtiofs=True) Ending in something like:

  • experimental_qemu=True && experimental_virtiofs=True && experimental_kernel=True - qemu-vanilla + virtiofs 'experimental' + kernel 'experimental'

  • experimental_qemu=True && experimental_virtiofs=False && experimental_kernel=True- qemu-vanilla + virtiofs 'stable' + kernel 'stable (?)'

  • experimental_qemu=False && experimental_kernel=False- qemu-vanilla + 9p + kernel 'lts (?)'

Another doubt that I have is: what if in the future virtiofs provides 2 experimental features and we want to test them separated?

Again, take my words as a mere brainstorming :)

@wainersm Yes, that is how I wanted that new experimental would be added.

So based in your 3 items, I wanted to remove the generic name experimental_qemu to experimental_qemu_virtiofs, this is because I don't want to lead to a confusion of what it means that build (only for virtiofs experimentation).

  • KATA_HYPERVISOR=qemu-virtiofs-experimental && experimental_virtiofs_qemu=True && experimental_virtiofs_kernel=True

    qemu: qemu-from-virtiofs-fork
    kernel: kernel from virtiofs-fork
    runtime config: kata-qemu-virtiofs-experimental.toml with virtiofs as shared-fs backend

  • KATA_HYPERVISOR=qemu-virtiofs && experimental_virtiofs_qemu=False && experimental_virtiofs_kernel=False

    qemu: qemu-vanilla
    kernel: kernel lts
    runtime config: kata-qemu.toml with virtiofs as shared-fs backend

  • KATA_HYPERVISOR=qemu && experimental_virtiofs_qemu=False && experimental_virtiofs_kernel=False

    qemu: qemu-vanilla
    kernel: kernel lts
    runtime config: kata-qemu.toml with 9pfs as shared-fs backend

Another doubt that I have is: what if in the future virtiofs provides 2 experimental features and we want to test them separated?

I think if we have more features to test we can keep testing together. Unless the features are mutually exclusive, but what we are looking its to try to experiment for features that would go default (when they are merged and released in an official qemu and we move to it).

If renaming and adding a new config is confusing and a overload of work, I think I can just move kata-qemu-virtiofs to use default qemu and kernel. And create set of CI Jobs that modify that toml. And just keep naming the builds qemu-experimental and kernel-experimental anyway I think there is not other experimental things we are looking today. @wainersm what do you think?
@amshinde @bergwolf @jodh-intel @fidencio @devimc feedback is appreciated here.

Hi @jcvenegas !

So based in your 3 items, I wanted to remove the generic name experimental_qemu to experimental_qemu_virtiofs, this is because I don't want to lead to a confusion of what it means that build (only for virtiofs experimentation).

Overall I liked your proposal, it makes the "experimental" concept correct IMHO.

One thing that I'm not sure is about replace experimental_qemu to experimental_qemu_virtiofs. I mean, experimental_qemu_virtiofs makes complete sense but I don't know how hard it will be to adapt the CI scripts. Also you will need to change the scripts to take KATA_HYPERVISOR into consideration, right?

* `KATA_HYPERVISOR=qemu-virtiofs-experimental` && `experimental_virtiofs_qemu=True` &&  `experimental_virtiofs_kernel=True`
  qemu: `qemu-from-virtiofs-fork`
  kernel: `kernel from virtiofs-fork`
  runtime config: `kata-qemu-virtiofs-experimental.toml` with virtiofs as shared-fs backend

* `KATA_HYPERVISOR=qemu-virtiofs` && `experimental_virtiofs_qemu=False` &&  `experimental_virtiofs_kernel=False`
  qemu: `qemu-vanilla`
  kernel: `kernel lts`
  runtime config: `kata-qemu.toml` with virtiofs as shared-fs backend

I never tested this case (qemu-vanilla + kernel lts + virtiofs). I don't think the CI scripts currently handle that case, do they? If not, is this case your ultimate goal with this improvement?

Side note: handling patches for virtiofs experimental and qemu-valina+virtiofs can be trick. See for example kata-containers/packaging#1177 , I've a patch cherry-picked from upstream QEMU which applies clean on virtiofs experimental but it fails on upstream QEMU 5.0.0. If testing qemu-vanilla+virtiofs isn't a valid/undesired scenario then I won't need to adapt the patch to upstream QEMU 5.0.0 too.

* `KATA_HYPERVISOR=qemu` && `experimental_virtiofs_qemu=False` &&  `experimental_virtiofs_kernel=False`
  qemu: `qemu-vanilla`
  kernel: `kernel lts`
  runtime config: `kata-qemu.toml` with 9pfs as shared-fs backend

Another doubt that I have is: what if in the future virtiofs provides 2 experimental features and we want to test them separated?

I think if we have more features to test we can keep testing together. Unless the features are mutually exclusive, but what we are looking its to try to experiment for features that would go default (when they are merged and released in an official qemu and we move to it).

Yeah, having to test two experiments in separated would be a corner case or just lack of luck :)

One thing that I'm not sure is about replace experimental_qemu to experimental_qemu_virtiofs. I mean, experimental_qemu_virtiofs makes complete sense but I don't know how hard it will be to adapt the CI scripts. Also you will need to change the scripts to take KATA_HYPERVISOR into consideration, right?

There is already support for that partially, but yeah need to extend this script.
https://github.com/kata-containers/tests/blob/master/.ci/install_kata.sh#L62

I never tested this case (qemu-vanilla + kernel lts + virtiofs). I don't think the CI scripts currently handle that case, do they? If not, is this case your ultimate goal with this improvement?

Yes the final goal is move kata-virtiofs to stable (not experimental binaries), included the kata-deploy binaries.

The main reason is that recently we removed DAX from kata as causes some issues. After disable DAX we agreed to move to a stable kernel and qemu for kata-qemu-virtiofs. But enable some CI jobs to keep doing some testing around DAX or other features.

Side note: handling patches for virtiofs experimental and qemu-valina+virtiofs can be trick. See for example kata-containers/packaging#1177 , I've a patch cherry-picked from upstream QEMU which applies clean on virtiofs experimental but it fails on upstream QEMU 5.0.0. If testing qemu-vanilla+virtiofs isn't a valid/undesired scenario then I won't need to adapt the patch to upstream QEMU 5.0.0 too.

I agree this can cause some overhead in maintenance, this kind of issue could be solved in cases that qemu maintainers backport fixes, correct?

About the Openshift CI and the fix you send.

I feel that you could start to use kata-virtiofs based in qemu-vanilla and lts kernel (unless you feel that your depend in something from virtiofs-dev sources), of course this kind fixes will need to be added by time to time but your CI will be more stable (specially if we enable more things or move code base faster).

what would you like to use?

  • 5.0 + lts kernel?,
  • qemu + kernel (to any commit that virtiofs-dev could have)
  • Or one job per type.

I hope it make sense to move kata-virtiofs to our default qemu and kernel.

Note: Based on this talk I feel that moving kata-virtiofs to stable components, and then create kata-virtiofs-exeperimental may be an overhead. So I think is better just:

  1. Move kata-qemu-virtiofs to our official qemu and kernel
  2. do a CI job that job that changes the kata-qemu-virtiofs config and install experimental qemu + exeperimental kernel

One thing that I'm not sure is about replace experimental_qemu to experimental_qemu_virtiofs. I mean, experimental_qemu_virtiofs makes complete sense but I don't know how hard it will be to adapt the CI scripts. Also you will need to change the scripts to take KATA_HYPERVISOR into consideration, right?

There is already support for that partially, but yeah need to extend this script.
https://github.com/kata-containers/tests/blob/master/.ci/install_kata.sh#L62

I never tested this case (qemu-vanilla + kernel lts + virtiofs). I don't think the CI scripts currently handle that case, do they? If not, is this case your ultimate goal with this improvement?

Yes the final goal is move kata-virtiofs to stable (not experimental binaries), included the kata-deploy binaries.

The main reason is that recently we removed DAX from kata as causes some issues. After disable DAX we agreed to move to a stable kernel and qemu for kata-qemu-virtiofs. But enable some CI jobs to keep doing some testing around DAX or other features.

Side note: handling patches for virtiofs experimental and qemu-valina+virtiofs can be trick. See for example kata-containers/packaging#1177 , I've a patch cherry-picked from upstream QEMU which applies clean on virtiofs experimental but it fails on upstream QEMU 5.0.0. If testing qemu-vanilla+virtiofs isn't a valid/undesired scenario then I won't need to adapt the patch to upstream QEMU 5.0.0 too.

I agree this can cause some overhead in maintenance, this kind of issue could be solved in cases that qemu maintainers backport fixes, correct?

That's a good idea. I will speak with David about that possibility. However it is likely to be backported to stable-5.0 and effectively released with 5.2.0 (I checked, the fix isn't in 5.1.0).

About the Openshift CI and the fix you send.

I feel that you could start to use kata-virtiofs based in qemu-vanilla and lts kernel (unless you feel that your depend in something from virtiofs-dev sources), of course this kind fixes will need to be added by time to time but your CI will be more stable (specially if we enable more things or move code base faster).

Yes, I agree here too. I began using virtiofs experimental for convenience (the scripts simply support that case) and ignorance (I didn't have a complete understanding of the various QEMU configurations the project deals with, thanks for sharing that!)

what would you like to use?

* 5.0  + lts kernel?,

Likely to use that option. However I will keep kata-containers/packaging#1177 on hold for a while now. It is almost certain I will start working on on-boarding Kata 2.x on OpenShift-CI, and I believe 2.x will bump to QEMU 5.1.0 soon. So I see how to address that failure.

* qemu + kernel (to any commit that virtiofs-dev could have)

* Or one job per type.

Unlikely to test per type.

I hope it make sense to move kata-virtiofs to our default qemu and kernel.

I think so.

Note: Based on this talk I feel that moving kata-virtiofs to stable components, and then create kata-virtiofs-exeperimental may be an overhead. So I think is better just:

1. Move kata-qemu-virtiofs to our official qemu and kernel

2. do a CI job that job that changes the kata-qemu-virtiofs config and install experimental qemu  + exeperimental kernel

Yes, after this discussion I also agree that this is the best approach.

@wainersm cool thanks for the feedback, I will continue on that plan , I'll ping you in the PRs to follow up this.

@wainersm cool thanks for the feedback, I will continue on that plan , I'll ping you in the PRs to follow up this.

@jcvenegas ywc! I'm happy to help on the reviews too.