kubernetes-sigs/node-feature-discovery-operator

OCP NFD not serving nfd.k8s-sigs.io/v1alpha1

mythi opened this issue · 26 comments

mythi commented

What happened:

I maintain a set of NodeFeatureRules:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: intel-dp-devices

but this fails to deploy on OpenShift:

The server doesn't have a resource type "kind: NodeFeatureRule, apiVersion: nfd.k8s-sigs.io/v1alpha1"

What you expected to happen:
I can use my NodeFeatureRule on both OpenShift and vanilla Kubernetes without patching or maintaining cluster-specific copies of the same content.

How to reproduce it (as minimally and precisely as possible):
See my NodeFeatureRule

Thanks for reporting this @mythi. This has somehow slipped under my radar as it's quite a new feature.

The API group should definitely match the NFD operand upstream. Also, importantly, we should not be using *.k8s.io (or kubernetes.io) API domain when/if we haven't gone through the K8s API review. If we had, we should also have the corresponding api-approved.kubernetes.io: https://github.com/kubernetes/enhancements/pull/<kep> annotation in place.
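For reference, the annotation mentioned above is set in the metadata of the CRD itself. A minimal fragment as a sketch only: the CRD name is hypothetical, the spec is omitted, and the `<kep>` placeholder is kept exactly as written in the comment.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # Hypothetical CRD name in a *.k8s.io group, for illustration only.
  name: widgets.example.k8s.io
  annotations:
    # Placeholder kept as in the comment above; it would point at the approving PR.
    api-approved.kubernetes.io: "https://github.com/kubernetes/enhancements/pull/<kep>"
```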

/assign @ArangoGutierrez

We are using k8s-sigs for upstream, and I have set openshift for downstream (see https://github.com/openshift/cluster-nfd-operator/blob/master/config/crd/bases/nfd.openshift.io_v1alpha1_nodefeaturerules.yaml#L9). But you have a good point: that difference could break interoperability for users.

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

mythi commented

/remove-lifecycle stale

/lifecycle stale

/lifecycle rotten

/remove-lifecycle rotten

I no longer maintain the OpenShift downstream version
/assign @yevgeny-shnaidman

/lifecycle stale

mythi commented

/remove-lifecycle stale

@mythi which NFD version are you deploying on the OCP cluster: the upstream version or the OCP version?

The idea of this issue is that users can easily move from upstream to OCP and back without having to edit their CRs because of the difference in the API group name.

see https://github.com/openshift/node-feature-discovery/blob/master/deployment/base/nfd-crds/nfd-api-crds.yaml#L8

  creationTimestamp: null
  name: nodefeatures.nfd.openshift.io
spec:
  group: nfd.openshift.io

It says nfd.openshift.io instead of nfd.k8s-sigs.io, so users are forced to keep two sets of CRs, which adds maintenance complexity.
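To make the duplication concrete, here is a sketch of the two copies a rule author would have to maintain. The rule body is a hypothetical example, and the downstream `v1alpha1` version string is assumed from the downstream CRD file name linked earlier in this thread. Only the `apiVersion` line differs.

```yaml
# Upstream copy (API group from the original report).
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: intel-dp-devices
spec:
  rules:
    # Hypothetical rule, for illustration only.
    - name: "example rule"
      labels:
        "example-feature": "true"
      matchFeatures:
        - feature: kernel.loadedmodule
          matchExpressions:
            dummy: {op: Exists}
---
# Downstream copy: same content, different API group.
# Assumption: the version is v1alpha1, as the downstream CRD file name suggests.
apiVersion: nfd.openshift.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: intel-dp-devices
spec:
  rules:
    - name: "example rule"
      labels:
        "example-feature": "true"
      matchFeatures:
        - feature: kernel.loadedmodule
          matchExpressions:
            dummy: {op: Exists}
```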

I understand that, and I am not against the idea, but it also means that current OCP NFD customers will have to change their code or deployments.

Maybe OCP NFD can provide a migration path spanning three releases: add a flag or config option to enable the upstream API, and document it well enough that users know they have three releases (which in OCP is about a year) to migrate their CRs to the upstream API.

@ArangoGutierrez @mythi what about allowing upstream NFD to be deployed on OCP? I just need to check that it works.

What do you mean by "allow"? The only difference between upstream and OCP is the SCC and RBAC bits needed by OCP; everything else works. IMO that is not what @mythi is trying to convey here.

What I mean is that instead of installing OCP NFD, @mythi can install upstream NFD on OCP and continue using his NodeFeatureRule YAML without any change. I just need to make sure that installing upstream NFD on OCP works. That way he can have an immediate solution to his issue.

mythi commented

What I mean is that instead of installing OCP NFD, @mythi can install upstream NFD on OCP

One thing to clarify is that it's not what I can do or cannot do. If I provide my device plugin users some NodeFeatureRules, they should just run.

OK @mythi, I understand your use case now. We will check how to propagate it.

/lifecycle stale

/remove-lifecycle stale