metal3-io/baremetal-operator

BMH CRs have no status

fracappa opened this issue · 7 comments

Steps I followed

Hello everyone.

I'm using the baremetal-operator to manage my on-premises cluster of Dell servers equipped with iDRAC 9.

I deploy the Metal3 provider as follows:

clusterctl init --core cluster-api:v1.7.3 \
    --bootstrap kubeadm:v1.7.3 \
    --control-plane kubeadm:v1.7.3 -v5

and then:

clusterctl init --infrastructure metal3

I install the baremetal-operator as follows:

git clone https://github.com/metal3-io/baremetal-operator.git
kubectl create namespace baremetal-operator-system
cd baremetal-operator
kustomize build config/default | kubectl apply -f -

as specified in the metal3-provider GitHub repository.

Then I also deploy the Ironic pods, after customizing the env var files, with:

./baremetal-operator/tools/deploy.sh -i

After the setup, I try to create my BMH resources with the following manifest:

apiVersion: v1
kind: Secret
metadata:
  name: bmc-credentials-spring
type: Opaque
data:
  username: <username>
  password: <password>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: spring-bmh
spec:
  online: true
  bootMACAddress: <Boot-MAC-address>
  bootMode: UEFI
  bmc:
    address: idrac-virtualmedia://<iDRAC-IP>:443/redfish/v1/Systems/System.Embedded.1
    credentialsName: bmc-credentials-spring
    disableCertificateVerification: true
  image:
    checksum: http://<image-server>/images/SHA256SUMS
    checksumType: sha256
    format: qcow2
    url: http://<image-server>/images/noble-server-cloudimg-amd64.img
  userData:
    name: cloud-init-spring
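
One thing worth double-checking in the manifest above: Kubernetes expects values under a Secret's `data` field to be base64-encoded (plain text goes under `stringData` instead). The placeholders don't show which form was used, so this is only a minimal Python sketch of the encoding. `encode_secret_value` is a hypothetical helper and the sample value is made up.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Return the base64 form expected under a Secret's `data` field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # "admin" is a made-up example value, not taken from the issue.
    print(encode_secret_value("admin"))
```

The encoded strings would replace `<username>` and `<password>` under `data`. Whether this has any bearing on the missing status is not established by the thread.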

What happened

I expected the BMH resource to enter the registering state and then move through inspecting and provisioning to provisioned.

However, the resource shows no STATUS or STATE, and looks like this:

NAME         STATUS   STATE   CONSUMER   BMC                                                                        ONLINE   ERROR   AGE
spring-bmh                               idrac-virtualmedia://<iDRAC-IP>:443/redfish/v1/Systems/System.Embedded.1   true             19h

More details

Moreover, when I inspect the logs of the Ironic pods, I see the following:

2024-07-30 07:36:55.039 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking if async firmware update failed. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:36:55.041 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking async firmware update tasks. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:00.855 1 INFO eventlet.wsgi.server [None req-29d822e7-e71d-475d-a9de-0b7d233838a0 - - - - - -] ::ffff:10.0.0.110 "GET /v1/ HTTP/1.1" status: 200  len: 2909 time: 0.0024757
2024-07-30 07:37:00.856 1 INFO ironic_lib.auth_basic [None req-29d822e7-e71d-475d-a9de-0b7d233838a0 - - - - - -] No authorization token received
2024-07-30 07:37:00.857 1 INFO eventlet.wsgi.server [None req-29d822e7-e71d-475d-a9de-0b7d233838a0 - - - - - -] ::ffff:10.0.0.110 "GET /v1/drivers HTTP/1.1" status: 401  len: 222 time: 0.0007792
2024-07-30 07:37:11.237 1 INFO eventlet.wsgi.server [None req-6276dbaf-36a9-4d6c-9aa7-6866e1f9cd4c - - - - - -] ::ffff:<IP> "GET / HTTP/1.1" status: 200  len: 645 time: 0.0010841
2024-07-30 07:37:11.338 1 INFO eventlet.wsgi.server [None req-49321d2f-5295-4293-abbc-94d1d7f3e2e3 - - - - - -] ::ffff:<IP> "GET / HTTP/1.1" status: 200  len: 645 time: 0.0013549
2024-07-30 07:37:18.286 1 INFO eventlet.wsgi.server [None req-289eb088-7158-4951-991c-4cc5ffc422bc - - - - - -] ::ffff:10.0.0.110 "GET /v1/ HTTP/1.1" status: 200  len: 2909 time: 0.0020258
2024-07-30 07:37:18.287 1 INFO ironic_lib.auth_basic [None req-289eb088-7158-4951-991c-4cc5ffc422bc - - - - - -] No authorization token received
2024-07-30 07:37:18.288 1 INFO eventlet.wsgi.server [None req-289eb088-7158-4951-991c-4cc5ffc422bc - - - - - -] ::ffff:10.0.0.110 "GET /v1/drivers HTTP/1.1" status: 401  len: 222 time: 0.0006793
2024-07-30 07:37:24.954 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic.drivers.modules.pxe_base.PXEBaseMixin._check_boot_timeouts' _process_scheduled /usr/lib/python3.9/site-packages/futurist/periodics.py:638
2024-07-30 07:37:24.957 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic.drivers.modules.pxe_base.PXEBaseMixin._check_boot_timeouts' _process_scheduled /usr/lib/python3.9/site-packages/futurist/periodics.py:638
2024-07-30 07:37:24.958 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking PXE boot status. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:24.960 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic.drivers.modules.pxe_base.PXEBaseMixin._check_boot_timeouts' _process_scheduled /usr/lib/python3.9/site-packages/futurist/periodics.py:638
2024-07-30 07:37:24.961 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking PXE boot status. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:24.964 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking PXE boot status. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:24.966 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic.drivers.modules.pxe_base.PXEBaseMixin._check_boot_timeouts' _process_scheduled /usr/lib/python3.9/site-packages/futurist/periodics.py:638
2024-07-30 07:37:24.969 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking PXE boot status. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:30.861 1 INFO eventlet.wsgi.server [None req-6f9147eb-8f68-4c1c-9bee-87739b139f05 - - - - - -] ::ffff:10.0.0.110 "GET /v1/ HTTP/1.1" status: 200  len: 2909 time: 0.0021667
2024-07-30 07:37:30.861 1 INFO ironic_lib.auth_basic [None req-6f9147eb-8f68-4c1c-9bee-87739b139f05 - - - - - -] No authorization token received
2024-07-30 07:37:30.862 1 INFO eventlet.wsgi.server [None req-6f9147eb-8f68-4c1c-9bee-87739b139f05 - - - - - -] ::ffff:10.0.0.110 "GET /v1/drivers HTTP/1.1" status: 401  len: 222 time: 0.0007560
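
The repeating pattern in the excerpt above is a `GET /v1/drivers` from 10.0.0.110 answered with 401 right after "No authorization token received". A small Python sketch for pulling such requests out of a longer log follows. The regular expression is written against the line format shown in this excerpt only, and `unauthorized_requests` is a hypothetical helper name.

```python
import re

# Pattern for the eventlet.wsgi.server access lines shown in the log excerpt.
ACCESS_LINE = re.compile(
    r'\[None (?P<req>req-[0-9a-f-]+)[^\]]*\] (?P<client>\S+) '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" status: (?P<status>\d+)'
)


def unauthorized_requests(lines):
    """Return (client, method, path) for every access line with status 401."""
    hits = []
    for line in lines:
        match = ACCESS_LINE.search(line)
        if match and match.group("status") == "401":
            hits.append(
                (match.group("client"), match.group("method"), match.group("path"))
            )
    return hits


if __name__ == "__main__":
    # Two lines copied from the excerpt above.
    sample = [
        '2024-07-30 07:37:00.855 1 INFO eventlet.wsgi.server [None req-29d822e7-e71d-475d-a9de-0b7d233838a0 - - - - - -] ::ffff:10.0.0.110 "GET /v1/ HTTP/1.1" status: 200  len: 2909 time: 0.0024757',
        '2024-07-30 07:37:00.857 1 INFO eventlet.wsgi.server [None req-29d822e7-e71d-475d-a9de-0b7d233838a0 - - - - - -] ::ffff:10.0.0.110 "GET /v1/drivers HTTP/1.1" status: 401  len: 222 time: 0.0007792',
    ]
    for hit in unauthorized_requests(sample):
        print(hit)
```

Seeing which client address is being rejected, and on which path, helps narrow down which component is sending requests without valid credentials.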

Environment:

  • Baremetal Operator version: v0.6.1
  • Environment (metal3-dev-env or other): baremetal environment on Dell machines

/kind bug


@fracappa
Based on the error message and the fact that you can't see the state of the BMH, I would say this is a basic authentication issue. The deploy script makes sure that BMO and Ironic use the same credentials for the Ironic API, but you are deploying BMO without the script and Ironic with it.

I would recommend deploying BMO and Ironic together like this:
./baremetal-operator/tools/deploy.sh -b -i

AFAIK this is not an iDRAC-specific issue, simply a misconfigured basic-auth setup.
I will remove the bug label for now.
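
To make the credential mismatch concrete: with HTTP Basic authentication, the client sends an `Authorization: Basic <base64(user:password)>` header, and the server rejects requests where that header is missing or does not match its own credentials. The Python sketch below only illustrates that header and is not how BMO itself is implemented. The host name, port, and credentials are placeholders, and both function names are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_drivers_request(base_url: str, username: str, password: str):
    """Build (but do not send) a GET for /v1/drivers with Basic credentials."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/drivers",
        headers={"Authorization": basic_auth_header(username, password)},
    )


if __name__ == "__main__":
    # Placeholder endpoint and credentials, assumed for illustration only.
    request = build_drivers_request("https://ironic.example:6385", "admin", "secret")
    print(request.full_url)
    print(request.get_header("Authorization"))
```

If the credentials one side sends are not the ones the other side was configured with, every such request ends in a 401.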

/triage needs-information
