konveyor/operator

Change "rwx_supported" default value to false and make "true" an explicit change

Closed this issue · 7 comments

Let's consider changing the default of rwx_supported to false so installs will work in any environment, and make the performance optimization of using 'rwx' opt-in for the informed individual.

Background for other users:

I deployed the latest operator (as of 3/24/23) on an AWS OCP 4.12 cluster and saw that the hub was stuck in Pending because the PVC 'tackle-cache-volume-claim' was Pending, waiting for a RWX PV.

The fix for this is to edit the Tackle CR and add:
rwx_supported: "false"
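
For reference, a minimal sketch of what that edit could look like (the CR name, namespace, and apiVersion are taken from the ownerReferences in the PVC YAML further down):

apiVersion: tackle.konveyor.io/v1alpha1
kind: Tackle
metadata:
  name: tackle
  namespace: konveyor-tackle
spec:
  rwx_supported: "false"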

I think we should consider changing the behavior so this is not required, and default to rwx_supported: "false".
My reasoning: requiring knowledge of RWX support AND modifying the CR adds friction to trying out Konveyor. It forces the person installing to stop, understand what is happening, and then look up which setting is available in the CR to change.

More background info to help others who run into this

Hub is in Pending

$ oc get pods
NAME                                            READY   STATUS    RESTARTS   AGE
tackle-hub-5fd4977b47-tzzm7                     0/1     Pending   0          17h
tackle-operator-855b9c49dd-tqkl5                1/1     Running   0          17h
tackle-pathfinder-68bfdbcfbf-2c87k              1/1     Running   0          17h
tackle-pathfinder-postgresql-675cbfd477-p7526   1/1     Running   0          17h
tackle-ui-6679969564-stbt7                      1/1     Running   0          17h

Hub is in Pending due to a PVC not being ready.
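
The scheduling error shows up in the pod's status conditions; something like the following (pod name from the listing above) should surface it:

$ oc get pod tackle-hub-5fd4977b47-tzzm7 -o yaml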

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-03-24T18:24:44Z"
    message: 'running PreBind plugin "VolumeBinding": binding volumes: provisioning
      failed for PVC "tackle-cache-volume-claim"'
    reason: SchedulerError
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable
$ oc get pvc
NAME                                        STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
tackle-cache-volume-claim                   Pending                                                                        gp2            17h
tackle-hub-bucket-volume-claim              Bound     pvc-ec12a9db-5ba2-4f74-ab64-666d8ecb6b8b   100Gi      RWO            gp2            17h
tackle-hub-database-volume-claim            Bound     pvc-51e43a2f-5299-4b3c-a32b-2540be175ec2   5Gi        RWO            gp2            17h
tackle-pathfinder-postgresql-volume-claim   Bound     pvc-3be99dae-2b91-46ba-b16f-afc405f4e058   1Gi        RWO            gp2            17h

The PVC is requesting a 'RWX' volume, which this storage class isn't able to provide (gp2 is backed by EBS, which only supports ReadWriteOnce).

$ oc get pvc tackle-cache-volume-claim -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/selected-node: ip-10-0-141-130.us-west-2.compute.internal
    volume.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
  creationTimestamp: "2023-03-24T18:24:52Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/name: cache
    app.kubernetes.io/part-of: tackle
    volume: tackle-cache-data
  name: tackle-cache-volume-claim
  namespace: konveyor-tackle
  ownerReferences:
  - apiVersion: tackle.konveyor.io/v1alpha1
    kind: Tackle
    name: tackle
    uid: c6dc9d57-0b12-4ee6-82b3-2641b08d9f2e
  resourceVersion: "524661"
  uid: e4f8b5d8-0222-4112-b921-1c4ed06bc863
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp2
  volumeMode: Filesystem
status:
  phase: Pending
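
As an aside, for anyone who would rather patch than edit interactively, a sketch using the CR name and namespace from the ownerReferences above:

$ oc patch tackle tackle -n konveyor-tackle --type=merge -p '{"spec":{"rwx_supported":"false"}}'

Note that PVC access modes are immutable, so the already-Pending PVC may also need to be deleted for the operator to recreate it with RWO.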

Thanks for this. I faced the same issue. I had to create an EFS filesystem, which supports RWX, and then create a storage class; after that, the PV and PVC were bound.
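
For anyone taking the same route, a rough sketch of an EFS-backed StorageClass using the AWS EFS CSI driver (the fileSystemId is a placeholder for your own filesystem's ID):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0  # placeholder: your EFS filesystem ID
  directoryPerms: "700"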

[screenshot]

Hi John, were you able to set up and access the application successfully? I can get to the login page, but after providing credentials, I get the page below. Any help/input on this would be appreciated.

I have set up the nginx controller and have installed tackle on the EKS cluster.

[screenshots]

@Murali-Cloudbridge is this a local minikube cluster, or is this a cluster you deployed to AWS?

Hi John,

I have deployed tackle in the EKS cluster in an AWS environment.

Yes, I was not sure how to access it; based on the documentation, I tried to use the proxy.

kubectl port-forward svc/ 9090 -n my-tackle-operator

kubectl port-forward svc/tackle-hub 8080 -n my-tackle-operator

kubectl port-forward svc/tackle-ui 8080 -n my-tackle-operator
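
(The single-port form above forwards the same local port as the service port; the explicit local:remote form would be, for example:)

kubectl port-forward svc/tackle-ui 8080:8080 -n my-tackle-operator

With the tackle-ui forward active, the UI should then be reachable at http://localhost:8080.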

@Murali-Cloudbridge thanks

There is a path that I think will unblock you right now, if you are interested: disabling the use of Keycloak.

Set this option in the Tackle CR:
feature_auth_required: false
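
A minimal sketch of the full edit (name and namespace assumed to match your install; the value is quoted as a string, like the rwx_supported example earlier in the thread):

apiVersion: tackle.konveyor.io/v1alpha1
kind: Tackle
metadata:
  name: tackle
  namespace: my-tackle-operator
spec:
  feature_auth_required: "false"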

You can see an example here for a local install on macOS with minikube.

Then you can access the tackle2-ui as you were, with it proxied.

As for addressing this on EKS with auth enabled, I think we need to get Ingress set up so the UI can be accessed via an Ingress resource. @jmontleon @fbladilo @ibolton336 can you help @Murali-Cloudbridge?

@Murali-Cloudbridge in addition, I opened another issue #170 to help us expand the install docs for EKS. I don't think we have explicit docs at the moment for using Ingress that aren't focused on minikube.
(cc @savitharaghunathan )
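
For reference, a rough sketch of what such an Ingress resource might look like with the nginx controller (hostname is a placeholder; service name and port inferred from the port-forward commands above):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tackle-ui
  namespace: my-tackle-operator
spec:
  ingressClassName: nginx
  rules:
  - host: tackle.example.com  # placeholder hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: tackle-ui
            port:
              number: 8080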

Thanks a lot, John. I will try to modify the CR and check.

It would be great if I could access it via an Ingress resource.

One more thing to report: once tackle was installed, the keycloak pod was Pending because its PV was not bound. I had to install an EKS add-on for the EBS-related drivers (the EBS CSI driver). I also hit some permission issues, as shown below; the pod was in CrashLoopBackOff.

mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied

Then I updated the deployment YAML file and added the section below to the security context:

securityContext:
  fsGroup: 2000
  runAsNonRoot: true
  runAsUser: 1000
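
For anyone hitting the same thing: fsGroup is a pod-level setting, so this block belongs under spec.template.spec of the Deployment, roughly:

spec:
  template:
    spec:
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000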

That solved the issue for the keycloak and postgres pods.

@Murali-Cloudbridge let's use the issue you opened, #167, to work through the Ingress problem for logging into the UI with auth enabled.