Issues with robot-shop when migrating from OCP 3.11 to IBM ROKS 4.5 (single AZ)
vidya191 opened this issue · 20 comments
Hello Team,
I'm running into issues with robot-shop. The application works fine when installed directly on either the source or the target cluster, but when I try to migrate it from source to target using MTC it throws errors. Please check the logs below and help me with a fix.
oc get pods
NAME                         READY   STATUS                  RESTARTS   AGE
cart-56d64d845-hwr6q         1/1     Running                 0          69m
catalogue-7b84494bf6-z55kj   1/1     Running                 0          69m
dispatch-86bd8f4886-td5lb    1/1     Running                 0          69m
mongodb-ffd467b6d-7f72s      0/1     CrashLoopBackOff        18         69m
mysql-64f679d7b6-dkpf5       0/2     Init:CrashLoopBackOff   18         69m
payment-74fffc4db4-hhc55     1/1     Running                 0          69m
rabbitmq-5bb66bb6c9-777dr    1/1     Running                 0          69m
ratings-65964c5f8f-mbmd5     1/1     Running                 0          69m
redis-5694467c97-9zvqz       1/1     Running                 0          69m
shipping-b798968df-c68rs     1/1     Running                 6          69m
user-84496cc588-c6kss        1/1     Running                 0          69m
web-79cb976f9-phj8d          1/1     Running                 0          69m
oc logs -f mongodb-ffd467b6d-7f72s
about to fork child process, waiting until server is ready for connections.
forked process: 18
2021-02-11T11:16:22.658+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] MongoDB starting : pid=18 port=27017 dbpath=/data/db 64-bit host=mongodb-ffd467b6d-7f72s
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] db version v3.6.1
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] git version: 025d4f4fe61efd1fb6f0005be20cb45a004093d1
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1t 3 May 2016
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] allocator: tcmalloc
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] modules: none
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] build environment:
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] distmod: debian81
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] distarch: x86_64
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] target_arch: x86_64
2021-02-11T11:16:22.749+0000 I CONTROL [initandlisten] options: { net: { bindIp: "127.0.0.1", port: 27017, ssl: { mode: "disabled" } }, processManagement: { fork: true, pidFilePath: "/tmp/docker-entrypoint-temp-mongod.pid" }, systemLog: { destination: "file", logAppend: true, path: "/proc/1/fd/1" } }
2021-02-11T11:16:22.751+0000 I STORAGE [initandlisten] exception in initAndListen: IllegalOperation: Attempted to create a lock file on a read-only directory: /data/db, terminating
2021-02-11T11:16:22.751+0000 I CONTROL [initandlisten] now exiting
2021-02-11T11:16:22.751+0000 I CONTROL [initandlisten] shutting down with code:100
ERROR: child process failed, exited with error number 100
To see additional information in this output, start without the "--fork" option.
[vidya@oc7382228470 Downloads]$
[vidya@oc7382228470 Downloads]$ oc logs -f mysql-64f679d7b6-dkpf5
error: a container name must be specified for pod mysql-64f679d7b6-dkpf5, choose one of: [dataloader mysql] or one of the init containers: [post-hook]
[vidya@oc7382228470 Downloads]$ oc logs -f mysql-64f679d7b6-dkpf5 -c dataloader
Error from server (BadRequest): container "dataloader" in pod "mysql-64f679d7b6-dkpf5" is waiting to start: PodInitializing
[vidya@oc7382228470 Downloads]$ oc logs -f mysql-64f679d7b6-dkpf5 -c mysql
Error from server (BadRequest): container "mysql" in pod "mysql-64f679d7b6-dkpf5" is waiting to start: PodInitializing
[vidya@oc7382228470 Downloads]$ oc logs -f mysql-64f679d7b6-dkpf5 -c post-hook
cp: cannot create regular file '/var/lib/mysql/10-dump.sql.gz': Permission denied
cp: cannot create regular file '/var/lib/mysql/20-ratings.sql': Permission denied
[vidya@oc7382228470 Downloads]$ oc get sc
NAME                       PROVISIONER         RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
default (default)          ibm.io/ibmc-file    Delete          Immediate           false                  10d
ibmc-block-bronze          ibm.io/ibmc-block   Delete          Immediate           true                   10d
ibmc-block-custom          ibm.io/ibmc-block   Delete          Immediate           true                   10d
ibmc-block-gold            ibm.io/ibmc-block   Delete          Immediate           true                   10d
ibmc-block-retain-bronze   ibm.io/ibmc-block   Retain          Immediate           true                   10d
ibmc-block-retain-custom   ibm.io/ibmc-block   Retain          Immediate           true                   10d
ibmc-block-retain-gold     ibm.io/ibmc-block   Retain          Immediate           true                   10d
ibmc-block-retain-silver   ibm.io/ibmc-block   Retain          Immediate           true                   10d
ibmc-block-silver          ibm.io/ibmc-block   Delete          Immediate           true                   10d
ibmc-file-bronze           ibm.io/ibmc-file    Delete          Immediate           false                  10d
ibmc-file-bronze-gid       ibm.io/ibmc-file    Delete          Immediate           false                  10d
ibmc-file-custom           ibm.io/ibmc-file    Delete          Immediate           false                  10d
ibmc-file-gold             ibm.io/ibmc-file    Delete          Immediate           false                  10d
ibmc-file-gold-gid         ibm.io/ibmc-file    Delete          Immediate           false                  10d
ibmc-file-retain-bronze    ibm.io/ibmc-file    Retain          Immediate           false                  10d
ibmc-file-retain-custom    ibm.io/ibmc-file    Retain          Immediate           false                  10d
ibmc-file-retain-gold      ibm.io/ibmc-file    Retain          Immediate           false                  10d
ibmc-file-retain-silver    ibm.io/ibmc-file    Retain          Immediate           false                  10d
ibmc-file-silver           ibm.io/ibmc-file    Delete          Immediate           false                  10d
ibmc-file-silver-gid       ibm.io/ibmc-file    Delete          Immediate           false                  10d
[vidya@oc7382228470 Downloads]$ oc get pvc
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongodb-volume-claim      Bound    pvc-d2a7ab0d-2607-44c6-962f-dad79be2b1ae   20Gi       RWO            default        86m
mysql-data-volume-claim   Bound    pvc-17c26d35-89cf-43e0-8983-18f25591c8aa   20Gi       RWO            default        86m
redis-volume-claim        Bound    pvc-01b93ec0-7b62-40d3-944a-6397588a1aca   20Gi       RWO            default        86m
Team,
Can someone look into the issue and give me an update?
When we looked at ROKS we found it had some interesting storage constraints. We had this demo working in a multi-AZ ROKS environment at one point: konveyor/mig-demo-apps@0610156
I wonder whether the uid/gid is the same on the source and destination after migration. If not, the supplemental groups options might help, but I am not certain.
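Something along these lines is what I have in mind; a rough sketch only, and the group ID is a placeholder rather than a value I've verified:

# Compare what the pods run as vs. what the namespace allows.
oc get pod -n robot-shop -o yaml | grep runAsUser
oc get namespace robot-shop -o yaml | grep 'sa\.scc'

# If they differ, a supplementalGroups entry matching the owner of the data
# on the migrated volume might help (1000100001 is a placeholder value).
oc patch deployment mysql -n robot-shop --type merge -p '
spec:
  template:
    spec:
      securityContext:
        supplementalGroups: [1000100001]
'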
Thank you for your response, Jason.
I've installed the latest version of the robot-shop application available in the repo. I'm also getting similar issues with the sock-shop application; I've attached the sock-shop logs for your reference.
I can also see that the UID for the mysql pod differs between source and destination. Could you explain more about the supplemental groups options so that I can try them out?
Source:

name: dataloader
resources: {}
securityContext:
  capabilities:
    drop:
    - KILL
    - MKNOD
    - SETGID
    - SETUID
  runAsUser: 1000100001

Target:

name: dataloader
resources: {}
securityContext:
  capabilities:
    drop:
    - KILL
    - MKNOD
    - SETGID
    - SETUID
  runAsUser: 1000140000
The different runAsUser strikes me as odd. We normally create the namespace and preserve the UID range and other annotations you see on the namespace, for example:
$ oc get namespace openshift-migration -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/sa.scc.mcs: s0:c23,c22
    openshift.io/sa.scc.supplemental-groups: 1000550000/10000
    openshift.io/sa.scc.uid-range: 1000550000/10000
You mentioned you were able to run the sample app on either side. Are you creating or leaving the namespace in place on the destination before migration? If so, can you delete it prior to migration and see if it migrates normally?
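Concretely, on the destination cluster, something like:

oc delete project robot-shop
# Namespace deletion is asynchronous; once termination finishes this
# should return NotFound.
oc get namespace robot-shop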
I'm not sure it should matter, but as another data point in case we can spin up an environment to test: is the source cluster also a ROKS cluster?
@dymurray or @djwhatle do you have any additional thoughts or questions about the environment? Is the supplemental groups stuff something that might help here or am I off the mark?
Could you share the output of oc get -o yaml namespace robot-shop from both clusters?
We used to remove the application on the target and then perform the migration.
Our source cluster is built on VMware VMs.
I've also attached the output of oc get -o yaml namespace robot-shop from both clusters.
Target-ROKS-4.5.txt
Source-OCP-3.11.txt
Hello Team
Awaiting your comments.
Can you update the NS on the target cluster from:
openshift.io/sa.scc.supplemental-groups: 1000140000/10000
openshift.io/sa.scc.uid-range: 1000140000/10000
to:
openshift.io/sa.scc.supplemental-groups: 1000350000/10000
openshift.io/sa.scc.uid-range: 1000350000/10000
and restart the pods to see if they come up properly? (You may need to adjust the hook container command; this PR was merged in the last ~ day.)
Data is likely migrated keeping the uid/gid from the source cluster. If the pods are now trying to run with a different uid/gid on the target, the permission-denied errors aren't surprising.
We expect for a normal migration that the namespace doesn't exist on the target prior to migration. If it's being created during migration with a differing supplemental group and uid range, I'd have to defer to any thoughts @dymurray or @djwhatle might have.
I'm not clear whether "We used to remove the application on the target and then perform the migration" means you were deleting the resources in the namespace or deleting the namespace itself. You should ensure the namespace does not exist on the target before migration: oc delete project robot-shop
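In case it helps, a minimal sketch of that update and restart, assuming cluster-admin on the target; the range values are the ones suggested above:

# Overwrite the SCC annotations on the target namespace to match the source.
oc annotate namespace robot-shop --overwrite \
  openshift.io/sa.scc.supplemental-groups=1000350000/10000 \
  openshift.io/sa.scc.uid-range=1000350000/10000

# Delete the pods so their controllers recreate them under the new UID range.
oc delete pods --all -n robot-shop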
I've updated the NS and restarted the pods, and I still get the error.
Please check the evidence below:
NS_updated-4.5ROKS.txt
pods-status_after_restart-4.5ROKS.txt
Also, we ensured that the NS did not exist on the target before the migration.
Hello Team,
Could you take a look at the above comments and get back to us, please?
Team,
Could you check and respond, please?
I'm not sure what's going on with the mongodb container there. mysql may be crashing if you haven't picked up this fix, which I found while doing some digging on this: https://github.com/konveyor/mig-demo-apps/pull/32/files
It's possible there's a pid file that just needs to be deleted for mongodb; in both cases the logs would help to determine that.
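A rough sketch of how to check, assuming a 4.x oc client against the target and the app's mongodb Deployment name; the lock file path is the MongoDB default, so adjust it if the image differs:

# The debug copy mounts the same PVC, so we can inspect ownership of /data/db.
oc debug deployment/mongodb -n robot-shop -- ls -ln /data/db

# If a stale lock file is the only problem, removing it may let mongod start.
oc debug deployment/mongodb -n robot-shop -- rm -f /data/db/mongod.lock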
I have deleted the robot-shop project from both source and target, installed the most recent robot-shop application on the source, and performed the migration.
After executing the MigMigration today, I found that the stage pod for MySQL stayed in the Init stage for a long time and was terminated without ever reaching the Running state. I suspect this might be the issue. I have also attached the stage pod screenshot and the logs for both MongoDB and MySQL. Please check and update.
logs:
[vidyasubbaiah@oc7382228470 Downloads]$ oc get pods -n robot-shop
NAME                         READY   STATUS                  RESTARTS   AGE
cart-56d64d845-98nd9         1/1     Running                 0          106s
catalogue-7b84494bf6-rdhzz   1/1     Running                 0          106s
dispatch-86bd8f4886-794kr    1/1     Running                 0          106s
mongodb-ffd467b6d-qvk82      0/1     CrashLoopBackOff        4          105s
mysql-54dddcf647-q459z       0/2     Init:CrashLoopBackOff   3          105s
payment-74fffc4db4-tr66d     1/1     Running                 0          105s
rabbitmq-5bb66bb6c9-shqf2    1/1     Running                 0          105s
ratings-65964c5f8f-ff2q2     1/1     Running                 0          104s
redis-5694467c97-z6mfm       1/1     Running                 0          104s
shipping-b798968df-n22hc     1/1     Running                 0          104s
user-84496cc588-rqkv2        1/1     Running                 0          104s
web-79cb976f9-s4c29          1/1     Running                 2          104s
[vidyasubbaiah@oc7382228470 Downloads]$ oc logs mongodb-ffd467b6d-qvk82
Error from server (NotFound): pods "mongodb-ffd467b6d-qvk82" not found
[vidyasubbaiah@oc7382228470 Downloads]$ oc logs mongodb-ffd467b6d-qvk82 -n robot-shop
about to fork child process, waiting until server is ready for connections.
forked process: 19
2021-02-26T16:10:16.763+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] MongoDB starting : pid=19 port=27017 dbpath=/data/db 64-bit host=mongodb-ffd467b6d-qvk82
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] db version v3.6.1
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] git version: 025d4f4fe61efd1fb6f0005be20cb45a004093d1
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1t 3 May 2016
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] allocator: tcmalloc
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] modules: none
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] build environment:
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] distmod: debian81
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] distarch: x86_64
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] target_arch: x86_64
2021-02-26T16:10:16.857+0000 I CONTROL [initandlisten] options: { net: { bindIp: "127.0.0.1", port: 27017, ssl: { mode: "disabled" } }, processManagement: { fork: true, pidFilePath: "/tmp/docker-entrypoint-temp-mongod.pid" }, systemLog: { destination: "file", logAppend: true, path: "/proc/1/fd/1" } }
2021-02-26T16:10:16.858+0000 I STORAGE [initandlisten] exception in initAndListen: IllegalOperation: Attempted to create a lock file on a read-only directory: /data/db, terminating
2021-02-26T16:10:16.858+0000 I CONTROL [initandlisten] now exiting
2021-02-26T16:10:16.858+0000 I CONTROL [initandlisten] shutting down with code:100
ERROR: child process failed, exited with error number 100
To see additional information in this output, start without the "--fork" option.
[vidyasubbaiah@oc7382228470 Downloads]$ oc logs mysql-54dddcf647-q459z -n robot-shop
error: a container name must be specified for pod mysql-54dddcf647-q459z, choose one of: [dataloader mysql] or one of the init containers: [post-hook]
[vidyasubbaiah@oc7382228470 Downloads]$ oc logs mysql-54dddcf647-q459z post-hook -n robot-shop
cp: cannot create regular file '/var/lib/mysql/10-dump.sql.gz': Permission denied
cp: cannot create regular file '/var/lib/mysql/20-ratings.sql': Permission denied
FWIW, I was able to get access to a ROKS account. I successfully migrated robot-shop to a Single AZ ROKS cluster using MTC 1.4.1.
NAME                         READY   STATUS    RESTARTS   AGE
cart-7cffd844db-7nnh9        1/1     Running   0          11m
catalogue-6bd7b6664c-8dzns   1/1     Running   0          11m
dispatch-5f454ff45-bs6nf     1/1     Running   0          11m
mongodb-56d8597f95-ztcsc     1/1     Running   0          11m
mysql-575549948-fclzr        2/2     Running   0          11m
payment-6687bf86f9-qqk7x     1/1     Running   0          11m
rabbitmq-75d9cf4484-4c44j    1/1     Running   0          11m
ratings-f89477654-qflrg      1/1     Running   0          11m
redis-75fd75d7fb-b5fmn       1/1     Running   0          11m
shipping-56ff6d5d66-kfzsm    1/1     Running   0          11m
user-5b8cf58f99-ktzj7        1/1     Running   0          11m
web-6c9685f5b8-q9fkn         1/1     Running   0          11m
The namespace supplemental groups and UID range match on both clusters, as I would expect when the namespace is created as part of the migration, which prevents the permission errors.
source:
$ oc get namespace -o yaml robot-shop
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/requester: opentlc-mgr
    openshift.io/sa.scc.mcs: s0:c12,c4
    openshift.io/sa.scc.supplemental-groups: 1000140000/10000
    openshift.io/sa.scc.uid-range: 1000140000/10000
  creationTimestamp: "2021-03-01T14:02:37Z"
  name: robot-shop
...
And destination:
$ oc get namespace -o yaml robot-shop
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/requester: opentlc-mgr
    openshift.io/sa.scc.mcs: s0:c12,c4
    openshift.io/sa.scc.supplemental-groups: 1000140000/10000
    openshift.io/sa.scc.uid-range: 1000140000/10000
  creationTimestamp: "2021-03-01T15:39:04Z"
...
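For anyone repeating the comparison, a small sketch, assuming kubeconfig contexts named source and target:

for ctx in source target; do
  echo "== $ctx =="
  # Print only the SCC-related annotations for a quick side-by-side check.
  oc --context "$ctx" get namespace robot-shop -o yaml | grep 'sa\.scc'
done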
The cluster versions were (source) 3.11 and (destination) 4.6.17. I used one of the DC region AZs for the cluster.
$ oc cluster-info
Kubernetes control plane is running at https://c100-e.us-east.containers.cloud.ibm.com:32601
$ oc version
Client Version: 4.7.0
Server Version: 4.6.17
Kubernetes Version: v1.19.0+e405995
@vidya191 can you help us to figure out what the reproducer steps are? As of now it looks like @jmontleon is not able to recreate this issue as described even on IBM ROKS.
Attached are the reproducer steps. Please let me know if you need more details.
- Install MTC on source 3.11 cluster
- Install and configure MTC on target 4.5
- Update the migplan with robot-shop:
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: migplan-sample2
  namespace: openshift-migration
spec:
  srcMigClusterRef:
    name: migcluster-remote
    namespace: openshift-migration
  destMigClusterRef:
    name: migcluster-local
    namespace: openshift-migration
  migStorageRef:
    name: migstorage-sample
    namespace: openshift-migration
  indirectImageMigration: true
  indirectVolumeMigration: true
  namespaces:
  - robot-shop
...
- Ensure the robot-shop application is not present on the target cluster
- Execute and validate the migplan
- Update the migmigration:
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: mig-migration-sample2
  namespace: openshift-migration
spec:
  stage: false
  quiescePods: true
  keepAnnotations: true
  migPlanRef:
    name: migplan-sample2
    namespace: openshift-migration
- Monitor the migmigration and validate the robot-shop application in the target.
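For completeness, a small sketch of that monitoring step, reusing the names from the MigMigration above:

# Watch the migration move through its phases until it reports Completed.
oc get migmigration mig-migration-sample2 -n openshift-migration -w

# Then verify the application landed on the target cluster.
oc get pods -n robot-shop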
Source cluster information
oc v3.11.248
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://xxxxxx.xxxxx.xxx.com:8443
openshift v3.11.248
kubernetes v1.11.0+d4cacc0
Target Cluster information
[vidya@oc4377604745 ~]$ oc version
Client Version: openshift-clients-4.2
Server Version: 4.5.31
Kubernetes Version: v1.18.3+e574db2
Hello Team
Awaiting your reply
There needs to be something more specific to the reproduction than installing a 3.11 cluster and a ROKS 4.5 cluster, installing MTC on both, and trying to migrate robot-shop. As mentioned here, I was able to complete a migration successfully with those steps: #574 (comment)
The only place I deviated was using 4.6 instead of 4.5. I can try again with 4.5 when time permits, but I don't know of any bugs around this that would be specific to a version of OpenShift.
It would also be helpful to know which version of MTC you are installing.
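One way to read that off the cluster, assuming the default openshift-migration install namespace:

# The migration operator's ClusterServiceVersion shows the installed version.
oc get csv -n openshift-migration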
If more information can be provided on this, please feel free to reopen it. We haven't been able to reproduce this ourselves, so our ability to fix any issue is limited.