GoogleCloudPlatform/elcarro-oracle-operator

Database deletion hangs if the Instance has been deleted first (e.g. via Helm)

nblxa opened this issue · 3 comments

nblxa commented

Describe the bug
If I first delete the Instance resource, and then one of its Database resources, the deletion of the Database never finishes.

To Reproduce
Steps to reproduce the behavior:

  1. Create an Instance with a Database as follows
---
apiVersion: oracle.db.anthosapis.com/v1alpha1
kind: Instance
metadata:
  name: mydb
spec:
  cdbName: CDB
  cloudProvider: GCP
  databaseResources:
    requests:
      memory: 4.0Gi
  dbDomain: gke
  disks:
    - name: DataDisk
      size: 45Gi
      storageClass: "standard-rwo"
    - name: LogDisk
      size: 55Gi
      storageClass: "standard-rwo"
  edition: Enterprise
  images:
    service: gcr.io/...
  services:
    Backup: true
    Monitoring: true
    Logging: true
  sourceCidrRanges: [ 0.0.0.0/0 ]
  type: Oracle
  version: "19.15"
---
apiVersion: oracle.db.anthosapis.com/v1alpha1
kind: Database
metadata:
  name: mydb
spec:
  instance: mydb
  name: MY_PDB
  admin_password: ...

(update image and password above)

Wait until the Database is Ready.

$ kubectl get databases.oracle.db.anthosapis.com
NAME   INSTANCE   USERS   PHASE   DATABASEREADYSTATUS   DATABASEREADYREASON   USERREADYSTATUS   USERREADYREASON
mydb   mydb               Ready   True                  CreateComplete        True              SyncComplete
  2. Delete the Instance
$ kubectl delete instances.oracle.db.anthosapis.com mydb
instance.oracle.db.anthosapis.com "mydb" deleted
  3. Delete the Database
$ kubectl delete databases.oracle.db.anthosapis.com mydb
database.oracle.db.anthosapis.com "mydb" deleted
^C

The deletion of the Database hangs after the message above is printed. The operator logs show the following messages repeatedly:

I1020 06:56:02.378181       1 database_controller.go:129] controllers/Database "msg"="reconciling Database (PDB) deletion..." "Database"={"Namespace":"default","Name":"mydb"}
E1020 06:56:02.379075       1 controller.go:326]  "msg"="Reconciler error" "error"="Instance.oracle.db.anthosapis.com \"mydb\" not found" "controller"="database" "controllerGroup"="oracle.db.anthosapis.com" "controllerKind"="Database" "database"={"name":"mydb","namespace":"default"} "name"="mydb" "namespace"="default" "reconcileID"="ee938527-c9a0-4a3e-ac06-fef33e2869b5"
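
To confirm that the Database is stuck in termination rather than already gone, one option is to look at its deletion timestamp and finalizers. This is a minimal sketch, assuming the reproduction above (a Database named mydb in the current namespace). The helper name `show_deletion_state` and the `KUBECTL` override (only there to allow a dry run with `echo`) are hypothetical and not part of El Carro.

```shell
# Hypothetical helper: print the deletion timestamp and finalizers of a Database.
# A non-empty DELETED column together with a non-empty FINALIZERS column means
# the API server is waiting for a finalizer to be removed.
show_deletion_state() {
  db="$1"
  "${KUBECTL:-kubectl}" get databases.oracle.db.anthosapis.com "$db" \
    -o custom-columns=NAME:.metadata.name,DELETED:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers
}

# Usage (against a real cluster):
#   show_deletion_state mydb
```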

Expected behavior
My expectation is that since the Instance deletes the underlying CDB altogether, the Database deletion should be immediate. This is even more important when using Helm to install & uninstall a release containing El Carro resources, since Helm does not guarantee any specific order in this case.

Even though I constructed the above minimal case for reproducing the issue with kubectl, the original problem appeared when using Helm.

Additional context
Kubernetes versions:

kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.15", GitCommit:"1d79bc3bcccfba7466c44cc2055d6e7442e140ea", GitTreeState:"clean", BuildDate:"2022-09-21T12:11:27Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.12-gke.2300", GitCommit:"e55564cf3a1384026a54920174977659c8c56a50", GitTreeState:"clean", BuildDate:"2022-08-16T09:24:51Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}

Workarounds are either:

  • use the "correct" deletion order: first Databases, then Instances
    or:
  • if the Database hangs and cannot be deleted, remove its metadata.finalizers
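
Both workarounds can be sketched with kubectl as follows. This assumes the resource names from the reproduction above. The function names and the `KUBECTL` override (only there to allow a dry run with `echo`) are hypothetical. Note that clearing the finalizers skips whatever cleanup the operator would otherwise perform, which should only matter here if the Instance still exists.

```shell
# Workaround 1: delete the Databases first, then the Instance.
# Arguments: instance name, followed by the names of its Databases.
delete_in_order() {
  instance="$1"; shift
  for db in "$@"; do
    "${KUBECTL:-kubectl}" delete databases.oracle.db.anthosapis.com "$db" --wait=true || return 1
  done
  "${KUBECTL:-kubectl}" delete instances.oracle.db.anthosapis.com "$instance" --wait=true
}

# Workaround 2: unblock a hanging Database by clearing metadata.finalizers.
# A JSON merge patch that sets the field to null removes it.
remove_finalizers() {
  db="$1"
  "${KUBECTL:-kubectl}" patch databases.oracle.db.anthosapis.com "$db" \
    --type=merge -p '{"metadata":{"finalizers":null}}'
}

# Usage (against a real cluster):
#   delete_in_order mydb mydb
#   remove_finalizers mydb
```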

I just want to expand on the expectation. I would expect that deleting the Instance would propagate down to all the downstream objects, much like deleting a Deployment also cleans up its ReplicaSets and Pods.

Alternatively, we should make it possible for a Database to be kept by itself. We could unplug it and leave it in a state in which it can be attached to another Instance, for example.

nblxa commented

Unplugging a large database can take a long time, so I don't think it should be done unless the user explicitly requests it.

I like the idea of propagating deletion down to all the Database objects attached to the Instance. I will have a PR out for that shortly.