Database deletion hangs if the Instance has been deleted first (e.g. via Helm)

Question

nblxa opened this issue 2 years ago · 3 comments

Describe the bug
If I first delete the Instance resource, and then one of its Database resources, the deletion of the Database never finishes.

To Reproduce
Steps to reproduce the behavior:

Create an Instance with a Database as follows:

---
apiVersion: oracle.db.anthosapis.com/v1alpha1
kind: Instance
metadata:
  name: mydb
spec:
  cdbName: CDB
  cloudProvider: GCP
  databaseResources:
    requests:
      memory: 4.0Gi
  dbDomain: gke
  disks:
    - name: DataDisk
      size: 45Gi
      storageClass: "standard-rwo"
    - name: LogDisk
      size: 55Gi
      storageClass: "standard-rwo"
  edition: Enterprise
  images:
    service: gcr.io/...
  services:
    Backup: true
    Monitoring: true
    Logging: true
  sourceCidrRanges: [ 0.0.0.0/0 ]
  type: Oracle
  version: "19.15"
---
apiVersion: oracle.db.anthosapis.com/v1alpha1
kind: Database
metadata:
  name: mydb
spec:
  instance: mydb
  name: MY_PDB
  admin_password: ...

(Replace the elided image and password values above with real ones.)

Wait until the Database is Ready.

$ kubectl get databases.oracle.db.anthosapis.com
NAME   INSTANCE   USERS   PHASE   DATABASEREADYSTATUS   DATABASEREADYREASON   USERREADYSTATUS   USERREADYREASON
mydb   mydb               Ready   True                  CreateComplete        True              SyncComplete

Delete the Instance

$ kubectl delete instances.oracle.db.anthosapis.com mydb
instance.oracle.db.anthosapis.com "mydb" deleted

Delete the Database

$ kubectl delete databases.oracle.db.anthosapis.com mydb
database.oracle.db.anthosapis.com "mydb" deleted
^C

The command hangs after printing the "deleted" message and never returns. The operator logs show the following messages repeating:

I1020 06:56:02.378181       1 database_controller.go:129] controllers/Database "msg"="reconciling Database (PDB) deletion..." "Database"={"Namespace":"default","Name":"mydb"}
E1020 06:56:02.379075       1 controller.go:326]  "msg"="Reconciler error" "error"="Instance.oracle.db.anthosapis.com \"mydb\" not found" "controller"="database" "controllerGroup"="oracle.db.anthosapis.com" "controllerKind"="Database" "database"={"name":"mydb","namespace":"default"} "name"="mydb" "namespace"="default" "reconcileID"="ee938527-c9a0-4a3e-ac06-fef33e2869b5"

Expected behavior
My expectation is that since the Instance deletes the underlying CDB altogether, the Database deletion should be immediate. This is even more important when using Helm to install & uninstall a release containing El Carro resources, since Helm does not guarantee any specific order in this case.

I constructed the minimal case above to reproduce the issue with kubectl, but the original problem appeared when using Helm.

Additional context
Kubernetes versions:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.15", GitCommit:"1d79bc3bcccfba7466c44cc2055d6e7442e140ea", GitTreeState:"clean", BuildDate:"2022-09-21T12:11:27Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.12-gke.2300", GitCommit:"e55564cf3a1384026a54920174977659c8c56a50", GitTreeState:"clean", BuildDate:"2022-08-16T09:24:51Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}

Workarounds are either:

use the "correct" deletion order: first the Databases, then the Instance

or:

if a Database deletion is already hanging, remove its metadata.finalizers
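
For anyone hitting the same problem, the two workarounds could be scripted roughly as follows. This is only a sketch: the function names delete_in_order and remove_database_finalizers are made up for illustration, the commands act on the current namespace, and the Database names must be passed in by hand. Clearing finalizers skips whatever cleanup the operator would normally do, so it is probably only reasonable here because the Instance (and its CDB) is already gone.

```shell
# Workaround 1: delete the Databases first, then the Instance.
# Usage: delete_in_order <instance-name> <database-name>...
delete_in_order() {
  carro_instance="$1"
  shift
  for carro_db in "$@"; do
    # kubectl delete waits for finalizers by default, so each Database
    # is fully removed before the next command runs.
    kubectl delete databases.oracle.db.anthosapis.com "$carro_db" || return 1
  done
  kubectl delete instances.oracle.db.anthosapis.com "$carro_instance"
}

# Workaround 2: unblock a Database whose deletion is already hanging
# by clearing metadata.finalizers with a JSON merge patch.
# Usage: remove_database_finalizers <database-name>
remove_database_finalizers() {
  kubectl patch databases.oracle.db.anthosapis.com "$1" \
    --type=merge \
    --patch '{"metadata":{"finalizers":null}}'
}
```

With the resources from the reproduction above, that would be "delete_in_order mydb mydb" for the first workaround, or "remove_database_finalizers mydb" (run from a second terminal while the delete is hanging) for the second.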

Answer 1 · 2022-10-20T20:07:33.000Z

I just want to expand on the expectation. I would expect one of two things. Either deleting the Instance propagates down to all the downstream objects, much like deleting a Deployment also cleans up its ReplicaSets and Pods.

Or a Database can be kept by itself: we could unplug it and leave it in a state where it can be attached to another Instance, for example.

Answer 2 · 2022-10-24T17:21:31.000Z

Unplugging a large database can take a long time, so I don't think it should be done unless the user explicitly requests it.

Answer 3 · 2022-10-28T19:52:50.000Z

I like the idea of propagating deletion down to all the Database objects attached to an Instance. I will have a PR out for that shortly.