Database deletion hangs if the Instance has been deleted first (e.g. via Helm)
nblxa opened this issue · 3 comments
Describe the bug
If I first delete the Instance resource, and then one of its Database resources, the deletion of the Database never finishes.
To Reproduce
Steps to reproduce the behavior:
- Create an Instance with a Database as follows
---
apiVersion: oracle.db.anthosapis.com/v1alpha1
kind: Instance
metadata:
  name: mydb
spec:
  cdbName: CDB
  cloudProvider: GCP
  databaseResources:
    requests:
      memory: 4.0Gi
  dbDomain: gke
  disks:
  - name: DataDisk
    size: 45Gi
    storageClass: "standard-rwo"
  - name: LogDisk
    size: 55Gi
    storageClass: "standard-rwo"
  edition: Enterprise
  images:
    service: gcr.io/...
  services:
    Backup: true
    Monitoring: true
    Logging: true
  sourceCidrRanges: [ 0.0.0.0/0 ]
  type: Oracle
  version: "19.15"
---
apiVersion: oracle.db.anthosapis.com/v1alpha1
kind: Database
metadata:
  name: mydb
spec:
  instance: mydb
  name: MY_PDB
  admin_password: ...
(update image and password above)
Wait until the Database is Ready.
$ kubectl get databases.oracle.db.anthosapis.com
NAME   INSTANCE   USERS   PHASE   DATABASEREADYSTATUS   DATABASEREADYREASON   USERREADYSTATUS   USERREADYREASON
mydb   mydb               Ready   True                  CreateComplete        True              SyncComplete
- Delete the Instance
$ kubectl delete instances.oracle.db.anthosapis.com mydb
instance.oracle.db.anthosapis.com "mydb" deleted
- Delete the Database
$ kubectl delete databases.oracle.db.anthosapis.com mydb
database.oracle.db.anthosapis.com "mydb" deleted
^C
The deletion of the Database hangs after the message is printed. The operator logs show the following messages repeated:
I1020 06:56:02.378181 1 database_controller.go:129] controllers/Database "msg"="reconciling Database (PDB) deletion..." "Database"={"Namespace":"default","Name":"mydb"}
E1020 06:56:02.379075 1 controller.go:326] "msg"="Reconciler error" "error"="Instance.oracle.db.anthosapis.com \"mydb\" not found" "controller"="database" "controllerGroup"="oracle.db.anthosapis.com" "controllerKind"="Database" "database"={"name":"mydb","namespace":"default"} "name"="mydb" "namespace"="default" "reconcileID"="ee938527-c9a0-4a3e-ac06-fef33e2869b5"
Expected behavior
My expectation is that since the Instance deletes the underlying CDB altogether, the Database deletion should be immediate. This is even more important when using Helm to install & uninstall a release containing El Carro resources, since Helm does not guarantee any specific order in this case.
Even though I constructed the above minimal case for reproducing the issue with kubectl, the original problem appeared when using helm.
Additional context
Kubernetes versions:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.15", GitCommit:"1d79bc3bcccfba7466c44cc2055d6e7442e140ea", GitTreeState:"clean", BuildDate:"2022-09-21T12:11:27Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.12-gke.2300", GitCommit:"e55564cf3a1384026a54920174977659c8c56a50", GitTreeState:"clean", BuildDate:"2022-08-16T09:24:51Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}
Workarounds are either:
- use the "correct" deletion order: first Databases, then Instances
or:
- if the Database hangs and cannot be deleted, remove its metadata.finalizers
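For reference, the second workaround can be sketched as a small shell helper. This is only a sketch, not something the operator ships: the function name remove_database_finalizers and the KUBECTL override variable are hypothetical names introduced here, and it assumes the Instance (and therefore the CDB and its PDBs) is already gone, because clearing the finalizers skips whatever cleanup the operator would otherwise perform.

```shell
# Hypothetical helper: clear metadata.finalizers on a stuck Database resource
# so that the API server can complete the pending deletion.
# $1 = Database resource name, $2 = namespace (defaults to "default").
# KUBECTL is an assumed override variable; it falls back to plain kubectl.
remove_database_finalizers() {
  "${KUBECTL:-kubectl}" patch databases.oracle.db.anthosapis.com "$1" \
    --namespace "${2:-default}" \
    --type=merge \
    --patch '{"metadata":{"finalizers":null}}'
}
```

For the reproduction above this would be called as remove_database_finalizers mydb default. A JSON merge patch that sets finalizers to null removes the whole list, so the hanging kubectl delete should then return.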
I just want to expand on the expectation. I would expect that deleting the Instance would either propagate down to all the downstream objects, much like deleting a Deployment also cleans up its ReplicaSets and Pods,
or that a Database could be kept by itself. We could unplug it and leave it in a state in which it could be attached to another Instance, for example.
Unplugging a large database can take a long time, so I don't think it should be done unless the user explicitly requests it.
I like the idea of propagating deletion down to all the Database objects attached to the Instance. I will have a PR out for that shortly.