GoogleCloudPlatform/elcarro-oracle-operator

Unable to build operator docker image. stat oracle/pkg/database/common: file does not exist

urbanchef opened this issue · 19 comments

Describe the bug
Unable to build operator docker image locally

To Reproduce

cd $PATH_TO_EL_CARRO_REPO
{
export REPO="localhost:5000/oracle.db.anthosapis.com"
export TAG="latest"
export OPERATOR_IMG="${REPO}/operator:${TAG}"
docker build -f oracle/Dockerfile -t ${OPERATOR_IMG} .
docker push ${OPERATOR_IMG}
}
Sending build context to Docker daemon   4.71MB
Step 1/19 : FROM docker.io/golang:1.15 as builder
 ---> 40349a2425ef
Step 2/19 : WORKDIR /build
 ---> Using cache
 ---> b44c2a87f722
Step 3/19 : COPY go.mod go.mod
 ---> Using cache
 ---> c359cdfe04b9
Step 4/19 : COPY go.sum go.sum
 ---> Using cache
 ---> 6f6d2902ef22
Step 5/19 : RUN go mod download
 ---> Using cache
 ---> 8be558325755
Step 6/19 : COPY common common
 ---> Using cache
 ---> 1dd64c7bfbc5
Step 7/19 : COPY oracle/main.go oracle/main.go
 ---> Using cache
 ---> 0a79c9d91f73
Step 8/19 : COPY oracle/version.go oracle/version.go
 ---> Using cache
 ---> a9fbca9b14cf
Step 9/19 : COPY oracle/api/ oracle/api/
 ---> Using cache
 ---> 123c5e7c856e
Step 10/19 : COPY oracle/controllers/ oracle/controllers/
 ---> Using cache
 ---> 7c7a1ff96c61
Step 11/19 : COPY oracle/pkg/agents oracle/pkg/agents
 ---> Using cache
 ---> 9d5ed5ea3f52
Step 12/19 : COPY oracle/pkg/database/common oracle/pkg/database/common
COPY failed: file not found in build context or excluded by .dockerignore: stat oracle/pkg/database/common: file does not exist

Expected behavior
docker build finishes successfully

Thank you for creating this issue. We are looking into it and will reply shortly.

The easiest way would be to make use of the published release:

https://github.com/GoogleCloudPlatform/elcarro-oracle-operator/releases

For building from the source:

1/ If you haven't tried it already, please see if the following steps help in building an image:

https://github.com/GoogleCloudPlatform/elcarro-oracle-operator/blob/main/docs/content/provision/image.md#building-a-containerized-oracle-database-image-locally-using-docker

Or use the Oracle published image.

2/ Once a container image is ready, to build an Operator from source it may be easier to use the bazel rules instead of the Dockerfile. We are looking into this now...

Sorry about that, the dockerfiles for agent and operator images were not kept up to date. We recommend using bazel for all images except the database image and will update the documentation with some instructions. For now you can build and import into your local docker repository with the following steps.

Find the docker rule for the image you want to build (check the buildah-push-* rules in oracle/Makefile for some of the build/push targets, or run bazel query 'kind(container_image, //...)' to list them).

Build the container image with bazel build //oracle:operator_image; once complete it will tell you where the container tar was placed.

Import the tar file into your local docker repository: docker import bazel-bin/oracle/operator_image-layer.tar ${OPERATOR_IMG}

I hope that helps.

The easiest way would be to make use of the published release:

What do you mean by that? The current release does not contain the operator image I am trying to build...

For building from the source:

I already have Oracle db image built and ready - there is no issue with this.

I have done the following:

{
export REPO="localhost:5000/oracle.db.anthosapis.com"
export TAG="latest"
export OPERATOR_IMG="${REPO}/operator:${TAG}"
bazel build //oracle:operator_image
sudo docker import bazel-bin/oracle/operator_image-layer.tar ${OPERATOR_IMG}
sudo docker push ${OPERATOR_IMG}
}



{
export DBINIT_IMG="${REPO}/dbinit:${TAG}"
bazel build //oracle/build:dbinit
sudo docker import bazel-bin/oracle/build/dbinit-layer.tar ${DBINIT_IMG} 
sudo docker push ${DBINIT_IMG}
}


{
export CONFIG_AGENT_IMG="${REPO}/configagent:${TAG}"
bazel build //oracle/build:configagent
sudo docker import bazel-bin/oracle/build/configagent-layer.tar ${CONFIG_AGENT_IMG}
sudo docker push ${CONFIG_AGENT_IMG}
}


{
export LOGGING_IMG="${REPO}/loggingsidecar:${TAG}"
bazel build //oracle/build:loggingsidecar
sudo docker import bazel-bin/oracle/build/loggingsidecar-layer.tar ${LOGGING_IMG}
sudo docker push ${LOGGING_IMG}
}

{
export MONITORING_IMG="${REPO}/monitoring:${TAG}"
bazel build //oracle/build:monitoring
sudo docker import bazel-bin/oracle/build/monitoring-layer.tar ${MONITORING_IMG}
sudo docker push ${MONITORING_IMG}
}
sed -i 's/image: gcr.*oracle.db.anthosapis.com/image: localhost:5000\/oracle.db.anthosapis.com/g' $PATH_TO_EL_CARRO_REPO/oracle/operator.yaml
kubectl apply -f $PATH_TO_EL_CARRO_REPO/oracle/operator.yaml

But get an error:

$ kubectl get pods -A
NAMESPACE         NAME                                           READY   STATUS             RESTARTS        AGE
kube-system       coredns-78fcd69978-2fgpx                       1/1     Running            0               42m
kube-system       csi-hostpath-attacher-0                        1/1     Running            0               42m
kube-system       csi-hostpath-provisioner-0                     1/1     Running            0               42m
kube-system       csi-hostpath-resizer-0                         1/1     Running            0               42m
kube-system       csi-hostpath-snapshotter-0                     1/1     Running            0               42m
kube-system       csi-hostpathplugin-0                           5/5     Running            0               42m
kube-system       etcd-minikube                                  1/1     Running            0               42m
kube-system       kube-apiserver-minikube                        1/1     Running            0               42m
kube-system       kube-controller-manager-minikube               1/1     Running            0               42m
kube-system       kube-proxy-6pkqm                               1/1     Running            0               42m
kube-system       kube-scheduler-minikube                        1/1     Running            0               42m
kube-system       registry-proxy-7zfkz                           1/1     Running            0               41m
kube-system       registry-w9mkj                                 1/1     Running            0               41m
kube-system       snapshot-controller-989f9ddc8-2hrgb            1/1     Running            0               41m
kube-system       snapshot-controller-989f9ddc8-7zmhc            1/1     Running            0               41m
kube-system       storage-provisioner                            1/1     Running            0               42m
operator-system   operator-controller-manager-656dbb8b88-xtd9s   1/2     CrashLoopBackOff   6 (2m53s ago)   8m38s
$ kubectl describe pod/operator-controller-manager-656dbb8b88-xtd9s -n operator-system
Name:         operator-controller-manager-656dbb8b88-xtd9s
Namespace:    operator-system
Priority:     0
Node:         minikube/192.168.99.106
Start Time:   Thu, 30 Sep 2021 11:57:17 +0300
Labels:       control-plane=controller-manager
              pod-template-hash=656dbb8b88
Annotations:  <none>
Status:       Running
IP:           172.17.0.12
IPs:
  IP:           172.17.0.12
Controlled By:  ReplicaSet/operator-controller-manager-656dbb8b88
Containers:
  kube-rbac-proxy:
    Container ID:  docker://949857f806cff32d80c41bf5697aa26ae224b5a09f550416dcdaed15e9fb75a0
    Image:         gcr.io/kubebuilder/kube-rbac-proxy:v0.4.1
    Image ID:      docker-pullable://gcr.io/kubebuilder/kube-rbac-proxy@sha256:6c915d948d4781d366300d6e75d67a7830a941f078319f0fecc21c7744053eff
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    State:          Running
      Started:      Thu, 30 Sep 2021 11:57:22 +0300
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lf7vv (ro)
  manager:
    Container ID:  docker://e0758f47dce1eed8a884bdf67137ffeeb08a118342444b2cf7d89b9668b1a115
    Image:         localhost:5000/oracle.db.anthosapis.com/operator:latest
    Image ID:      docker-pullable://localhost:5000/oracle.db.anthosapis.com/operator@sha256:ae6792ed615b99e87a60a013f5066ca4456c47ee84122856bd5d01c182cc4089
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --metrics-addr=127.0.0.1:8080
      --enable-leader-election
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 30 Sep 2021 12:08:13 +0300
      Finished:     Thu, 30 Sep 2021 12:08:13 +0300
    Ready:          False
    Restart Count:  7
    Limits:
      cpu:     100m
      memory:  40Mi
    Requests:
      cpu:        100m
      memory:     30Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lf7vv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-lf7vv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  11m                 default-scheduler  Successfully assigned operator-system/operator-controller-manager-656dbb8b88-xtd9s to minikube
  Normal   Pulling    11m                 kubelet            Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.4.1"
  Normal   Pulled     11m                 kubelet            Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.4.1" in 3.123464154s
  Normal   Created    11m                 kubelet            Created container kube-rbac-proxy
  Normal   Started    11m                 kubelet            Started container kube-rbac-proxy
  Normal   Pulled     11m                 kubelet            Successfully pulled image "localhost:5000/oracle.db.anthosapis.com/operator:latest" in 1.066361799s
  Normal   Pulled     11m                 kubelet            Successfully pulled image "localhost:5000/oracle.db.anthosapis.com/operator:latest" in 38.532731ms
  Normal   Pulled     11m                 kubelet            Successfully pulled image "localhost:5000/oracle.db.anthosapis.com/operator:latest" in 86.936261ms
  Normal   Pulling    10m (x4 over 11m)   kubelet            Pulling image "localhost:5000/oracle.db.anthosapis.com/operator:latest"
  Normal   Created    10m (x4 over 11m)   kubelet            Created container manager
  Normal   Started    10m (x4 over 11m)   kubelet            Started container manager
  Normal   Pulled     10m                 kubelet            Successfully pulled image "localhost:5000/oracle.db.anthosapis.com/operator:latest" in 49.27441ms
  Warning  BackOff    94s (x48 over 11m)  kubelet            Back-off restarting failed container
$ kubectl logs operator-controller-manager-656dbb8b88-xtd9s -n operator-system manager
standard_init_linux.go:228: exec user process caused: no such file or directory

The easiest way would be to make use of the published release:
https://github.com/GoogleCloudPlatform/elcarro-oracle-operator/releases
...

All prebuilt images (other than the database image that customers need to build by themselves) are publicly available in the elcarro repo in GCR. If you have a GCP account, this is the fastest way to get El Carro up and running:

$ gcloud container images list --repository gcr.io/elcarro/oracle.db.anthosapis.com
NAME
gcr.io/elcarro/oracle.db.anthosapis.com/configagent
gcr.io/elcarro/oracle.db.anthosapis.com/dbinit
gcr.io/elcarro/oracle.db.anthosapis.com/loggingsidecar
gcr.io/elcarro/oracle.db.anthosapis.com/monitoring
gcr.io/elcarro/oracle.db.anthosapis.com/operator
gcr.io/elcarro/oracle.db.anthosapis.com/ui

For instance, here are the prebuilt Operator and the Config Agent images:

$ gcloud container images list-tags gcr.io/elcarro/oracle.db.anthosapis.com/operator
DIGEST        TAGS          TIMESTAMP
029c0edd85fe  v0.0.0-alpha  1969-12-31T19:00:00
182c03e1c8c6  create_table  1969-12-31T19:00:00
ba3d4d8f5ba6  v0.1.0-alpha  1969-12-31T19:00:00
d9ea2b20d4d2  bare          1969-12-31T19:00:00
 
$ gcloud container images list-tags gcr.io/elcarro/oracle.db.anthosapis.com/configagent
DIGEST        TAGS          TIMESTAMP
5476c1199f7c  v0.1.0-alpha  1969-12-31T19:00:00
73982731b632  bare          1969-12-31T19:00:00
74002bdaeeee  create_table  1969-12-31T19:00:00
753c0dd3d516                1969-12-31T19:00:00
c4dac3bb4b3e  v0.0.0-alpha  1969-12-31T19:00:00

We presently recommend using images tagged v0.1.0-alpha, so pulling an Operator image looks like the following:

$ docker pull gcr.io/elcarro/oracle.db.anthosapis.com/operator:v0.1.0-alpha
v0.1.0-alpha: Pulling from elcarro/oracle.db.anthosapis.com/operator
5dea5ec2316d: Pull complete
bb771d6dc9a1: Pull complete
6f78ac1091ad: Pull complete
3c2cba919283: Pull complete
d1cd9fcf89ba: Pull complete
Digest: sha256:ba3d4d8f5ba61163bd5b5d083c8876782c476c4d6ef03cd7803ceac2274eb065
Status: Downloaded newer image for gcr.io/elcarro/oracle.db.anthosapis.com/operator:v0.1.0-alpha
gcr.io/elcarro/oracle.db.anthosapis.com/operator:v0.1.0-alpha

And so the easiest way to start with El Carro is to download our latest release (presently v0.1.0-alpha, see [1]) and apply the operator.yaml, which would then take care of downloading the pre-built images "on the fly" (as long as you have access to GCP):

$ grep 'image: gcr.io/elcarro' operator.yaml
        image: gcr.io/elcarro/oracle.db.anthosapis.com/operator:v0.1.0-alpha

This is not to say that we discourage building images from the source (we certainly don't!). And thank you for identifying a bug in our Dockerfiles; we've moved away from them in favor of Bazel builds and so, as Kurt pointed out, they are not actively maintained anymore.

[1] https://github.com/GoogleCloudPlatform/elcarro-oracle-operator/releases

But get an error:

My previous instructions were incorrect; the correct bazel target for importing into docker is bazel build //oracle:operator_image.tar, which should produce bazel-bin/oracle/operator_image.tar. Otherwise the steps you took look good.

I did some cleanup of our build scripts and the associated documentation. Please give it a try again and let me know if there are any issues; otherwise we can close this bug out.

@kurt-google Hi! I re-ran the steps from the minikube guide and it still does not work:

2021/10/13 01:14:26 Destination gcr.io/{PROW_PROJECT}/oracle.db.anthosapis.com/dbinit:{PROW_IMAGE_TAG} was resolved to gcr.io/local/oracle.db.anthosapis.com/dbinit:latest after stamping.
2021/10/13 01:14:26 Error pushing image to gcr.io/local/oracle.db.anthosapis.com/dbinit:latest: unable to push image to gcr.io/local/oracle.db.anthosapis.com/dbinit:latest: GET https://gcr.io/v2/token?scope=repository%3Alocal%2Foracle.db.anthosapis.com%2Fdbinit%3Apush%2Cpull&service=gcr.io: UNKNOWN: Service 'containerregistry.googleapis.com' is not enabled for consumer 'project:local'.
make[1]: *** [Makefile:165: buildah-push-dbinit] Error 1
make[1]: Leaving directory '/media/rmushchinskiy/misc/elcarro-oracle-operator/oracle'
make: *** [Makefile:192: deploy] Error 2
make: Leaving directory '/media/rmushchinskiy/misc/elcarro-oracle-operator/oracle'

You will need to pull down the latest changes in the repository. Part of the PR above also modifies how images are deployed to enable local repositories. The log shows it's still on an old commit using the hardcoded gcr.io repositories.

Couldn't get it to work in minikube... wondering what I have done wrong? Db status is forever stuck with the "CreateInProgress" message.
However, images were built successfully using the new instructions.

$ kubectl get instances.oracle.db.anthosapis.com -n $NS -w
NAME   DB ENGINE   VERSION   EDITION      ENDPOINT      URL   DB NAMES   BACKUP ID   READYSTATUS   READYREASON        DBREADYSTATUS   DBREADYREASON
orcl   Oracle      19.3      Enterprise   orcl-svc.db                                False         CreateInProgress                   

When creating a seeded image locally, the container database is created with the following command:

create_cdb() {
  local syspass="$(openssl rand -base64 16 | tr -dc a-zA-Z0-9)"
  sudo -u oracle "${OHOME}/bin/dbca" \
    -silent \
    -createDatabase \
    -templateName General_Purpose.dbc \
    -gdbname "${CDB_NAME}" \
    -createAsContainerDatabase true \
    -sid "${CDB_NAME}" \
    -responseFile NO_VALUE \
    -characterSet "${CHARACTER_SET}" \
    -memoryPercentage "${MEM_PCT}" \
    -emConfiguration NONE \
    -datafileDestination "/u01/app/oracle/oradata" \
    -storageType FS \
    -initParams "${INIT_PARAMS}" \
    -databaseType MULTIPURPOSE \
    -recoveryAreaDestination /u01/app/oracle/fast_recovery_area \
    -sysPassword "${syspass}" \
    -systemPassword "${syspass}"
}

Note: -datafileDestination "/u01/app/oracle/oradata"

Datafiles are created under /u01, whereas the operator expects them to be under DataMount, which is /u02:

	// DataDir is the directory where datafiles exists.
	DataDir = "/%s/app/oracle/oradata/%s"
	// DataMount is the PD mount where the data is persisted.
	DataMount = "u02"

Hence, the operator thinks that the image does not contain a database:

func createImageTypeFile() error {
	_, cdbNameFromImage, _, err := provision.FetchMetaDataFromImage()
	if err != nil {
		return fmt.Errorf("could not fetch metadata from service image: %v", err)
	}

	var fileName string
	if cdbNameFromImage == "" {
		fileName = consts.UnseededImageFile
		klog.Info("dbdaemonproxy/createImageTypeFile: detected an unseeded image")
	} else {
		fileName = consts.SeededImageFile
		klog.Info("dbdaemonproxy/createImageTypeFile: detected a seeded image")
	}

	f, err := os.Create(fileName)
	if err != nil {
		return fmt.Errorf("could not create %s file: %v", fileName, err)
	}
	defer f.Close()

	return nil
}

Output from my local minikube instance says "detected an unseeded image", whereas it is actually seeded:

$ kubectl logs orcl-sts-0 oracledb -n db
I1013 11:08:52.219915     162 dbdaemon_proxy.go:69] "dbdaemonproxy/userCheck" group="dba" g=&{Gid:54322 Name:dba}
I1013 11:08:52.219979     162 dbdaemon_proxy.go:69] "dbdaemonproxy/userCheck" group="oinstall" g=&{Gid:54321 Name:oinstall}
I1013 11:08:52.220082     162 dbdaemon_proxy.go:113] dbdaemonproxy/createImageTypeFile: detected an unseeded image
I1013 11:08:52.220268     162 dbdaemon_proxy.go:487] Initializing environment for Oracle...
I1013 11:08:52.220597     162 dbdaemon_proxy.go:185] "Starting a Database Daemon Proxy..." host="orcl-sts-0" address="/var/tmp/dbdaemon_proxy.sock"

But first we need to check whether the datafiles are located under $ORACLE_BASE. I think the problem here is that os.Getenv("ORACLE_BASE") + "/oradata/" + os.Getenv("ORACLE_SID") expands to /u01/app/oracle/oradata/orcl, whereas the actual directory is /u01/app/oracle/oradata/ORCL:

[oracle@orcl-sts-0 /]$ echo $ORACLE_BASE/oradata/$ORACLE_SID
/u01/app/oracle/oradata/orcl
[oracle@orcl-sts-0 /]$ ls -ld $ORACLE_BASE/oradata/$ORACLE_SID
ls: cannot access /u01/app/oracle/oradata/orcl: No such file or directory
[oracle@orcl-sts-0 /]$ ls -ld /u01/app/oracle/oradata/ORCL
drwxr-x--- 3 oracle dba 4096 Sep 16 14:42 /u01/app/oracle/oradata/ORCL
func FetchMetaDataFromImage() (oracleHome, cdbName, version string, err error) {
	if os.Getenv("ORACLE_SID") != "" {
		cdbName = os.Getenv("ORACLE_SID")
		// The existence of the ORACLE_SID env variable isn't enough to conclude that a CDB of that name exists.
		// The existence of an oradata directory containing ORACLE_SID confirms the existence of a CDB of that name.
		if _, err = os.Stat(os.Getenv("ORACLE_BASE") + "/oradata/" + os.Getenv("ORACLE_SID")); os.IsNotExist(err) {
			// After a database is provisioned, the oradata directory will be located on the DataMount.
			if _, err = os.Stat(fmt.Sprintf(consts.DataDir, consts.DataMount, os.Getenv("ORACLE_SID"))); os.IsNotExist(err) {
				cdbName = ""
			}
		}
	}
	return os.Getenv("ORACLE_HOME"), cdbName, getOracleVersionUsingOracleHome(os.Getenv("ORACLE_HOME")), nil
}

Interesting, I think the issue here is that the CDB name got passed through without being capitalized. As you noted, the directory can be lower case if you pass lower case to Oracle's tooling, but for now we expect everything to be capitalized and do the conversion for you in most cases. Thanks for the find; can you try with an all-caps CDB name?

I have made some progress by rebuilding my docker image with the following command:
Note: cdb_name value is UPPERCASE

sudo ./image_build.sh --local_build=true --db_version=19.3 --patch_version=32904851 --create_cdb=true --cdb_name=ORCL --mem_pct=50 --no_dry_run --project_id=local-build

This allowed me to move past the previous issue. But unfortunately, a new one came up: when provisioning a CDB, I get the following error:

oracledb I1014 17:08:54.701279     185 bootstrap_database_task.go:129] "bouncing database for setting parameters"
oracledb E1014 17:09:18.636115     185 common.go:188] "Subtask failed" err="setParameters: force startup nomount failed: rpc error: code = Unknown desc = dbdaemon/BounceDatabase: error while backing up config file: err: BackupConfigFile: failed to create pfile due to error: ORA-01565: error in identifying file '/u01/app/oracle/product/19.3/db/dbs/spfileORCL.ora'\nORA-27041: unable to open file\nLinux-x86_64 Error: 22: Invalid argument\nAdditional information: 2" parent task="Bootstrap" sub task="setParameters"
oracledb E1014 17:09:18.636221     185 init_oracle.go:212] "CDB provisioning failed" err="failed to bootstrap database : setParameters: force startup nomount failed: rpc error: code = Unknown desc = dbdaemon/BounceDatabase: error while backing up config file: err: BackupConfigFile: failed to create pfile due to error: ORA-01565: error in identifying file '/u01/app/oracle/product/19.3/db/dbs/spfileORCL.ora'\nORA-27041: unable to open file\nLinux-x86_64 Error: 22: Invalid argument\nAdditional information: 2"
oracledb I1014 17:09:18.637651     161 dbdaemon_proxy.go:398] "proxy/ProxyRunInitOracle: FAIL"

It is strange, because the above-mentioned spfile does exist:

[oracle@orcl-sts-0 /]$ cd $ORACLE_HOME/dbs
[oracle@orcl-sts-0 dbs]$ ls -ltr
total 12
-rw-r--r-- 1 oracle dba 3079 May 14  2015 init.ora
-rw-r----- 1 oracle dba   24 Oct 14 10:02 lkORCL
lrwxrwxrwx 1 oracle dba   45 Oct 14 17:08 spfileORCL.ora -> /u02/app/oracle/oraconfig/ORCL/spfileORCL.ora
lrwxrwxrwx 1 oracle dba   40 Oct 14 17:08 orapwORCL -> /u02/app/oracle/oraconfig/ORCL/orapwORCL
-rw-rw---- 1 oracle dba 1544 Oct 14 17:13 hc_ORCL.dat
[oracle@orcl-sts-0 ORCL]$ ls -l $ORACLE_HOME/dbs/spfileORCL.ora
lrwxrwxrwx 1 oracle dba 45 Oct 14 17:08 /u01/app/oracle/product/19.3/db/dbs/spfileORCL.ora -> /u02/app/oracle/oraconfig/ORCL/spfileORCL.ora

I have attached the full log:
db-orcl-sts-0-1634232063881717513.log

It might be related to the other container in the DB pod; can you also share the logs and ls -l $ORACLE_HOME/dbs/ output for the dbdaemon container?

Based on the log, the other images do not seem to have been built with the latest code, while the db image was. Perhaps try rebuilding the other images with the latest code.

@haohu-hh

Based on the log, the other images do not seem to have been built with the latest code, while the db image was. Perhaps try rebuilding the other images with the latest code.

Could you please elaborate on that?

UPD.: oh, I see, so do you mean that the db image was built with the latest el-carro code updates, while other images such as operator, dbinit, configagent, loggingsidecar weren't?

Let me try

UPD.: oh, I see, so do you mean that the db image was built with the latest el-carro code updates, while other images such as operator, dbinit, configagent, loggingsidecar weren't?

Let me try

Yes, operator, dbinit, configagent, loggingsidecar images.

We can use
https://github.com/GoogleCloudPlatform/elcarro-oracle-operator/blob/main/docs/content/minikube.md#build-and-push-the-el-carro-operator-and-agent-images

Though I am not sure if this is the root cause; if it fails again, it would be helpful to include the operator, config agent, dbdaemon, and oracledb container logs.

Thanks for your patience and thanks for trying elcarro-oracle-operator.

@urbanchef any luck on getting the operator docker image build to work?

Closing this due to lack of response.