ory/keto

Helm Release Job NotReady Status

sp71 opened this issue · 1 comment

sp71 commented

Preflight checklist

Ory Network Project

No response

Describe the bug

When bringing up Keto with Terraform using the `helm_release` resource and automigration enabled, the job's pod is always reported as NotReady, even though the logs from the job's pod indicate that the migration was applied correctly. I verified that all changes were committed to the database. Any ideas why the job's pod stays NotReady? I am running the Cloud SQL proxy as a sidecar container.

Reproducing the bug

Steps to reproduce the behavior:

  1. Apply the Terraform configuration
  2. See Keto's job pod status set to NotReady (a trimmed values sketch follows below)
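
To separate the chart behaviour from the GCP specifics, here is a trimmed values sketch (hypothetical: the DSN and the busybox sidecar are placeholders, not taken from this issue). If the NotReady state comes from the sidecar outliving the migration container, any long-running sidecar on the job should show the same symptom:

```yml
# Hypothetical minimal values: a generic long-running sidecar on the
# migration job, to check whether the NotReady state is independent of
# the Cloud SQL proxy. DSN, names, and images are placeholders.
keto:
  automigration:
    enabled: true
  config:
    dsn: postgres://keto:changeme@postgres.example.svc:5432/keto
job:
  extraContainers: |
    - name: long-running-sidecar
      image: busybox:1.36
      command: ["sleep", "infinity"]
```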

Relevant log output

Job pod logs:

```
time=2023-09-10T12:12:40Z level=error msg=Unable to ping the database connection, retrying. audience=application error=map[message:failed to connect to `host=127.0.0.1 user=postgres database=`: dial error (dial tcp 127.0.0.1:5432: connect: connection refused)] service_name=Ory Keto service_version=v0.11.1-alpha.0
[POP] 2023/09/10 12:12:47 warn - One or more of connection details are specified in database.yml. Override them with values in URL.
time=2023-09-10T12:12:47Z level=info msg=No tracer configured - skipping tracing setup audience=application service_name=Ory Keto service_version=v0.11.1-alpha.0
Current status:
Version			Name					Status
20150100000001000000	networks				Pending
20201110175414000000	relationtuple				Pending
20201110175414000001	relationtuple				Pending
20210623162417000000	relationtuple				Pending
20210623162417000001	relationtuple				Pending
20210623162417000002	relationtuple				Pending
20210623162417000003	relationtuple				Pending
20210914134624000000	legacy-cleanup				Pending
20220217152313000000	nid_fk					Pending
20220512151000000000	indices					Pending
20220513200300000000	create-intermediary-uuid-table		Pending
20220513200400000000	create-uuid-mapping-table		Pending
20220513200400000001	uuid-mapping-remove-check		Pending
20220513200500000000	migrate-strings-to-uuids		Pending
20220513200600000000	drop-old-non-uuid-table			Pending
20220513200600000001	drop-old-non-uuid-table			Pending
20230228091200000000	add-on-delete-cascade-to-relationship	Pending
Applying migrations...
Successfully applied all migrations:
Version			Name					Status
20150100000001000000	networks				Applied
20201110175414000000	relationtuple				Applied
20201110175414000001	relationtuple				Applied
20210623162417000000	relationtuple				Applied
20210623162417000001	relationtuple				Applied
20210623162417000002	relationtuple				Applied
20210623162417000003	relationtuple				Applied
20210914134624000000	legacy-cleanup				Applied
20220217152313000000	nid_fk					Applied
20220512151000000000	indices					Applied
20220513200300000000	create-intermediary-uuid-table		Applied
20220513200400000000	create-uuid-mapping-table		Applied
20220513200400000001	uuid-mapping-remove-check		Applied
20220513200500000000	migrate-strings-to-uuids		Applied
20220513200600000000	drop-old-non-uuid-table			Applied
20220513200600000001	drop-old-non-uuid-table			Applied
20230228091200000000	add-on-delete-cascade-to-relationship	Applied
```


Relevant configuration

```hcl
resource "helm_release" "keto" {
  name       = "ory"
  repository = "https://k8s.ory.sh/helm/charts"
  chart      = "keto"

  values = [
    <<EOT
    serviceAccount:
      create: false
      name: ${module.service_account.value.id}
    job:
      serviceAccount:
        create: false
        name: ${module.service_account.value.id}
      extraContainers: |
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.6.1
          imagePullPolicy: Always
          args:
          - "--structured-logs"
          - "--health-check"
          - "--http-address=0.0.0.0"
          - "--port=${local.sql_port}"
          - "--private-ip"
          - ${var.project_id}:${var.default_region}:${module.sql_db.name}
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
          livenessProbe:
            httpGet:
              path: /liveness
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 2
          readinessProbe:
            httpGet:
              path: /readiness
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 2
          startupProbe:
            httpGet:
              path: /startup
              port: 9090
            periodSeconds: 1
            timeoutSeconds: 5
            failureThreshold: 20
          resources:
            requests:
              memory: 128Mi
              cpu: 50m
            limits:
              memory: 512Mi
              cpu: 250m
    keto:
      automigration:
        enabled: true
      config:
        dsn: postgres://${local.db_username}:${random_password.password.result}@127.0.0.1:${local.sql_port}
    deployment:
      extraContainers: |
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.6.1
          imagePullPolicy: Always
          args:
          - "--structured-logs"
          - "--health-check"
          - "--http-address=0.0.0.0"
          - "--port=${local.sql_port}"
          - "--private-ip"
          - ${var.project_id}:${var.default_region}:${module.sql_db.name}
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
          livenessProbe:
            httpGet:
              path: /liveness
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 2
          readinessProbe:
            httpGet:
              path: /readiness
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 2
          startupProbe:
            httpGet:
              path: /startup
              port: 9090
            periodSeconds: 1
            timeoutSeconds: 5
            failureThreshold: 20
          resources:
            requests:
              memory: 128Mi
              cpu: 50m
            limits:
              memory: 512Mi
              cpu: 250m
    EOT
  ]
}
```
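
One possible explanation (not confirmed in this issue, just the usual behaviour of sidecars in Jobs): once the keto container finishes the migration it counts as completed and is no longer ready, while the cloud-sql-proxy container keeps running, so the pod sits at 1/2 Ready (NotReady) and the Job never reaches Complete. On Kubernetes 1.29+ (1.28 behind the SidecarContainers feature gate) this can be avoided by running the proxy as a native sidecar, i.e. an init container with restartPolicy: Always, which is terminated automatically once the main container exits. Below is a plain-Kubernetes sketch, not keto chart values; all names, images, and arguments are illustrative:

```yml
# Sketch only, not chart values: a Job whose proxy runs as a Kubernetes
# native sidecar (init container with restartPolicy: Always), so it is
# shut down once the migration container exits and the Job can complete.
apiVersion: batch/v1
kind: Job
metadata:
  name: keto-automigrate-example              # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.6.1
          restartPolicy: Always               # marks this as a native sidecar
          args:
            - "--port=5432"
            - "my-project:my-region:my-instance"   # placeholder instance
      containers:
        - name: keto-automigrate
          image: oryd/keto:v0.11.1
          args: ["migrate", "up", "--yes"]    # migration command; flags are illustrative
```

On older clusters, an alternative is the proxy's own shutdown endpoint (enabled with `--quitquitquit`), which the migration step can call once it finishes; that requires wrapping the job command, so check whether the chart version in use exposes a hook for it.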

Version

v0.11.1

On which operating system are you observing this issue?

None

In which environment are you deploying?

Kubernetes with Helm

Additional Context

  • CloudSQL PostgreSQL database
  • GCP
sp71 commented

Closing due to inactivity