GoogleCloudPlatform/airflow-operator

Add support for SQL Proxy connections in our workers

jcunhasilva opened this issue · 6 comments

Currently, if we want to connect to a Cloud SQL database from our workers/DAGs, we need to create a connection using the database's public or private IP.
The recommended way to connect to these databases is through a Cloud SQL Proxy sidecar container, as described here:
https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine

We should have a way of specifying predefined SQL Proxy connections to be attached to the scheduler or worker pods. These connections could be defined directly in the cluster configuration.

The configuration spec could be similar to the one currently available in the base YAML configuration:

spec:
  sqlproxy:
    project: kubeflow-193622
    region: us-central1
    instance: testsql-cluster
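Given a spec like the one above, a minimal sketch of the sidecar the operator could inject into each worker pod might look like the following (the image and flags follow the standard Cloud SQL Proxy setup; the instance connection name is assembled from the `project`, `region`, and `instance` fields, and the port/tag values are assumptions):

```yaml
# Hypothetical sidecar container injected into worker pods.
# Connection name format: <project>:<region>:<instance>
- name: cloudsql-proxy
  image: gcr.io/cloudsql-docker/gce-proxy:1.16   # tag is illustrative
  command: ["/cloud_sql_proxy",
            "-instances=kubeflow-193622:us-central1:testsql-cluster=tcp:5432"]
  securityContext:
    runAsNonRoot: true
    allowPrivilegeEscalation: false
```

DAG connections would then point at `127.0.0.1:5432` inside the pod instead of the database's public or private IP.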

Is the Cloud SQL instance different from the DB specified in AirflowBase?
If you use AirflowBase with cloudsql (which is not functional at this point), all Celery-based workers connect to that instance to report task status back.
If it is the same Cloud SQL instance, you could reuse the sqlproxy that the workers already use.
SqlProxy is deployed as part of the base cluster.

Or are you asking for a general pattern of injecting sidecars into workers?

Yes, that was my first approach (using the same SQL Proxy as the one in AirflowBase), but then I stumbled upon the Cloud SQL issue and couldn't test it further.

However, the base configuration specifies the instance that holds the Airflow database, which in our case is not the same instance we want to fetch data from inside our DAGs. In this case we need a different sidecar for our workers.

I am wondering whether that is the right way to do it. Having a sidecar would create one sqlproxy per worker.

What about creating a separate deployment just for the sqlproxy to your custom Cloud SQL instance, and using that as a parameter for the tasks?
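A sketch of that separate-deployment idea, assuming a Postgres instance (the name `dag-sqlproxy` and the instance connection name are placeholders, not anything defined by this operator):

```yaml
# Hypothetical standalone proxy: one Deployment plus a Service,
# shared by all workers instead of one sidecar per worker pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dag-sqlproxy               # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels: {app: dag-sqlproxy}
  template:
    metadata:
      labels: {app: dag-sqlproxy}
    spec:
      containers:
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.16   # tag is illustrative
        command: ["/cloud_sql_proxy",
                  "-instances=my-project:us-central1:my-instance=tcp:5432"]
---
apiVersion: v1
kind: Service
metadata:
  name: dag-sqlproxy
spec:
  selector: {app: dag-sqlproxy}
  ports:
  - port: 5432
    targetPort: 5432
```

DAG tasks would then use `dag-sqlproxy:5432` as the connection host, avoiding per-pod sidecar injection entirely.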

FYI, I have added a fix for cloudsql with #56.

Please note: when setting up cloudsql, ensure the default username is postgres and the password is added to the secret in the samples folder.

@barney-s since CloudSQL is now working with Postgres, I was able to create a new Airflow database inside our existing instance and use the same SQL Proxy to connect to several databases. I guess this ticket can be closed.

Thank you for your help!

Thanks for confirming. The bigger pattern of injecting sidecars into workers is still open; I will capture it in a separate doc.