/deepgram-onprem-terraform

Terraform configs for bringing up a Deepgram on-prem instance on GCP.

Primary language: HCL. License: MIT.

Deepgram On-Prem Terraform for GCP

This repo contains a set of basic Terraform configs for deploying a Deepgram on-prem instance on Google Cloud. It is not a complete deployment; it includes two main things:

  1. A Packer config to build a VM image for running Deepgram on-prem.
  2. A Terraform config to stand up a GCP Managed Instance Group of VMs running the VM image.

These two pieces are not enough to fully deploy Deepgram on GCP. You will also need to do the following (all of which can be done with Terraform, but are not yet included in this repo):

  • Configure health checks for your Managed Instance Group
  • Configure a GCP load balancer to point to your Managed Instance Group
  • (Optionally) configure an external IP, domain name, and SSL certificate for your load balancer

See below for complete instructions.

Building the VM image

The packer/ directory contains a Packer configuration to generate a custom image for the Deepgram on-prem service.

  1. Edit packer/setup.sh and set the DEEPGRAM_USERNAME and DEEPGRAM_PASSWORD environment variables at the top of that file to the values provided by your Deepgram account contact. These are required to authenticate to Deepgram's private Docker registry.
  2. Edit packer/setup.sh and update the curl commands in that file to point to the URLs of the encrypted models provided by your Deepgram account manager.
  3. Edit packer/build.pkr.hcl and set the project_id and zone variables to your GCP project ID and desired zone (e.g. us-west1-a). Also, edit the accelerator_type variable to set the GCP project ID and type of GPU you wish to deploy on.
  4. From the packer/ directory, run:
$ packer build build.pkr.hcl

This will create the VM image in your GCP project. It will also emit a file called manifest.json that contains the image name, like the following:

{
  "builds": [
    {
      "name": "dgonprem-packer-image",
      "builder_type": "googlecompute",
      "build_time": 1707254755,
      "files": null,
      "artifact_id": "deepgram-onprem-1707253541",
      "packer_run_uuid": "b5a76386-82fb-228b-239c-7e0206c2b167",
      "custom_data": null
    }
  ],
  "last_run_uuid": "b5a76386-82fb-228b-239c-7e0206c2b167"
}

The artifact_id is the name of the image that was created, which can then be plugged into the Terraform configs (see below) to deploy the Deepgram on-prem service.
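Instead of copying the artifact_id by hand, Terraform can read it from manifest.json. The following is a minimal sketch, not part of this repo. It assumes the manifest sits at ../packer/manifest.json relative to the Terraform module (adjust the path to your layout), and the local name packer_image_from_manifest is hypothetical. Because the manifest can accumulate entries across Packer runs, the sketch selects the build whose packer_run_uuid matches last_run_uuid.

```hcl
locals {
  # Assumed location of the manifest relative to this module; adjust as needed.
  packer_manifest = jsondecode(file("${path.module}/../packer/manifest.json"))

  # Keep only the artifact IDs produced by the most recent Packer run.
  latest_artifact_ids = [
    for build in local.packer_manifest.builds : build.artifact_id
    if build.packer_run_uuid == local.packer_manifest.last_run_uuid
  ]

  # Hypothetical name; use this wherever the image name is required.
  packer_image_from_manifest = local.latest_artifact_ids[0]
}

output "packer_image_from_manifest" {
  value = local.packer_image_from_manifest
}
```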

Deploying the Managed Instance Group

The terraform/ directory contains a Terraform configuration to deploy a Managed Instance Group of VMs running the Deepgram on-prem image.

Create a Health Check in GCP

The Terraform configs here don't include a health check, so you need to create this manually in the GCP console at https://console.cloud.google.com/compute/healthChecks. You should configure the health check as follows:

  • Health check name: dgonprem-mig-health-check-8080
  • Path: /v1/status
  • Protocol: HTTP
  • Port: 8080
  • Proxy protocol: NONE
  • Logs: Disabled
  • Interval: 60 seconds
  • Timeout: 10 seconds
  • Healthy threshold: 3 consecutive successes
  • Unhealthy threshold: 10 consecutive failures

(Yes, this could in principle be included in the Terraform config itself.)
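As a rough sketch of what that might look like, the settings listed above map onto the Google provider's google_compute_health_check resource as follows. This block is not part of the repo; the resource label dgonprem is an arbitrary choice, and the block assumes the provider is already configured with your project.

```hcl
resource "google_compute_health_check" "dgonprem" {
  name = "dgonprem-mig-health-check-8080"

  check_interval_sec  = 60 # Interval
  timeout_sec         = 10 # Timeout
  healthy_threshold   = 3  # consecutive successes
  unhealthy_threshold = 10 # consecutive failures

  http_health_check {
    port         = 8080
    request_path = "/v1/status"
    proxy_header = "NONE"
  }

  log_config {
    enable = false
  }
}
```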

Editing the Terraform config

You'll need to edit a few things in the deepgram-onprem/main.tf file before deployment:

  1. Set the project and zone variables at the top of the file to your GCP project and zone, respectively.
  2. Set the packer_image variable to the artifact_id generated by Packer.
  3. Replace every instance of YOUR-GCP-PROJECT-ID with your GCP project ID.
  4. Replace YOUR-GCP-SERVICE-ACCOUNT with the ID of the service account associated with your GCP project.
  5. Edit the min_replicas and max_replicas variables to set the desired number of VMs in the managed instance group. For fast startup times I recommend setting min_replicas to at least 1.
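For orientation, the edited values might end up looking something like the sketch below. This assumes the settings are declared as Terraform variable blocks with defaults, which may not match how main.tf actually declares them. The project ID and max_replicas value are placeholders, and the image name is the one from the example manifest.json shown earlier.

```hcl
variable "project" {
  type    = string
  default = "my-gcp-project" # placeholder: your GCP project ID
}

variable "zone" {
  type    = string
  default = "us-west1-a"
}

variable "packer_image" {
  type    = string
  default = "deepgram-onprem-1707253541" # artifact_id from manifest.json
}

variable "min_replicas" {
  type    = number
  default = 1 # at least 1 for fast startup times
}

variable "max_replicas" {
  type    = number
  default = 3 # placeholder: choose based on expected load
}
```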

Running Terraform

You should now be able to run terraform init and terraform apply to deploy the Managed Instance Group.

Configuring a Load Balancer

You will also need to configure a GCP load balancer that points to your Managed Instance Group. Use the Managed Instance Group as the backend of the load balancer's backend service. The backend endpoint protocol must be set to HTTP (not HTTPS), and the backend should use the named port http.
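A load balancer along these lines could also be expressed in Terraform. The sketch below is not part of this repo and makes several assumptions: the two input variables are hypothetical and must be supplied by you, all resource names are arbitrary, the front end is a plain HTTP listener on port 80 (an HTTPS front end would need a certificate and a target HTTPS proxy instead), and the Managed Instance Group already defines a named port called http.

```hcl
# Hypothetical inputs: the instance group URL of the MIG and the ID or
# self link of the health check created earlier.
variable "instance_group" {
  type = string
}

variable "health_check" {
  type = string
}

resource "google_compute_backend_service" "dgonprem" {
  name                  = "dgonprem-backend"
  protocol              = "HTTP" # backend protocol: HTTP, not HTTPS
  port_name             = "http" # named port on the instance group
  load_balancing_scheme = "EXTERNAL_MANAGED"
  health_checks         = [var.health_check]

  backend {
    group = var.instance_group
  }
}

resource "google_compute_url_map" "dgonprem" {
  name            = "dgonprem-url-map"
  default_service = google_compute_backend_service.dgonprem.id
}

resource "google_compute_target_http_proxy" "dgonprem" {
  name    = "dgonprem-http-proxy"
  url_map = google_compute_url_map.dgonprem.id
}

resource "google_compute_global_forwarding_rule" "dgonprem" {
  name                  = "dgonprem-forwarding-rule"
  target                = google_compute_target_http_proxy.dgonprem.id
  port_range            = "80" # assumed plain-HTTP front end
  load_balancing_scheme = "EXTERNAL_MANAGED"
}
```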

Monitoring

The Deepgram service is configured to send logs to the GCP Ops Agent running on each VM instance, which will forward logs to Google's Cloud Logging service.

Debugging

If things aren't working, you can log in to one of the VM instances running your service and use docker ps -a to see the list of containers. There should be an "API" container and an "engine" container running. You can use docker logs to inspect the container logs (which should also appear in the Google Cloud console Logs Explorer).