Terraform modules defining the Pegasus "homeprod" infrastructure.
- NAT port forwarding rules send HTTP/HTTPS traffic to an nginx reverse proxy, `prod-proxy-passthrough`.
- `prod-proxy-passthrough` passes the traffic through to the upstream server(s). SNI is used to inspect the host name for a request without terminating TLS.
- Requests for `*.lab.pegasuspad.com` are routed to an nginx reverse proxy for lab services, `lab-proxy`.
- All other requests are routed to an nginx reverse proxy for production services, `prod-proxy`.
This structure is depicted visually below:
New virtual machines are created using a two-step process:
- Terraform is used to provision (i.e. create) the virtual machine and any dependencies.
- Ansible is then used to install and configure software on the machine.
- Ongoing configuration management is then performed by Ansible and/or Harbormaster.
The diagram below depicts the overall process, and the sections that follow describe each step in more detail.
Terraform interfaces with the Proxmox server to create virtual machines and related resources (disk ISOs, cloud-init snippets, etc.). Terraform uses cloud-init to perform some minimal configuration of new hosts (creating a user for Ansible, installing required packages such as qemu-agent, etc.). In general, however, this configuration should be kept to a minimum. Specifically, Terraform does not inject any secrets into newly provisioned VMs. This is to keep such secrets out of Terraform state, cloud-init metadata files, etc.
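As a rough sketch of this step (not the actual module code), the example below uses the `bpg/proxmox` provider to create a VM whose cloud-init user data sets up the Ansible user and guest agent. The node name, datastore IDs, and user-data contents are assumptions; attribute names should be checked against the provider documentation.

```hcl
variable "ansible_ssh_public_key" {
  type = string
}

# Cloud-init snippet uploaded to the Proxmox node. It only creates the Ansible
# user and installs baseline packages; no secrets are injected here.
resource "proxmox_virtual_environment_file" "user_data" {
  content_type = "snippets"
  datastore_id = "local" # assumed snippet datastore
  node_name    = "pve"   # assumed Proxmox node

  source_raw {
    file_name = "example-user-data.yaml"
    data      = <<-EOT
      #cloud-config
      users:
        - name: ansible
          groups: [sudo]
          ssh_authorized_keys:
            - ${var.ansible_ssh_public_key}
      packages:
        - qemu-guest-agent
    EOT
  }
}

# The VM itself, pointed at the snippet above for its cloud-init user data.
resource "proxmox_virtual_environment_vm" "example" {
  name      = "example"
  node_name = "pve"

  agent {
    enabled = true
  }

  initialization {
    user_data_file_id = proxmox_virtual_environment_file.user_data.id
  }
}
```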
Once a new host is provisioned, Terraform invokes a webhook on the Ansible control node and passes the name of the new host. This triggers the provisioning process, in which Ansible runs a playbook that installs the correct software on the VM and injects any secrets or other configuration. Some of the secrets stored on the Ansible control node must be unlocked before they can be used for provisioning. This unlocking step must be completed manually by a human operator in an interactive terminal. As a result, there are several states the control node can be in, as depicted below:
A newly provisioned control node must first be manually "bootstrapped" by an administrator, using a script installed during provisioning. This results in a fully provisioned but "locked" state. An administrator must execute a script to unlock the node before it can accept webhooks or process other provisioning tasks. If the machine is rebooted for any reason, it reverts to the locked state and must be manually unlocked again.
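The webhook call described above is specific to this repository, but purely as an illustration it could be modeled in Terraform roughly as follows. The endpoint URL, payload shape, and resource names here are assumptions, not the actual implementation.

```hcl
# Hypothetical sketch: notify the Ansible control node that a new host needs
# provisioning. Re-runs whenever the VM name changes.
resource "null_resource" "notify_ansible" {
  triggers = {
    host = proxmox_virtual_environment_vm.example.name
  }

  provisioner "local-exec" {
    # Assumed webhook endpoint and JSON payload.
    command = "curl -fsS -X POST -H 'Content-Type: application/json' -d '{\"host\":\"${self.triggers.host}\"}' https://ansible-control.internal/hooks/provision"
  }
}
```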
TBD - ansible playbook and gitops with Harbormaster
data VM
: a data VM is a Proxmox virtual machine with no OS installed, which is never intended to be booted. Instead, it is used to provision disks that are attached to another (bootable) VM. The purpose of this is to allow the bootable VM to be destroyed and recreated without losing the persistent disks. This pattern is illustrated in the documentation for our Terraform provider: https://registry.terraform.io/providers/bpg/proxmox/latest/docs/resources/virtual_environment_vm#example-attached-disks
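A condensed sketch of this pattern, loosely following the provider example linked above, is shown below. Node names, datastore IDs, and disk sizes are assumptions.

```hcl
# Sketch only: a never-booted "data" VM owns the persistent disk.
resource "proxmox_virtual_environment_vm" "data" {
  name      = "example-data"
  node_name = "pve" # assumed node
  started   = false
  on_boot   = false

  disk {
    datastore_id = "local-zfs" # assumed datastore
    interface    = "scsi0"
    size         = 8
  }
}

# The bootable VM attaches that disk by referencing its path in the datastore,
# so it can be destroyed and recreated without losing the data disk.
resource "proxmox_virtual_environment_vm" "app" {
  name      = "example-app"
  node_name = "pve"

  disk {
    datastore_id = "local-zfs"
    interface    = "scsi0"
    size         = 16
  }

  disk {
    datastore_id      = "local-zfs"
    interface         = "scsi1"
    size              = 8
    file_format       = "raw"
    path_in_datastore = "vm-${proxmox_virtual_environment_vm.data.vm_id}-disk-0"
  }
}
```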
- Module names ending in `-vm` create a Proxmox VM. The VM name is the module name, excluding this suffix (see the example below).
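For illustration only (the module name and source path here are hypothetical), a module following this convention would be used like so and would create a VM named `lab-proxy`:

```hcl
# Hypothetical usage: a module named "lab-proxy-vm" creates a VM named "lab-proxy".
module "lab-proxy-vm" {
  source = "./lab-proxy-vm" # assumed module path
}
```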
Data module that exposes outputs for configuration options shared by all resources in the "prod" environment, such as Proxmox connection details, datastore types, and user SSH keys.
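Hypothetically, the shape of those outputs might look like the following; the names and values are illustrative rather than the module's actual interface.

```hcl
# Illustrative only: shared "prod" settings exposed as outputs for other modules.
output "proxmox_endpoint" {
  value = "https://proxmox.example.internal:8006" # assumed endpoint
}

output "vm_datastore_id" {
  value = "local-zfs" # assumed datastore
}

output "user_ssh_keys" {
  value = ["ssh-ed25519 AAAA... operator@example"] # assumed key list
}
```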
Creates an Ansible control node that can be used to manage prod resources, as well as test Ansible provisioning changes before they are promoted to another environment.
Creates an nginx reverse proxy that serves as the main web entrypoint to our externally-accessible services. This proxy passes TLS connections through and uses SNI to route requests to the appropriate environment-specific proxy (either `lab-proxy` or `prod-proxy`).
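As a rough illustration of the SNI passthrough (upstream hostnames are assumptions, and the real module may template its nginx configuration differently), the `stream` configuration could be carried in a Terraform local like this:

```hcl
# Illustrative only: route by SNI without terminating TLS, using nginx's
# ssl_preread to read the requested server name and pick an upstream.
locals {
  prod_proxy_passthrough_conf = <<-EOT
    stream {
      map $ssl_preread_server_name $upstream {
        ~\.lab\.pegasuspad\.com$ lab-proxy.internal:443;  # assumed lab-proxy address
        default                  prod-proxy.internal:443; # assumed prod-proxy address
      }

      server {
        listen 443;
        ssl_preread on;
        proxy_pass $upstream;
      }
    }
  EOT
}
```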