fluxcd/terraform-provider-flux

Unable to Deploy SSH Key when Bootstrapping Flux with Terraform

vijayansarathy opened this issue · 14 comments

Hi,
I am trying to install Flux using Terraform and I keep getting a 401 Bad credentials error when Terraform tries to create the github_repository_deploy_key resource to deploy the SSH key to the repo.

I am using the same personal access token that works fine with the Flux CLI.

Following are the Git-related environment variables and the command I use to successfully bootstrap Flux using CLI:

export CLUSTER_NAME=sarathy-tf-flux
export GITHUB_TOKEN=ghp_XXX
export GITHUB_USER=johndoe
export GITHUB_HOSTNAME=github.cicd.company.com
export GITHUB_REPOSITORY=fluxv2-deployment-hub

flux bootstrap github  \
  --components-extra=image-reflector-controller,image-automation-controller \
  --owner=$GITHUB_USER \
  --hostname=$GITHUB_HOSTNAME \
  --namespace=flux-system \
  --repository=$GITHUB_REPOSITORY \
  --ssh-key-algorithm=ecdsa \
  --secret-name=flux-bootstrap \
  --branch=main \
  --path=clusters/$CLUSTER_NAME \
  --personal

My provider configurations in Terraform look as follows:

provider "github" {
  owner = var.github_organization
  token = var.github_token
}

provider "flux" {
  kubernetes = {
    config_path = "~/.kube/config"
  }
  git = {
    url = "ssh://git@github.com/${var.github_organization}/${var.github_repository}.git"
    ssh = {
      username    = var.github_user
      private_key = tls_private_key.flux_key.private_key_pem
    }
  }
}

resource "github_repository_deploy_key" "flux_public_key" {
  title      = "Flux-Terraform"
  repository = var.github_repository
  key        = tls_private_key.flux_key.public_key_openssh
  read_only  = false
}

with the variables set to the following values, mimicking what I do with CLI.

github_user         = "johndoe"
github_token        = "ghp_XXX"
github_organization = "github.cicd.company.com"
github_repository   = "fluxv2-deployment-hub"

I get the following error when I run Terraform:

github_repository_deploy_key.flux_public_key: Creating...
Error: POST https://api.github.com/repos/github.cicd.company.com/fluxv2-deployment-hub/keys: 401 Bad credentials 

Any insights as to what I am doing wrong with the Terraform configurations?
Thanks,
Viji

For completeness, I have included the other two Terraform resources, namely, tls_private_key, flux_bootstrap_git, required for bootstrapping Flux.

resource "tls_private_key" "flux_key" {
  algorithm   = "ECDSA"
  ecdsa_curve = "P256"
}

resource "flux_bootstrap_git" "fluxboot" {
  depends_on = [github_repository_deploy_key.flux_public_key, module.eks]
  path       = "clusters/${var.cluster_name}"
}
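
One thing that may be worth checking: the error URL above is a request to api.github.com, with the hostname github.cicd.company.com in the owner position of the path. With the CLI, --hostname and --owner are separate flags, so the Terraform side may need the same split. Below is a minimal sketch, assuming the repository lives on the GitHub Enterprise host from the CLI example. The variables github_hostname and github_owner are hypothetical names introduced for illustration; base_url is the GitHub provider's setting for a non-default API endpoint.

```hcl
# Hypothetical variables; these names are illustrative and not from the original post.
variable "github_hostname" {
  type    = string
  default = "github.cicd.company.com"
}

variable "github_owner" {
  type    = string
  default = "johndoe"
}

provider "github" {
  # Target the Enterprise host instead of the default api.github.com.
  # The GitHub provider expects base_url to end with a slash.
  base_url = "https://${var.github_hostname}/"
  owner    = var.github_owner
  token    = var.github_token
}

provider "flux" {
  kubernetes = {
    config_path = "~/.kube/config"
  }
  git = {
    # Same host and owner as above, assuming SSH access to the Enterprise host.
    url = "ssh://git@${var.github_hostname}/${var.github_owner}/${var.github_repository}.git"
    ssh = {
      username    = "git"
      private_key = tls_private_key.flux_key.private_key_pem
    }
  }
}
```

If the repository is actually hosted on github.com, base_url can be dropped and only owner needs to be the user or organization name rather than a hostname.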

I would propose using the following setup:

resource "flux_bootstrap_git" "this" {
  path    = var.flux_bootstrap_configuration["git_path"]
  version = var.flux_bootstrap_configuration["flux_version"]
  components_extra = [
    "image-reflector-controller",
    "image-automation-controller"
  ]
 ...
}

resource "kubernetes_namespace" "flux_bootstrap" {
  metadata {
    name = "flux-bootstrap"
  }

  lifecycle {
    ignore_changes = [metadata]
  }
}

resource "kubernetes_secret" "flux_bootstrap" {
  metadata {
    name      = "ssh-keypair"
    namespace = kubernetes_namespace.flux_bootstrap.metadata.0.name
  }

  type = "Opaque"

  data = {
    "identity.pub" = data.terraform_remote_state.keypairs.outputs.flux_ssh_key["public"]
    "identity"     = data.terraform_remote_state.keypairs.outputs.flux_ssh_key["private"]
    "known_hosts"  = data.terraform_remote_state.keypairs.outputs.github_known_hosts["ecdsa"]
  }

  depends_on = [kubernetes_namespace.flux_bootstrap]
}

I like to keep all the flux GitRepository and Kustomization resources in a flux-bootstrap namespace away from the flux-system namespace. This allows flux bootstrap to have its own namespace to use.

Additionally, I add the following RBAC:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: <company name>-kustomize-reconciler
  namespace: flux-bootstrap

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: <company name>-kustomize-reconciler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: <company name>-kustomize-reconciler
    namespace: flux-bootstrap
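
To show how these pieces might fit together, here is a minimal sketch, written with the Kubernetes provider's kubernetes_manifest resource, of a GitRepository that reads the ssh-keypair secret and a Kustomization that reconciles under the service account above. The repository URL, the resource names, the path, and the service account name example-kustomize-reconciler are placeholders, not values from the setup above. kubernetes_manifest needs the Flux CRDs to exist in the cluster at plan time, so this assumes Flux is already installed.

```hcl
# Source: clone over SSH using the key pair stored in the ssh-keypair secret.
resource "kubernetes_manifest" "platform_source" {
  manifest = {
    apiVersion = "source.toolkit.fluxcd.io/v1"
    kind       = "GitRepository"
    metadata = {
      name      = "platform" # placeholder name
      namespace = "flux-bootstrap"
    }
    spec = {
      interval = "1m"
      url      = "ssh://git@github.com/example-org/platform.git" # placeholder URL
      ref = {
        branch = "main"
      }
      secretRef = {
        name = "ssh-keypair"
      }
    }
  }
}

# Reconciliation: apply manifests from the source, impersonating the service account.
resource "kubernetes_manifest" "platform" {
  manifest = {
    apiVersion = "kustomize.toolkit.fluxcd.io/v1"
    kind       = "Kustomization"
    metadata = {
      name      = "platform" # placeholder name
      namespace = "flux-bootstrap"
    }
    spec = {
      interval           = "10m"
      path               = "./clusters/example" # placeholder path
      prune              = true
      serviceAccountName = "example-kustomize-reconciler" # placeholder name
      sourceRef = {
        kind = "GitRepository"
        name = "platform"
      }
    }
  }
}
```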

Thanks @swade1987 for your reply.
I am curious as to how this proposed setup addresses my question regarding Flux failing to add the SSH public key during the bootstrapping phase?

@vijayansarathy the code I use for adding a deploy key is as follows:

resource "tls_private_key" "deploy_key" {
  algorithm = "RSA"
  rsa_bits  = "4096"
}

resource "github_repository_deploy_key" "deploy_key" {
  key        = tls_private_key.deploy_key.public_key_openssh
  read_only  = var.read_only
  repository = var.repo_name
  title      = "${var.platform}-${data.aws_region.current.name}-${var.type}"
}

However, I moved to use the following setup, which uses the same SSH key to access all flux GitHub repositories the cluster requires.

resource "tls_private_key" "ssh_key" {
  algorithm = "RSA"
  rsa_bits  = "4096"
}

resource "github_user_ssh_key" "ssh_key" {
  title = "${var.platform}-${data.aws_region.current.name}-${var.type}"
  key   = tls_private_key.ssh_key.public_key_openssh
}

resource "kubernetes_secret" "flux_bootstrap" {
  metadata {
    name      = "ssh-keypair"
    namespace = kubernetes_namespace.flux_bootstrap.metadata.0.name
  }

  type = "Opaque"

  data = {
    # Reference the attributes directly (not as quoted strings). The
    # kubernetes_secret resource base64-encodes `data` values itself.
    "identity.pub" = tls_private_key.ssh_key.public_key_openssh
    "identity"     = tls_private_key.ssh_key.private_key_pem
    "known_hosts"  = "github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg="
  }

  depends_on = [kubernetes_namespace.flux_bootstrap]
}

@swade1987 I was able to get it working at my end. The issue was that I was using the wrong value for the username field in the Flux provider configuration shown below. After I changed username to git, the issue was resolved.

That being said, even after Flux has been installed successfully on the cluster (all controllers in the GitOps toolkit are in a healthy state), I continue to see these messages: flux_bootstrap_git.fluxboot: Still creating...
Not sure why it does not see that the installation is done and in a ready state.

provider "flux" {
  kubernetes = {
    config_path = "~/.kube/config"
  }
  git = {
    url = "ssh://git@github.com/${var.github_organization}/${var.github_repository}.git"
    ssh = {
      username    = "git"
      private_key = tls_private_key.flux_key.private_key_pem
    }
  }
}

@vijayansarathy that is great to hear you managed to fix it.

Regarding the Still creating ..., what version of the provider are you using?

@swade1987 It is version 1.2.3.

I don't experience that. I am using this example for reference.

For reference from a provider version perspective:

❯ terraform version
Terraform v1.7.0
on darwin_arm64
+ provider registry.terraform.io/fluxcd/flux v1.2.3
+ provider registry.terraform.io/hashicorp/tls v4.0.5
+ provider registry.terraform.io/integrations/github v6.1.0
+ provider registry.terraform.io/tehcyx/kind v0.4.0

Your version of Terraform is out of date! The latest version
is 1.7.5. You can update by downloading from https://www.terraform.io/downloads.html

@vijayansarathy are you bootstrapping the cluster via the flux CLI and then trying to get flux_bootstrap_git to reference the same repository and directory?

No, I am doing the entire bootstrapping process (creating the SSH keys, installing the GitOps toolkit, etc.) using Terraform.

Here's the version info on my installation.

> terraform version
Terraform v1.7.4
on darwin_arm64
+ provider registry.terraform.io/alekc/kubectl v2.0.4
+ provider registry.terraform.io/fluxcd/flux v1.2.3
+ provider registry.terraform.io/hashicorp/aws v5.40.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.3
+ provider registry.terraform.io/hashicorp/helm v2.12.1
+ provider registry.terraform.io/hashicorp/null v3.2.2
+ provider registry.terraform.io/hashicorp/time v0.11.1
+ provider registry.terraform.io/hashicorp/tls v4.0.5
+ provider registry.terraform.io/integrations/github v6.1.0

Your version of Terraform is out of date! The latest version
is 1.7.5. You can update by downloading from https://www.terraform.io/downloads.html

@vijayansarathy please see #617 (comment) and #617 (comment). I am unable to replicate this.

If you are using flux_bootstrap_git, I am unsure why you would need the kubectl and helm providers unless you are doing something additional.

Sorry for the confusion caused by those providers. I am not using them here. They are listed because I was doing something else with them.

The cause of this issue (Still creating...) is that the Flux source-controller times out when trying to clone the repo from GitHub. The bootstrap is able to push the YAML manifests for the toolkit components to the repo, but the controller is unable to clone it. This is the error message from that controller:

flux-system ssh://git@github.com/vijayansarathy/fluxv2-deployment-primary.git 3m7s False failed to checkout and determine revision: unable to clone 'ssh://git@github.com/vijayansarathy/fluxv2-deployment-primary.git': dial tcp 140.82.112.4:22: connect: connection timed out

@vijayansarathy, that sounds like an issue with Flux rather than the terraform provider. I would open an issue there.
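
The `dial tcp 140.82.112.4:22: connect: connection timed out` message suggests that outbound traffic on port 22 from the cluster may be blocked. If that is the case and the port cannot be opened, one possible workaround is to bootstrap over HTTPS instead of SSH. This is a minimal sketch, assuming the personal access token in var.github_token has access to the repository. It reuses the variable names from the configuration earlier in this thread and is not a confirmed fix for this issue.

```hcl
provider "flux" {
  kubernetes = {
    config_path = "~/.kube/config"
  }
  git = {
    # HTTPS goes over port 443, so it does not depend on outbound port 22.
    url = "https://github.com/${var.github_organization}/${var.github_repository}.git"
    http = {
      # With a personal access token the username is not significant.
      username = "git"
      password = var.github_token
    }
  }
}
```

With this setup the deploy key and tls_private_key resources are not needed for bootstrapping, but the token ends up stored in the cluster's Git credentials secret, which is a trade-off to weigh.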