GoogleCloudPlatform/cloud-foundation-fabric

Not able to create GKE-based Dataproc cluster

Closed this issue · 9 comments

Describe the bug
Not able to define values for the following block while creating a GKE-based Dataproc cluster:

kubernetes_software_config = {
  component_version = {
    "SPARK" : "3.1-dataproc-7"
  }
  properties = {
    "spark:spark.kubernetes.container.image" : "europe-west3-docker.pkg.dev/cloud-dataproc/dpgke/sparkengine:dataproc-14"
  }
}

Environment

output from `terraform -version`

v29.0.0 of the dataproc module

output from `git rev-parse --short HEAD`

To Reproduce
Use `kubernetes_cluster_config` to create a GKE-based Dataproc cluster.

Expected behavior
It should create the GKE-based Dataproc cluster.

Result

The given value is not suitable for module.cluster_config_gce.var.dataproc_config declared at ../modules/dataproc/variables.tf:1,1-27: attribute
│ "virtual_cluster_config": attribute "kubernetes_cluster_config": attribute "kubernetes_software_config": attribute "component_version": list of
│ map of string required.


There was a bug in v29. Please try with HEAD

Ok, thanks for the response. I am not sure what "HEAD" means; do we need to use another, older version such as v28 or v27?

Yes, just use the latest version from the master branch

Yes, I am using the latest version, v29, from the main branch:
source = "git::https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git//modules/dataproc?ref=v29.0.0"
Actually, in my variables.tf, I am trying to create a Dataproc cluster on GKE:
virtual_cluster_config = {
  staging_bucket = local.gke_staging_bucket
  kubernetes_cluster_config = {
    kubernetes_namespace = "foobar"
    kubernetes_software_config = {
      component_version = [{ "SPARK" = "3.1-dataproc-7" }]
      properties        = [{ "spark:spark.kubernetes.container.image" = "europe-west3-docker.pkg.dev/cloud-dataproc/dpgke/sparkengine:dataproc-14" }]
    }
  }
}

but it throws an error:
Error: Incorrect attribute value type

│ on .terraform/modules/cluster_config_gce.dataproc/modules/dataproc/main.tf line 246, in resource "google_dataproc_cluster" "cluster":
│ 246: component_version = var.dataproc_config.virtual_cluster_config.kubernetes_cluster_config.kubernetes_software_config.component_version
│ ├────────────────
│ │ var.dataproc_config.virtual_cluster_config.kubernetes_cluster_config.kubernetes_software_config.component_version is list of map of string with 1 element

│ Inappropriate value for attribute "component_version": map of string required.
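
Reading the two errors together: the v29 variable types component_version as a list of maps, while the google_dataproc_cluster argument it is ultimately passed to wants a plain map of strings, so neither shape makes it through. A rough sketch of the mismatch (a hypothetical reconstruction for illustration, not the actual module source):

variable "dataproc_config" {
  # Hypothetical sketch of the v29 typing, for illustration only.
  type = object({
    virtual_cluster_config = optional(object({
      kubernetes_cluster_config = optional(object({
        kubernetes_software_config = object({
          component_version = list(map(string)) # the variable asks for a list of maps...
          properties        = optional(map(string))
        })
      }))
    }))
  })
}

# ...but the provider attribute it feeds expects map(string), hence the
# "Inappropriate value for attribute component_version: map of string required" error.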

Can you please tell us which version works for a GKE-based Dataproc cluster, and whether there is an example? No example of creating a GKE-based Dataproc cluster is given at https://github.com/GoogleCloudPlatform/cloud-foundation-fabric/blob/v29.0.0/modules/dataproc/README.md

Try removing the ?ref=v29.0.0 from the module's source
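
For example (just a sketch; keep the rest of your module arguments as they are), dropping the ref makes Terraform track the repository's default branch:

module "cluster_config_gke" {
  source = "git::https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git//modules/dataproc"
  # ... remaining arguments unchanged
}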

After removing the ?ref=v29.0.0 from the module's source it works, but I am creating my own modules and pinning to a ref is a must. Please let us know which version (a lower one such as v28, v27, v26, etc.) works to create a GKE-based Dataproc cluster.

use one of the daily tags, perhaps daily-2024.03.04
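
That would look something like this (a sketch using the tag suggested above; other arguments unchanged):

module "cluster_config_gke" {
  source = "git::https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git//modules/dataproc?ref=daily-2024.03.04"
  # ... remaining arguments unchanged
}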

Hi juliocc,

As per your recommendation I am using daily-2024.03.04 to create a Dataproc cluster on GKE, following the example given:
module "cluster_config_gke" {
source = "../modules/dataprocgke"
name = "dev-gke-cluster-test"
project_id = local.project_id
region = local.region
service_account = local.service_account
labels = local.labels
dataproc_config = {
virtual_cluster_config = {
staging_bucket = local.gke_staging_bucket
kubernetes_cluster_config = {
kubernetes_namespace = "foobar"
kubernetes_software_config = {
component_version = {
"SPARK" = "3.1-dataproc-7"
}
properties = {
"spark:spark.kubernetes.container.image" = "europe-west3-docker.pkg.dev/cloud-dataproc/dpgke/sparkengine:dataproc-14"
}
}
gke_cluster_config = {
gke_cluster_target = "projects/sampleproject-1/locations/europe-west3/clusters/simple-std-cluster-1"
node_pool_target = {
node_pool = "test-node-pool-1"
roles = ["DEFAULT"]
}
}
}
}
}
}
but it throws an error with an empty pool name:
Error: Error waiting for creating Dataproc cluster: Error code 3, message: GKE Node Pool name '', must conform to pattern 'projects/([^/]+)/(?:locations|zones)/([^/]+)/clusters/([^/]+)/nodePools/([^/]+)'

│ with module.cluster_config_gke.module.dataprocgke.google_dataproc_cluster.cluster,
│ on .terraform/modules/cluster_config_gke.dataprocgke/modules/dataproc/main.tf line 23, in resource "google_dataproc_cluster" "cluster":
│ 23: resource "google_dataproc_cluster" "cluster" {

and if I change the code as recommended in the error message to
node_pool = "projects/sampleproject-1/locations/europe-west3/clusters/simple-std-cluster-1/nodePools/test-node-pool-1"

now it throws an error with the cluster path appearing twice:

Error: Error creating Dataproc cluster: googleapi: Error 400: GKE Node Pool name 'projects/sampleproject-1/locations/europe-west3/clusters/simple-std-cluster-1/nodePools/projects/sampleproject-1/locations/europe-west3/clusters/simple-std-cluster-1/nodePools/test-node-pool-1', must conform to pattern 'projects/([^/]+)/(?:locations|zones)/([^/]+)/clusters/([^/]+)/nodePools/([^/]+)', badRequest

│ with module.cluster_config_gke.module.dataprocgke.google_dataproc_cluster.cluster,
│ on .terraform/modules/cluster_config_gke.dataprocgke/modules/dataproc/main.tf line 23, in resource "google_dataproc_cluster" "cluster":
│ 23: resource "google_dataproc_cluster" "cluster" {

Please let us know what value needs to go in node_pool.
Also, can you share which is the latest stable version for creating a Dataproc cluster on GKE, or when we can expect a stable version that supports it?

There are a few issues that may play a role here:

  • you need to have a node pool within the cluster where kube-system workloads can run (the cluster-1-nodepool-1 module in the config below provides one)
  • there is an issue in the provider: it doesn't remove the node pool from the cluster, and it doesn't recognize an existing node pool as not requiring any change, so you need to manually remove the old pool before retrying

I also used a custom IAM configuration, as otherwise everything binds to the Compute Engine default service account.

I managed to successfully create a Dataproc cluster on GKE using the following config:

locals {
  dataproc_namespace = "foobar"
}

module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-standard"
  project_id = var.project_id
  name       = "cluster"
  location   = "${var.region}-b"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {} # use default names "pods" and "services"
    master_authorized_ranges = {
      internal-vms = "10.0.0.0/8"
    }
    master_ipv4_cidr_block = "192.168.0.0/28"
  }
  private_cluster_config = {
    enable_private_endpoint = true
    master_global_access    = false
  }
  enable_features = {
    dataplane_v2        = true
    fqdn_network_policy = true
    workload_identity   = true
  }
  labels = {
    environment = "dev"
  }
}

module "cluster-1-nodepool-1" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = var.project_id
  cluster_name = module.cluster-1.name
  location     = "${var.region}-b"
  name         = "nodepool-1"
  nodepool_config = {
    autoscaling = {
      max_node_count = 2
      min_node_count = 1
    }
  }
}

module "service-account" {
  source     = "./fabric/modules/iam-service-account"
  project_id = var.project_id
  name       = "dataproc-worker"
  iam = {
    "roles/iam.workloadIdentityUser" = [
      "serviceAccount:${var.project_id}.svc.id.goog[${local.dataproc_namespace}/agent]",
      "serviceAccount:${var.project_id}.svc.id.goog[${local.dataproc_namespace}/spark-driver]",
      "serviceAccount:${var.project_id}.svc.id.goog[${local.dataproc_namespace}/spark-executor]"
    ]
  }
  iam_project_roles = {
    (var.project_id) = ["roles/dataproc.worker"]
  }
}
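
The Workload Identity bindings above cover the agent, spark-driver and spark-executor Kubernetes service accounts that Dataproc runs inside local.dataproc_namespace; the same namespace is passed to kubernetes_namespace below, and the dataproc:dataproc.gke.*.google-service-account properties point those workloads at this service account.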

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
  dataproc_config = {
    virtual_cluster_config = {
      kubernetes_cluster_config = {
        kubernetes_namespace = local.dataproc_namespace
        kubernetes_software_config = {
          component_version = {
            "SPARK" : "3.1-dataproc-14"
          }
          properties = {
            "dataproc:dataproc.gke.agent.google-service-account"          = module.service-account.email
            "dataproc:dataproc.gke.spark.driver.google-service-account"   = module.service-account.email
            "dataproc:dataproc.gke.spark.executor.google-service-account" = module.service-account.email
          }
        }
        gke_cluster_config = {
          gke_cluster_target = module.cluster-1.id
          node_pool_target = {
            node_pool = "dataproc-nodepool"
            roles     = ["DEFAULT"]
          }
        }
      }
    }
  }
}
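
Note that gke_cluster_target takes the full cluster resource path (module.cluster-1.id), while node_pool is only the short pool name; the cluster path gets prefixed to it automatically, which is why passing the fully qualified nodePools/... path earlier produced the duplicated-path error.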

I'll be updating the README for the Dataproc module with this example.