terraform-spotinst-ocean-spark

A Terraform module to install the Ocean for Apache Spark data platform.

Introduction

This module imports an existing Ocean cluster into Ocean Spark.

Prerequisites

  • An existing EKS, GKE, or AKS cluster
  • The cluster integrated with Spot Ocean

Usage

provider "spotinst" {
  token   = var.spotinst_token
  account = var.spotinst_account
}

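module "ocean-spark" {
  source = "spotinst/ocean-spark/spotinst"

  ocean_cluster_id = var.ocean_cluster_id

  # cluster_config is required (see Inputs). The var.* names
  # below are placeholders for your cluster's connection details.
  cluster_config = {
    cluster_name               = var.cluster_name
    certificate_authority_data = var.cluster_ca_data
    server_endpoint            = var.cluster_endpoint
    token                      = var.cluster_token
  }
}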

Upgrade guides

Examples

This module can be combined with other Terraform modules to support several installation methods for Ocean Spark:

  1. Create an Ocean Spark cluster from scratch in your AWS account
  2. Create an Ocean Spark cluster from scratch in your AWS account with AWS PrivateLink support
  3. Create an Ocean Spark cluster from scratch in your GCP account
  4. Create an Ocean Spark cluster from scratch in your Azure account
  5. Import an existing EKS cluster into Ocean Spark
  6. Import an existing GKE cluster into Ocean Spark
  7. Import an existing AKS cluster into Ocean Spark
  8. Import an existing Ocean cluster into Ocean Spark

1. Create an Ocean Spark cluster in AWS from scratch

  1. Use the AWS vpc Terraform module to create a VPC network.
  2. Use the AWS eks Terraform module to create an EKS cluster.
  3. Use the SPOTINST ocean-aws-k8s Terraform module to import the EKS cluster into Ocean.
  4. Use the SPOTINST ocean-controller Terraform module to install the controller deployment into Kubernetes.
  5. Use the SPOTINST ocean-spark Terraform module (this module) to import the cluster into Ocean Spark.

Folder examples/from-scratch/ contains a full example.
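The five steps above can be sketched in one configuration roughly as follows. This is a minimal sketch, not the contents of examples/from-scratch/: the Spot module versions are omitted, the ocean-aws-k8s and ocean-controller input names and the ocean_id output are assumptions to check against those modules' documentation (the real ocean-aws-k8s module also needs inputs such as an instance profile and security groups), and every var.* value, name, and CIDR range is hypothetical.

```hcl
# Minimal sketch: names, CIDRs, and var.* values are hypothetical.

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name            = "ocean-spark-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  # Private subnets need a NAT gateway for outbound traffic.
  enable_nat_gateway = true
  single_nat_gateway = true
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name = var.cluster_name
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnets
}

# Step 3: import the EKS cluster into Ocean.
# Assumed input names; the real module requires more inputs.
module "ocean-aws-k8s" {
  source = "spotinst/ocean-aws-k8s/spotinst"

  cluster_name = module.eks.cluster_name
  region       = var.aws_region
  subnet_ids   = module.vpc.private_subnets
}

# Step 4: install the Ocean controller (assumed input names).
module "ocean-controller" {
  source = "spotinst/ocean-controller/spotinst"

  spotinst_token     = var.spotinst_token
  spotinst_account   = var.spotinst_account
  cluster_identifier = module.eks.cluster_name
}

# Short-lived token used to reach the EKS API server.
data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_name
}

# Step 5: import the Ocean cluster into Ocean Spark.
module "ocean-spark" {
  source = "spotinst/ocean-spark/spotinst"

  # Assumption: ocean-aws-k8s exposes the Ocean cluster ID as ocean_id.
  ocean_cluster_id = module.ocean-aws-k8s.ocean_id

  # Assumption: the module accepts the base64-encoded CA data returned by EKS.
  cluster_config = {
    cluster_name               = module.eks.cluster_name
    certificate_authority_data = module.eks.cluster_certificate_authority_data
    server_endpoint            = module.eks.cluster_endpoint
    token                      = data.aws_eks_cluster_auth.this.token
  }

  depends_on = [module.ocean-controller]
}
```

The depends_on line is there so Ocean Spark is only imported after the controller from step 4 is running.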

2. Create an Ocean Spark cluster in AWS from scratch with AWS PrivateLink support

  1. Use the AWS vpc Terraform module to create a VPC network.
  2. Use the AWS eks Terraform module to create an EKS cluster.
  3. Use the SPOTINST ocean-aws-k8s Terraform module to import the EKS cluster into Ocean.
  4. Use the SPOTINST ocean-controller Terraform module to install the controller deployment into Kubernetes.
  5. Create the resources required by AWS PrivateLink: a Network Load Balancer (NLB), a VPC endpoint service, and a load balancer target group. See the AWS documentation about PrivateLink.
  6. Use the Terraform AWS EKS LB Controller module to install the AWS Load Balancer Controller in the EKS cluster.
  7. Use the SPOTINST ocean-spark Terraform module (this module) to import the cluster into Ocean Spark, setting its PrivateLink inputs (enable_private_link and ingress_private_link_endpoint_service_address).

Folder examples/from-scratch-with-private-link/ contains a full example.
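Step 5 and the PrivateLink inputs of step 7 could look roughly like the sketch below. It assumes module.vpc comes from the VPC module in step 1 and that a hypothetical var.cluster_config holds the cluster_config object described under Inputs. The port, target type, and resource names are illustrative, and any endpoint-service settings Spot actually requires (such as allowed principals) are not shown. Setting ingress_managed_load_balancer = false next to ingress_load_balancer_target_group_arn is an interpretation of the input descriptions, not a confirmed requirement; examples/from-scratch-with-private-link/ remains the reference.

```hcl
# Sketch only: names, port, and target type are illustrative assumptions.

# Internal Network Load Balancer in front of the Ocean Spark ingress.
resource "aws_lb" "ocean_spark" {
  name               = "ocean-spark-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = module.vpc.private_subnets
}

# Target group the ingress controller is bound to, presumably
# through the AWS Load Balancer Controller installed in step 6.
resource "aws_lb_target_group" "ocean_spark" {
  name        = "ocean-spark-tg"
  port        = 443
  protocol    = "TCP"
  target_type = "ip"
  vpc_id      = module.vpc.vpc_id
}

# Forward TCP traffic from the NLB to the target group.
resource "aws_lb_listener" "ocean_spark" {
  load_balancer_arn = aws_lb.ocean_spark.arn
  port              = 443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ocean_spark.arn
  }
}

# VPC endpoint service that exposes the NLB through PrivateLink.
resource "aws_vpc_endpoint_service" "ocean_spark" {
  acceptance_required        = false
  network_load_balancer_arns = [aws_lb.ocean_spark.arn]
}

module "ocean-spark" {
  source = "spotinst/ocean-spark/spotinst"

  ocean_cluster_id = var.ocean_cluster_id
  cluster_config   = var.cluster_config # hypothetical variable

  enable_private_link                           = true
  ingress_private_link_endpoint_service_address = aws_vpc_endpoint_service.ocean_spark.service_name

  # Assumption: bind to the self-managed NLB instead of a managed one.
  ingress_managed_load_balancer          = false
  ingress_load_balancer_target_group_arn = aws_lb_target_group.ocean_spark.arn
}
```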

3. Create an Ocean Spark cluster in GCP from scratch

  1. Use the GCP google_container_cluster Terraform resource to create a GKE cluster.
  2. Use the SPOTINST spotinst_ocean_gke_import Terraform resource to import the GKE cluster into Ocean.
  3. Use the SPOTINST ocean-controller Terraform module to install the controller deployment into Kubernetes.
  4. Use the SPOTINST ocean-spark Terraform module (this module) to import the cluster into Ocean Spark.

Folder examples/gcp-from-scratch/ contains a full example.
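Steps 1, 2, and 4 could be chained roughly as below. This minimal sketch leaves out the ocean-controller step, treats the spotinst_ocean_gke_import argument names and the use of its ID as the Ocean cluster ID as assumptions, and passes a hypothetical var.cluster_config for the cluster_config object; var.cluster_name and var.gcp_zone are placeholders.

```hcl
# Step 1: create the GKE cluster (placeholder name and zone).
resource "google_container_cluster" "this" {
  name               = var.cluster_name
  location           = var.gcp_zone
  initial_node_count = 1
}

# Step 2: import the GKE cluster into Ocean (assumed argument names).
resource "spotinst_ocean_gke_import" "this" {
  cluster_name = google_container_cluster.this.name
  location     = google_container_cluster.this.location
}

# Step 4: import the Ocean cluster into Ocean Spark.
module "ocean-spark" {
  source = "spotinst/ocean-spark/spotinst"

  # Assumption: the import resource's ID is the Ocean cluster ID.
  ocean_cluster_id = spotinst_ocean_gke_import.this.id
  cluster_config   = var.cluster_config # hypothetical variable
}
```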

4. Create an Ocean Spark cluster in Azure from scratch

  1. Use the Azure azurerm_virtual_network and azurerm_subnet Terraform resources to create a virtual network (VNet).
  2. Use the Azure aks Terraform module to create an AKS cluster.
  3. Use the SPOTINST ocean-aks-np-k8s Terraform module to import the AKS cluster into Ocean.
  4. Use the SPOTINST ocean-controller Terraform module to install the controller deployment into Kubernetes.
  5. Use the SPOTINST ocean-spark Terraform module (this module) to import the cluster into Ocean Spark.

Folder examples/azure-from-scratch/ contains a full example.
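Steps 1 and 2 could look roughly like this sketch, which places the AKS nodes in a dedicated subnet. The Azure/aks/azurerm inputs shown are assumptions (recent versions of that module may require more), and the names, address ranges, var.location, and var.resource_group_name are placeholders. Steps 3 to 5 are not shown.

```hcl
# Step 1: virtual network and node subnet (placeholder names and ranges).
resource "azurerm_virtual_network" "this" {
  name                = "ocean-spark-vnet"
  address_space       = ["10.1.0.0/16"]
  location            = var.location
  resource_group_name = var.resource_group_name
}

resource "azurerm_subnet" "nodes" {
  name                 = "ocean-spark-nodes"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = ["10.1.0.0/20"]
}

# Step 2: AKS cluster whose nodes use the subnet above (assumed inputs).
module "aks" {
  source = "Azure/aks/azurerm"

  prefix              = "oceanspark"
  resource_group_name = var.resource_group_name
  vnet_subnet_id      = azurerm_subnet.nodes.id
}
```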

5. Import an existing EKS cluster

  1. Use the SPOTINST ocean-aws-k8s Terraform module to import the EKS cluster into Ocean.
  2. Use the SPOTINST ocean-controller Terraform module to install the controller deployment into Kubernetes.
  3. Use the SPOTINST ocean-spark Terraform module (this module) to import the cluster into Ocean Spark.

Folder examples/import-eks-cluster/ contains a full example.
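For an existing EKS cluster, the cluster_config object can be filled from the AWS provider's aws_eks_cluster and aws_eks_cluster_auth data sources, as in this sketch. var.cluster_name and var.ocean_cluster_id are placeholders, the dependency on the controller from step 2 is omitted, and passing the base64-encoded certificate authority data unchanged is an assumption about what the module expects.

```hcl
# Read the existing EKS cluster (placeholder name).
data "aws_eks_cluster" "this" {
  name = var.cluster_name
}

# Short-lived authentication token for the cluster's API server.
data "aws_eks_cluster_auth" "this" {
  name = var.cluster_name
}

module "ocean-spark" {
  source = "spotinst/ocean-spark/spotinst"

  # Ocean cluster ID produced by step 1 (placeholder variable).
  ocean_cluster_id = var.ocean_cluster_id

  # Assumption: the module accepts the base64-encoded CA data returned by EKS.
  cluster_config = {
    cluster_name               = data.aws_eks_cluster.this.name
    certificate_authority_data = data.aws_eks_cluster.this.certificate_authority[0].data
    server_endpoint            = data.aws_eks_cluster.this.endpoint
    token                      = data.aws_eks_cluster_auth.this.token
  }
}
```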

6. Import an existing GKE cluster

  1. Use the SPOTINST spotinst_ocean_gke_import Terraform resource to import the GKE cluster into Ocean.
  2. Use the SPOTINST ocean-controller Terraform module to install the controller deployment into Kubernetes.
  3. Use the SPOTINST ocean-spark Terraform module (this module) to import the cluster into Ocean Spark.

Folder examples/gcp-import-gke-cluster/ contains a full example.
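For an existing GKE cluster, the Google provider's google_container_cluster and google_client_config data sources can supply the connection details, as sketched below. The spotinst_ocean_gke_import argument names, the use of its ID as the Ocean cluster ID, the https:// prefix on the endpoint, and passing the CA certificate base64-encoded are all assumptions; var.cluster_name and var.gcp_location are placeholders, and the controller step is omitted.

```hcl
# Read the existing GKE cluster (placeholder name and location).
data "google_container_cluster" "this" {
  name     = var.cluster_name
  location = var.gcp_location
}

# Access token of the credentials the Google provider is using.
data "google_client_config" "current" {}

# Step 1: import the GKE cluster into Ocean (assumed argument names).
resource "spotinst_ocean_gke_import" "this" {
  cluster_name = data.google_container_cluster.this.name
  location     = data.google_container_cluster.this.location
}

# Step 3: import the Ocean cluster into Ocean Spark.
module "ocean-spark" {
  source = "spotinst/ocean-spark/spotinst"

  # Assumption: the import resource's ID is the Ocean cluster ID.
  ocean_cluster_id = spotinst_ocean_gke_import.this.id

  # Assumptions: an https:// endpoint and base64-encoded CA data are expected.
  cluster_config = {
    cluster_name               = data.google_container_cluster.this.name
    certificate_authority_data = data.google_container_cluster.this.master_auth[0].cluster_ca_certificate
    server_endpoint            = "https://${data.google_container_cluster.this.endpoint}"
    token                      = data.google_client_config.current.access_token
  }
}
```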

7. Import an existing AKS cluster

  1. Use the SPOTINST ocean-aks-np-k8s Terraform module to import the AKS cluster into Ocean.
  2. Use the SPOTINST ocean-controller Terraform module to install the controller deployment into Kubernetes.
  3. Use the SPOTINST ocean-spark Terraform module (this module) to import the cluster into Ocean Spark.

Folder examples/azure-import-aks-cluster/ contains a full example.
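For an existing AKS cluster, the optional client_certificate and client_key fields of cluster_config can be filled from the azurerm_kubernetes_cluster data source, as in this sketch. var.cluster_name, var.resource_group_name, and var.ocean_cluster_id are placeholders; passing the base64-encoded values unchanged is an assumption, and clusters that rely on Microsoft Entra ID authentication may not expose usable client certificates in kube_config.

```hcl
# Read the existing AKS cluster (placeholder name and resource group).
data "azurerm_kubernetes_cluster" "this" {
  name                = var.cluster_name
  resource_group_name = var.resource_group_name
}

module "ocean-spark" {
  source = "spotinst/ocean-spark/spotinst"

  # Ocean cluster ID produced by step 1 (placeholder variable).
  ocean_cluster_id = var.ocean_cluster_id

  # Client-certificate authentication taken from the cluster's kube_config.
  # Assumption: the module accepts these base64-encoded values as-is.
  cluster_config = {
    cluster_name               = data.azurerm_kubernetes_cluster.this.name
    certificate_authority_data = data.azurerm_kubernetes_cluster.this.kube_config[0].cluster_ca_certificate
    server_endpoint            = data.azurerm_kubernetes_cluster.this.kube_config[0].host
    client_certificate         = data.azurerm_kubernetes_cluster.this.kube_config[0].client_certificate
    client_key                 = data.azurerm_kubernetes_cluster.this.kube_config[0].client_key
  }
}
```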

8. Import an existing Ocean cluster

  1. Use the SPOTINST ocean-spark Terraform module (this module) to import the cluster into Ocean Spark.

Folder examples/import-ocean-cluster/ contains a full example.

⚠️ Before running terraform destroy ⚠️

If your cluster was created with v1 of the module, or if you set deployer_namespace = spot-system, follow these steps:

1- Switch your kubectl context to the target cluster.

2- Run the scripts/ofas-uninstall.sh script to safely clean up the Ocean Spark components.

3- Once the script completes successfully, run terraform destroy.

Terraform module documentation

Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.13.1 |
| kubernetes | ~> 2.0 |
| spotinst | >= 1.115.0, < 2.0.0 |
| validation | 1.0.0 |

Providers

| Name | Version |
|------|---------|
| null | n/a |
| spotinst | >= 1.115.0, < 2.0.0 |
| validation | 1.0.0 |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| null_resource.apply_kubernetes_manifest | resource |
| spotinst_ocean_spark.cluster | resource |
| spotinst_ocean_spark_virtual_node_group.this | resource |
| validation_warning.log_collection_collect_driver_logs | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| attach_dedicated_virtual_node_groups | List of virtual node group IDs to attach to the cluster | list(string) | [] | no |
| cluster_config | Configuration for the Ocean Kubernetes cluster | object({ cluster_name = string, certificate_authority_data = string, server_endpoint = string, token = optional(string), client_certificate = optional(string), client_key = optional(string) }) | n/a | yes |
| compute_create_vngs | Controls whether dedicated Ocean Spark VNGs will be created by the cluster creation process | bool | true | no |
| compute_use_taints | Controls whether the Ocean Spark cluster will use taints to schedule workloads | bool | true | no |
| create_cluster | Controls whether the Ocean for Apache Spark cluster should be created (it affects all resources) | bool | true | no |
| deployer_namespace | The namespace Ocean Spark deployer jobs will run in (must be either 'spot-system' or 'kube-system'). The deployer jobs are used to manage Ocean Spark cluster components. | string | "kube-system" | no |
| enable_custom_endpoint | Controls whether the Ocean for Apache Spark control plane addresses the cluster using a custom endpoint | bool | false | no |
| enable_private_link | Controls whether the Ocean for Apache Spark control plane addresses the cluster via AWS PrivateLink | bool | false | no |
| ingress_custom_endpoint_address | The address the Ocean for Apache Spark control plane will use to address the cluster when the custom endpoint is enabled | string | null | no |
| ingress_load_balancer_service_annotations | Annotations that will be added to the load balancer service, allowing for customization of the load balancer | map(string) | {} | no |
| ingress_load_balancer_target_group_arn | The ARN of a target group that the Ocean for Apache Spark ingress controller will be bound to | string | null | no |
| ingress_managed_controller | Controls whether an ingress controller managed by Ocean for Apache Spark will be installed on the cluster | bool | true | no |
| ingress_managed_load_balancer | Controls whether a load balancer managed by Ocean for Apache Spark will be provisioned for the cluster | bool | true | no |
| ingress_private_link_endpoint_service_address | The name of the VPC endpoint service the Ocean for Apache Spark control plane should bind to when PrivateLink is enabled | string | null | no |
| log_collection_collect_app_logs | Controls whether the Ocean Spark cluster will collect Spark driver/executor logs | bool | true | no |
| log_collection_collect_driver_logs | Controls whether the Ocean Spark cluster will collect Spark driver logs (deprecated: use log_collection_collect_app_logs instead) | bool | null | no |
| ocean_cluster_id | The Ocean cluster identifier | string | n/a | yes |
| spark_additional_app_namespaces | List of Kubernetes namespaces that should be configured to run Spark applications, in addition to the default 'spark-apps' namespace | list(string) | [] | no |
| webhook_host_network_ports | List of host network ports to assign to the Ocean Spark system pods | list(number) | [] | no |
| webhook_use_host_network | Controls whether Ocean Spark system pods that expose webhooks will use the host network | bool | false | no |
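The sketch below shows how a few of these inputs might be combined. It declares a variable with the cluster_config type listed above (optional() object attributes need Terraform 1.3 or later) and sets some optional inputs. The namespace name and VNG ID are hypothetical, and pairing compute_create_vngs = false with attach_dedicated_virtual_node_groups is an illustrative assumption about how existing VNGs are reused.

```hcl
# Variable matching the cluster_config type from the Inputs table.
variable "cluster_config" {
  type = object({
    cluster_name               = string
    certificate_authority_data = string
    server_endpoint            = string
    token                      = optional(string)
    client_certificate         = optional(string)
    client_key                 = optional(string)
  })
  sensitive = true
}

module "ocean-spark" {
  source = "spotinst/ocean-spark/spotinst"

  ocean_cluster_id = var.ocean_cluster_id
  cluster_config   = var.cluster_config

  # Collect driver and executor logs; preferred over the deprecated
  # log_collection_collect_driver_logs input.
  log_collection_collect_app_logs = true

  # Extra namespace for Spark applications besides 'spark-apps' (hypothetical).
  spark_additional_app_namespaces = ["analytics-spark"]

  # Attach an existing VNG instead of creating dedicated ones (hypothetical ID).
  compute_create_vngs                  = false
  attach_dedicated_virtual_node_groups = ["ols-0123456789abcdef"]
}
```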

Outputs

| Name | Description |
|------|-------------|
| ocean_spark_id | The Ocean Spark cluster ID |
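A root configuration can re-export this output, for example to pass the Ocean Spark cluster ID to other tooling. This short sketch assumes the module block is named "ocean-spark", as in the Usage section.

```hcl
# Expose the Ocean Spark cluster ID from the root module.
output "ocean_spark_id" {
  description = "The Ocean Spark cluster ID"
  value       = module.ocean-spark.ocean_spark_id
}
```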