Incorporate HA local storage ready CSI providers
This Equinix Metal + Google Anthos integration would benefit from a storage provider that could take advantage of the local disks attached to the provisioned nodes and the fast networking between devices.
From https://github.com/c0dyhi11/terraform-metal-anthos-on-baremetal/releases/tag/v0.2.0:
The only thing you'll need is to bring your own CSI, see our Anthos Ready Storage Partners Here
What does this look like?
When installing this Terraform module, the user would toggle an option that would enable one of a number of CSI integration options.
The Terraform resource enabling this could be a late install that is applied once the Kubeconfig output variable is available.
This resource could perhaps depend on the Terraform Helm and Terraform Kubernetes providers, or it could be executed with shell commands through a null_resource provisioner.
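As a rough sketch (the local_file kubeconfig resource and the csi_* inputs below are hypothetical, not part of this module today), the late install might look something like:

```hcl
# Hypothetical late install of a CSI driver once the cluster is reachable.
# Assumes the module writes its kubeconfig output to a local_file and that
# the chosen CSI provider ships a Helm chart.
provider "helm" {
  kubernetes {
    config_path = local_file.kubeconfig.filename
  }
}

resource "helm_release" "csi" {
  count      = var.csi_provider != "none" ? 1 : 0
  name       = "csi-driver"
  repository = var.csi_helm_repository # hypothetical input
  chart      = var.csi_helm_chart      # hypothetical input
}
```

A null_resource with a local-exec provisioner running kubectl/helm against the same kubeconfig would be the shell-based equivalent.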
The CSI provider could use the full disk that is made available within the device.
If the CSI provider requires raw partitions or disks, the Equinix Metal CPR (Custom Partitioning and Raid) features could be introduced during the device creation:
https://registry.terraform.io/providers/equinix/metal/latest/docs/resources/device (search "storage")
https://metal.equinix.com/developers/docs/servers/custom-partitioning-raid/
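As a hedged sketch (the template file name is invented, and the CPR JSON itself would follow the schema in the docs above), wiring CPR into the device resource could look like:

```hcl
# Hypothetical: supply a Custom Partitioning and RAID (CPR) layout so a raw
# partition or disk is left available for the CSI provider.
resource "metal_device" "worker" {
  hostname         = "anthos-worker-0"
  plan             = var.worker_plan
  metro            = var.metro
  operating_system = var.operating_system
  billing_cycle    = "hourly"
  project_id       = var.project_id

  # CPR layout as JSON; see the custom-partitioning-raid docs linked above.
  storage = file("${path.module}/cpr-template.json")
}
```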
The introduced modules (if any), variables, and outputs should include one-line descriptions.
The integration should be described within the project README.md.
As an alternative to baking all of these opinions into the existing project, a new Terraform module (hosted in a different repo) could consume this project as a module.
That parent module could use this project’s kubeconfig output variable. The parent module may need to send disk configuration parameters into this project, and if so we may need to introduce those parameters into this module.
This parent module could then express other opinions, like adding Cloudflare or other providers, to point DNS records at the IPs included in this module's output variables.
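For illustration only (the repository path is real, but the csi input and the output name referenced below are assumptions), a parent module might look like:

```hcl
# Hypothetical parent module that consumes this project as a child module.
module "anthos" {
  source = "github.com/c0dyhi11/terraform-metal-anthos-on-baremetal"

  # Disk/CSI parameters would be forwarded here if this module grows them.
  # csi = var.csi
}

# An added opinion: point a Cloudflare DNS record at one of the IPs exposed
# by this module's outputs (the output name is assumed).
resource "cloudflare_record" "anthos_api" {
  zone_id = var.cloudflare_zone_id
  name    = "anthos"
  type    = "A"
  value   = module.anthos.control_plane_public_ips[0]
}
```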
Hello!
I wanted to set up a k8s cluster environment on bare metal hosts. I want 4 worker nodes and 2 master nodes. I would also like to have an unmounted 100GB partition on each of the hosts.
Thanks,
Bikash
@bikashrc25:
You can only have either 1x Master node or 3x Master nodes (we call these Control Plane nodes now, and this codebase defaults to 3x). 2x Control Plane nodes isn't advisable due to possible cluster quorum issues: with only two members, etcd needs both of them up to maintain quorum, so losing either one stalls the cluster.
You will need a GCP account and an Equinix Metal account.
The c3.small.x86 plan used for these nodes should give you a complete additional disk to use for whatever you need.
Hi!
I am OK with 3 masters or control plane nodes in the k8s cluster. I do have a Pure Storage corporate GCP account. I am using Jack Hogan's Equinix Metal account (credentials to log in), but I would prefer to have my own Equinix Metal account. Are we creating this k8s cluster on bare metal? The intent of this validation is to use bare metal instead of VMs. I also wanted 4 worker nodes. As mentioned earlier, I would need an unmounted 100GB partition on each of the worker and control plane nodes.
Thanks,
Bikash
@bikashrc25 The nodes in this project are running K8s (Anthos) on bare metal; there are no VMs.
Depending on the number of disks included with the selected plan (https://metal.equinix.com/developers/docs/servers/) there should be more disks available. The plan is a Terraform input: https://github.com/c0dyhi11/terraform-metal-anthos-on-baremetal/blob/master/variables.tf#L30-L40.
For most plans, but not all, it is safe to assume that a /dev/sdb exists.
The c3.small, the default for this project, includes a /dev/sdb (480G) which is not partitioned.
The /dev/sda device is fully consumed (480G). This project's default OS is Ubuntu 20.04, and the disk is allocated as follows: 2M is dedicated to UEFI boot (sda1), 2G to the OS (sda2), and the remaining space is formatted as ext4 (sda3). There may be larger storage devices attached to the machine, based on the plan, beyond /dev/sdb, which will not be partitioned.
For example, the s3.xlarge device has the following disks:
- 2 x 960 GB SSD boot disks
- 2 x 240 GB NVMe
- 12 x 8 TB HDD
It may be best to leave the disk selection up to the user. If they choose to use an s3.xlarge plan, they could configure additional input variables, for example:
csi_provider = "portworx"
csi_devices = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/nvme0", "/dev/nvme1"]
It is safe to assume, for now, that all worker nodes will have the same plan and will have the same allotment of attached storage.
A single disk per device, csi_devices = ["/dev/sdb"], should be sufficient for most documentation examples, but we should definitely explore larger configurations that accommodate real-world use cases.
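A minimal sketch of how those two inputs might be declared (the names and defaults are illustrative only):

```hcl
# Hypothetical inputs for a simple CSI integration.
variable "csi_provider" {
  description = "Which CSI integration to install (e.g. \"portworx\"), or \"none\" to skip"
  type        = string
  default     = "none"
}

variable "csi_devices" {
  description = "Block devices on each worker node to hand to the CSI provider"
  type        = list(string)
  default     = ["/dev/sdb"]
}
```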
https://metal.equinix.com/developers/docs/guides/rook-ceph/ might provide additional insights and details.
The csi_ variables above are simple examples. It may be better to provide a CSI map variable that defines the "provider" in a common way, and perhaps the disk list, while providing an extensible map for configuring other aspects of the provider, like Helm chart options if that applies. Example:
csi = {
  provider     = "portworx"
  fast_devices = ["/dev/nvme0", "/dev/nvme1"]         # pool A
  slow_devices = ["/dev/sdb", "/dev/sdc", "/dev/sdd"] # pool B
  portworx_option = ""
  portworx_other_options = {
    whatever = "fits"
  }
}
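A sketch of how such a map could be declared in variables.tf, assuming an any-typed value so that provider-specific options remain open-ended:

```hcl
# Hypothetical extensible CSI configuration map.
variable "csi" {
  description = "CSI provider selection, device lists, and provider-specific options"
  type        = any
  default = {
    provider     = "none"
    fast_devices = []
    slow_devices = []
  }
}
```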
A challenge to defining the device names up front is that they are assigned non-deterministically. For example, when provisioning 3 c3.medium devices, the root device may be "sda", "sdb", or "sdc" on any given node. It then becomes necessary to use a script within the device to examine the available disks and allocate the preferred ones.
@enkelprifti98 has solved this problem in an unrelated project: https://github.com/enkelprifti98/terraform-packet-distributed-minio/blob/f14f635587a7b023d8384e1c03201c344a98db0d/assets/user_data.sh#L62