datopian/ckan-cloud-operator

Using cloud-native object storage for file storage


Job story

When I stand up a cluster of CKANs managed by CCO, I want to provide a configuration for all CKANs hosted therein to use object storage provided by the cloud platform, rather than the current method of using minio, so I can leverage the redundancy and availability guarantees of those services and reduce the complexity of the software running within the cluster I manage.

Context

We currently use minio running inside the cluster for file storage backing the data in our CKAN instances. We want to use the object storage solutions provided by cloud providers (S3, Blob Storage, Google Cloud Storage), and this is a requirement for several customers. In such cases, we'd also like to ensure that minio is not deployed to the cluster (we don't want to deploy any dependencies that are not actively in use).

Acceptance criteria

  • Provide an option to not deploy minio to a cluster
  • Configure CKANs in the cluster (using cloudstorage extension or similar) to use the cloud-native object storage solution
  • Documentation for this feature

I think that current helm-based CKAN instances don't use minio at all (it is used by the deis instances only).
All configuration for these helm instances is in ckan-cloud-helm, and CCO currently provisions disks for storage for these instances.
Do we need to support multiple buckets + credentials for different instances, or can we use a single bucket with separate folders (but shared credentials)?

Do we need to support multiple buckets + credentials for different instances, or can we use a single bucket with separate folders (but shared credentials)?

  • A bucket per instance
  • Not shared credentials

Current flow:

  • deis-type instances: CCO creates a directory named after the instance in the shared minio storage
  • helm-based instances: the helm chart creates a persistent volume for the data.

Plan:

  1. Add a configurable option during cluster initialization: use minio or cloud-native object storage.
  2. Implement gcloud/s3/azure bucket storage creation (using service account credentials) on CKAN instance creation.
  3. Add tests

CKAN instance creation flow

An example of a command that creates a CkanInstance-type resource:
ckan-cloud-operator ckan instance create helm <path-to-values.yaml> --instance-id <instance id> --instance-name <instance name> --update --exists-ok --wait-ready

The command will run the ckan_cloud_operator.providers.ckan.instance.manager.create() function with the following parameters:

instance_type='helm'
instance_id=<instance id>
instance_name=<instance name>
values=None
values_filename=<path-to-values.yaml>
exists_ok=True
dry_run=False
update_=True
wait_ready=True
skip_deployment=False
skip_route=False
force=False

Note that <path-to-values.yaml> can be -, in which case the command will read the values from STDIN.

If the --exists-ok parameter was not specified, the command will look for an existing CkanInstance resource with the same instance_id and raise an exception if one exists.

Next, the command runs kubectl apply, passing a resource spec generated from the input data, to create a new CkanInstance entry on the cluster.
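
For illustration, a minimal sketch (in Python) of what the generated resource and the apply step might look like; the API group/version and field names here are assumptions, not the actual CCO spec:

import json
import subprocess

def apply_ckan_instance(instance_id, instance_name, values):
    # Hypothetical sketch: build a CkanInstance-like resource and pipe it to kubectl apply
    resource = {
        "apiVersion": "stable.viderum.com/v1",   # assumed API group/version
        "kind": "CkanCloudCkanInstance",
        "metadata": {"name": instance_id},
        "spec": {
            "ckanInstanceName": instance_name,   # assumed field name
            **values,                            # values loaded from <path-to-values.yaml> or STDIN
        },
    }
    subprocess.run(
        ["kubectl", "apply", "-f", "-"],
        input=json.dumps(resource).encode(),
        check=True,
    )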

Next, if the --update parameter was specified, the command will run the ckan_cloud_operator.providers.ckan.instance.update() function with the following parameters:

instance_id=<instance id>
wait_ready=True
skip_deployment=False
skip_route=False
force=False
dry_run=False

This will run the pre-update hook ckan_cloud_operator.providers.ckan.deployment.helm.manager.pre_update_hook(). The hook does several things:

  1. creates a namespace for the instance and an RBAC role for the service account to access the namespace.
  2. updates the CkanInstance spec to include route and siteUrl information
  3. updates the CkanInstance spec to include the CKAN admin login credentials (ckan-admin-email, ckan-admin-password). The password is stored in the ckan-admin-password secret in the instance namespace.
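
A rough sketch of the three hook steps above; the domain, secret handling and spec field names are illustrative assumptions only:

import secrets
import subprocess

def pre_update_hook_sketch(instance_id, spec):
    # 1. Namespace (and, in the real hook, an RBAC role) for the instance
    subprocess.run(["kubectl", "create", "namespace", instance_id], check=False)

    # 2. Route / siteUrl information added to the spec (domain is an assumed example)
    spec.setdefault("siteUrl", f"https://{instance_id}.example.com")

    # 3. Admin credentials: email kept in the spec, password stored in a k8s secret
    spec.setdefault("ckan-admin-email", f"admin@{instance_id}.example.com")
    password = secrets.token_urlsafe(12)
    subprocess.run(
        ["kubectl", "-n", instance_id, "create", "secret", "generic",
         "ckan-admin-password", f"--from-literal=password={password}"],
        check=False,
    )
    return spec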

Next, if the --skip-deployment parameter was not specified, the command will run the ckan_cloud_operator.providers.ckan.deployment.helm.manager.update() function.
The function runs a helm deployment of the CKAN helm chart from the ckan-cloud-helm repo, passing all the values. The repo URL can be overridden by setting a custom ckanHelmChartRepo value in the CkanInstance spec.
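
As a sketch, that deployment step boils down to something like the following; the chart repository alias, chart name and release naming are assumptions, and the real manager passes many more values:

import subprocess

def deploy_ckan_chart(instance_id, values_filename, chart_repo):
    # chart_repo stands in for the ckan-cloud-helm chart repository URL
    # (overridable via ckanHelmChartRepo in the CkanInstance spec)
    subprocess.run(["helm", "repo", "add", "ckan-cloud", chart_repo], check=True)
    subprocess.run(
        ["helm", "upgrade", "--install", instance_id, "ckan-cloud/ckan",
         "--namespace", instance_id, "-f", values_filename],
        check=True,
    )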

After that, the CKAN instance will be up and running in a namespace named <instance id>.

Next, if the --skip-route parameter was not specified, the command will run the ckan_cloud_operator.routers.manager.create_subdomain_route() function. It appends the instance's domain/subdomain data to the Traefik config, and then redeploys the Traefik instances-default instance with the new config.

Next, the command will run the ckan_cloud_operator.providers.ckan.deployment.helm.manager.create_ckan_admin_user() function, which execs a ckan-paster --plugin=ckan sysadmin -c /etc/ckan/production.ini add ... command inside the CKAN pod, providing the admin credentials stored in the CkanInstance spec.
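
For illustration, that exec could be wrapped roughly like this; the pod label selector and the trailing username/password/email arguments are assumptions standing in for the elided part of the command above:

import subprocess

def create_ckan_admin_user_sketch(instance_id, admin_name, admin_email, admin_password):
    # Find a CKAN pod in the instance namespace (label selector is an assumption)
    pod = subprocess.check_output(
        ["kubectl", "-n", instance_id, "get", "pods", "-l", "app=ckan",
         "-o", "jsonpath={.items[0].metadata.name}"]
    ).decode().strip()
    # Exec the sysadmin command inside the pod with the stored admin credentials
    subprocess.run(
        ["kubectl", "-n", instance_id, "exec", pod, "--",
         "ckan-paster", "--plugin=ckan", "sysadmin",
         "-c", "/etc/ckan/production.ini", "add", admin_name,
         f"password={admin_password}", f"email={admin_email}"],
        check=True,
    )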

Where to make modification

The bucket should be created in the step before the helm chart deployment (i.e. before ckan_cloud_operator.providers.ckan.deployment.helm.manager.update() is called).
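
Concretely, the hook point could look something like this sketch; the storage submodule import reflects the proposed layout described under Implementation below, and the update() signature shown here is assumed:

# Sketch only: proposed ordering of bucket creation relative to the helm deployment.
from ckan_cloud_operator.providers.ckan.deployment.helm import manager as helm_manager
# Hypothetical storage submodule (see the Implementation section below)
from ckan_cloud_operator.providers.cluster.storage import gcloud as storage_manager

def update_instance_with_bucket(instance_id):
    # Proposed new step: create the bucket (and store its credentials) first,
    # so the helm chart can pick the credentials up as environment variables.
    storage_manager.create_bucket(instance_id)
    # Existing step: deploy / update the CKAN helm chart for the instance
    # (the exact signature of manager.update() is assumed here).
    helm_manager.update(instance_id)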

Supported configuration options

  1. Override cluster default zone/region
  2. ? (to be discussed)

Implementation

For each supported cloud provider there will be a submodule in the ckan_cloud_operator.providers.cluster.storage module.

Each submodule contains a manager with the functions create_bucket(), delete_bucket(), get_bucket() and list_buckets(). It runs the provider-specific gcloud, aws or az commands that manage buckets for the selected cloud provider.

The create/update/delete functions take <instance id> as their first parameter.

The create/update functions update the CkanInstance spec with the bucket credentials.

The deletion function should remove the bucket itself and its credentials from the CkanInstance spec on the cluster.

The listing function loops through the CkanInstance entries and returns a list of buckets with instance ids and credentials.
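
A minimal sketch of one such submodule's manager, using the gcloud tooling (gsutil) as the example provider; the bucket naming convention and the omitted credentials/spec-update step are assumptions:

import subprocess

# Sketch of a hypothetical ckan_cloud_operator.providers.cluster.storage.gcloud manager

def _bucket_name(instance_id):
    return f"ckan-{instance_id}"  # assumed naming convention

def create_bucket(instance_id, region="europe-west1"):
    # Create a GCS bucket for the instance in the given (or cluster default) region
    subprocess.run(["gsutil", "mb", "-l", region, f"gs://{_bucket_name(instance_id)}"], check=True)
    # ...the real implementation would also create credentials and update
    # the CkanInstance spec / bucket-credentials secret

def delete_bucket(instance_id):
    # Remove the bucket together with its contents
    subprocess.run(["gsutil", "rm", "-r", f"gs://{_bucket_name(instance_id)}"], check=True)

def get_bucket(instance_id):
    # Return basic info about the instance's bucket
    return subprocess.check_output(
        ["gsutil", "ls", "-L", "-b", f"gs://{_bucket_name(instance_id)}"]
    ).decode()

def list_buckets():
    # List all buckets visible to the configured service account
    return subprocess.check_output(["gsutil", "ls"]).decode().splitlines()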

Bucket credentials

When a bucket is created, save its credentials in a secret named bucket-credentials inside the namespace of the CKAN instance.
We should update the CKAN helm chart to read the credentials from this secret and set them as environment variables during helm chart deployment.
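
As a sketch, the secret could be created like this (the key names are assumptions); on the chart side the deployment would then reference it, e.g. via envFrom or secretKeyRef, to expose the credentials as environment variables:

import subprocess

def save_bucket_credentials(instance_id, bucket_name, access_key, secret_key):
    # Store the bucket credentials in a bucket-credentials secret in the instance namespace
    subprocess.run(
        ["kubectl", "-n", instance_id, "create", "secret", "generic", "bucket-credentials",
         f"--from-literal=BUCKET_NAME={bucket_name}",
         f"--from-literal=BUCKET_ACCESS_KEY={access_key}",
         f"--from-literal=BUCKET_SECRET_KEY={secret_key}"],
        check=True,
    )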

I don't think we need multiple levels.

What's the user interface going to look like?

  1. In the ckan-cloud-operator cluster initialize --interactive flow there will be a step where you choose between self-hosted minio and cloud-native buckets for file storage; the choice will be saved in the cluster config.
  2. I'll also add an optional --bucket-region parameter to the ckan-cloud-operator ckan instance create helm CLI command, which will default to the cluster's region (see the sketch below).
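
A sketch of how the --bucket-region option might be wired into the CLI, assuming the existing click-based command structure; the option and parameter names here are proposals, not the final interface:

import click

@click.command("helm")
@click.argument("values_filename")
@click.option("--instance-id", required=True)
@click.option("--instance-name")
@click.option("--bucket-region", default=None,
              help="Region for the instance's storage bucket (defaults to the cluster region)")
def create_helm(values_filename, instance_id, instance_name, bucket_region):
    # bucket_region would be passed down to the storage submodule's create_bucket();
    # when it is None, the region saved in the cluster config would be used.
    ...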

How is it going to interact with the existing storage module

No interaction with minio is required.

How do we generalize the API so both providers are supported (and is it required)

Currently minio is only used in Deis-type CKAN instances. If we need to support it in helm-based instances too, I'll just add a check for the saved value (minio or cloud-native) in the cluster config before creating a storage bucket, and then run either the bucket creation or the minio subfolder creation function.
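
In other words, a small dispatch along these lines; the config key and the two helper callables are hypothetical stand-ins for the real minio and bucket creation functions:

def provision_instance_storage(instance_id, cluster_config,
                               create_minio_subfolder, create_cloud_bucket):
    # Config key name is an assumption; it would be set during cluster initialize.
    if cluster_config.get("file-storage-provider", "minio") == "minio":
        create_minio_subfolder(instance_id)   # existing behaviour (Deis-type instances)
    else:
        create_cloud_bucket(instance_id)      # new behaviour: cloud-native bucket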

How does this provider fit into the general CCO model of editing the state and then applying it? What k8s objects need to be created? How would the update phase work?

Bucket info will be contained in the CkanCloudCkanInstance spec (bucket.<s3|gcloud|abs>).
No additional k8s objects needed.

CCO will deploy the CKAN helm package with the bucket storage info/credentials as environment variables during the create/update phase, based on the CkanCloudCkanInstance spec.
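
For example, the bucket section of the spec might look roughly like this; the field names under bucket are illustrative only:

# Illustrative shape of the bucket section in a CkanCloudCkanInstance spec
example_spec_fragment = {
    "spec": {
        "bucket": {
            "gcloud": {                      # or "s3" / "abs", depending on the provider
                "name": "ckan-my-instance",  # assumed naming
                "region": "europe-west1",
                # the credentials themselves live in the bucket-credentials secret
            }
        }
    }
}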

Sounds good @aluminiumgeek

Total: 8 days - consumed 6 days, remaining 2 days for AWS work

Done wrapping the storage modules into the CKAN instance creation flow.

Adam reviewed the PR and now Mikhail needs to update it based on his comments.

@estebanruseler to determine status with @aluminiumgeek

The Travis CI build is green and all updates from the PR review are finished.

@akariv @zelima is this in a state for me to do acceptance testing yet (in general, not for data.gov specifically)?

Not yet, there is still an active PR that needs to be merged: #83

cc @aluminiumgeek