
Primary language: Python. License: Apache License 2.0.

Scylla Machine Image

Provides the following:

  • Create an image with pre-installed Scylla
  • Allows configuring the database when an instance is launched for the first time
  • Easy cluster creation

OS Package

An RPM/DEB package that is pre-installed in the image. It is responsible for configuring Scylla during the first boot of the instance.

Create an image

AWS

aws/ami/build_ami.sh

Scylla AMI user-data Format v2

Scylla AMI user-data should be passed as a JSON object, as described below.

See the AWS docs for how to pass user-data to EC2 instances: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-add-user-data.html
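As an illustration of passing the JSON object at launch time, here is a minimal sketch that prepares the arguments for boto3's EC2 run_instances call. The AMI ID, the instance type, and the helper name build_run_instances_kwargs are placeholders invented for this example; the actual launch is left commented out because it needs boto3 and AWS credentials.

```python
import json


def build_run_instances_kwargs(ami_id, instance_type, user_data):
    """Build keyword arguments for boto3's EC2 run_instances call.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # boto3 base64-encodes UserData itself, so plain JSON text is passed.
        "UserData": json.dumps(user_data),
    }


# Placeholder values: replace with a real Scylla AMI ID and instance type.
kwargs = build_run_instances_kwargs(
    ami_id="ami-0123456789abcdef0",
    instance_type="i3.large",
    user_data={"scylla_yaml": {"cluster_name": "test-cluster"}},
)

# Launching requires boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("ec2").run_instances(**kwargs)
```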

EC2 User-Data

User data that can be passed when creating EC2 instances

  • Object Properties
    • scylla_yaml (Scylla YAML) – Mapping of all fields that are passed down to the scylla.yaml configuration file
    • scylla_startup_args (list) – Startup arguments for scylla-server (NOT YET IMPLEMENTED) (default=’[]’)
    • developer_mode (boolean) – Enables developer mode (default=’false’)
    • post_configuration_script (string) – A script to run once the AMI's first-boot configuration is finished; it may be passed as a base64-encoded string. (default=’’)
    • post_configuration_script_timeout (int) – Timeout in seconds for the post_configuration_script (default=’600’)
    • start_scylla_on_first_boot (boolean) – If true, scylla-server starts when the instance first boots (default=’true’)
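The property list above can be checked on the client side before launching an instance. The following is a minimal sketch, not the image's own validation code: the names USER_DATA_SCHEMA and normalize_user_data are hypothetical, the empty default for scylla_yaml is an assumption, and rejecting unknown properties is a choice made for this example only.

```python
import copy

# Property names, types and defaults as listed above.
# Assumption: scylla_yaml defaults to an empty mapping (not stated in the docs).
USER_DATA_SCHEMA = {
    "scylla_yaml": (dict, {}),
    "scylla_startup_args": (list, []),
    "developer_mode": (bool, False),
    "post_configuration_script": (str, ""),
    "post_configuration_script_timeout": (int, 600),
    "start_scylla_on_first_boot": (bool, True),
}


def normalize_user_data(user_data):
    """Check property names and types, and fill in the documented defaults."""
    unknown = set(user_data) - set(USER_DATA_SCHEMA)
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    result = {}
    for name, (expected_type, default) in USER_DATA_SCHEMA.items():
        value = user_data.get(name, copy.deepcopy(default))
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        if isinstance(value, bool) and expected_type is not bool:
            raise TypeError(f"{name} must be {expected_type.__name__}")
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
        result[name] = value
    return result


normalized = normalize_user_data({"developer_mode": True})
```

Here normalized contains all six properties, with developer_mode taken from the input and the rest filled from the defaults.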

Scylla YAML

All fields that are passed down to the scylla.yaml configuration file

See https://docs.scylladb.com/operating-scylla/scylla-yaml/ for all available configuration options. Only the options for which the Scylla AMI sets defaults are listed here.

  • Object Properties
    • cluster_name (string) – Name of the cluster (default=a generated name that works only for a single-node cluster)
    • auto_bootstrap (boolean) – Enable auto bootstrap (default=’true’)
    • listen_address (string) – Defaults to the EC2 instance private IP
    • broadcast_rpc_address (string) – Defaults to the EC2 instance private IP
    • endpoint_snitch (string) – Defaults to ‘org.apache.cassandra.locator.Ec2Snitch’
    • rpc_address (string) – Defaults to ‘0.0.0.0’
    • seed_provider (mapping) – Defaults to the EC2 instance private IP
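To make the defaults above concrete, here is a sketch of how they might combine with user-supplied scylla_yaml values. It rests on several assumptions: that user values simply replace the defaults key by key, that the default seed_provider has the same shape as in the examples in this document, and that the function names are invented for this illustration. The generated cluster_name is left out because its format is not described here.

```python
def default_scylla_yaml(private_ip):
    """Defaults listed above, for an instance with the given private IP."""
    return {
        "auto_bootstrap": True,
        "listen_address": private_ip,
        "broadcast_rpc_address": private_ip,
        "endpoint_snitch": "org.apache.cassandra.locator.Ec2Snitch",
        "rpc_address": "0.0.0.0",
        # Assumption: the default seed list holds only this instance's IP.
        "seed_provider": [{
            "class_name": "org.apache.cassandra.locator.SimpleSeedProvider",
            "parameters": [{"seeds": private_ip}],
        }],
    }


def effective_scylla_yaml(private_ip, overrides):
    """Shallow merge: a user-supplied key replaces the default as a whole."""
    merged = default_scylla_yaml(private_ip)
    merged.update(overrides)
    return merged


config = effective_scylla_yaml(
    "10.0.0.5",
    {"cluster_name": "test-cluster", "rpc_address": "10.0.0.5"},
)
```

In this sketch config keeps the default listen_address and endpoint_snitch, while cluster_name and rpc_address come from the overrides.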

Example usage of user-data

Spinning up a new node that connects to “10.0.219.209” as a seed and installs the cloud-init-cfn package at first boot.

Using JSON

{
     "scylla_yaml": {
         "cluster_name": "test-cluster",
         "experimental": true,
         "seed_provider": [{"class_name": "org.apache.cassandra.locator.SimpleSeedProvider",
                            "parameters": [{"seeds": "10.0.219.209"}]}]
     },
     "post_configuration_script": "#! /bin/bash\nyum install cloud-init-cfn",
     "start_scylla_on_first_boot": true
}
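The script in the JSON example has to embed its newline as \n. Since post_configuration_script may also be given as a base64-encoded string, a multi-line script can be encoded instead. A minimal sketch, assuming standard base64 over the UTF-8 bytes of the script:

```python
import base64
import json

# The same script as in the example, written as an ordinary Python string.
script = "#! /bin/bash\nyum install cloud-init-cfn"

# Standard base64 over the UTF-8 bytes (assumed to be the expected encoding).
encoded_script = base64.b64encode(script.encode("utf-8")).decode("ascii")

user_data = {
    "scylla_yaml": {"cluster_name": "test-cluster"},
    "post_configuration_script": encoded_script,
    "start_scylla_on_first_boot": True,
}

# json.dumps always produces valid JSON (no trailing commas).
user_data_json = json.dumps(user_data, indent=4)
print(user_data_json)
```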

Using YAML

scylla_yaml:
  cluster_name: test-cluster
  experimental: true
  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: 10.0.219.209
post_configuration_script: "#! /bin/bash\nyum install cloud-init-cfn"
start_scylla_on_first_boot: true

Using MIME multipart

If other cloud-init features are needed, use a MIME multipart message and pass the JSON or YAML configuration in a part with content type x-scylla/json or x-scylla/yaml.

More information on cloud-init multipart user-data:

https://cloudinit.readthedocs.io/en/latest/topics/format.html#mime-multi-part-archive

Content-Type: multipart/mixed; boundary="===============5438789820677534874=="
MIME-Version: 1.0

--===============5438789820677534874==
Content-Type: x-scylla/yaml
MIME-Version: 1.0
Content-Disposition: attachment; filename="scylla_machine_image.yaml"

scylla_yaml:
  cluster_name: test-cluster
  experimental: true
  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: 10.0.219.209
post_configuration_script: "#! /bin/bash\nyum install cloud-init-cfn"
start_scylla_on_first_boot: true

--===============5438789820677534874==
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--===============5438789820677534874==--
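To check a multipart message before using it, the x-scylla part can be located with Python's standard email package. This is only a sketch of how a consumer could find the part by content type; it is not the image's actual parsing code, and extract_scylla_part and the shortened example message are made up for this illustration.

```python
from email import message_from_string


def extract_scylla_part(user_data_text):
    """Return (subtype, payload) of the first x-scylla/* part, or None."""
    message = message_from_string(user_data_text)
    for part in message.walk():
        if part.get_content_maintype() == "x-scylla":
            return part.get_content_subtype(), part.get_payload()
    return None


# Shortened version of the message above, with a simpler boundary.
example = "\n".join([
    'Content-Type: multipart/mixed; boundary="BOUNDARY"',
    "MIME-Version: 1.0",
    "",
    "--BOUNDARY",
    "Content-Type: x-scylla/yaml",
    "",
    "scylla_yaml:",
    "  cluster_name: test-cluster",
    "",
    "--BOUNDARY",
    "Content-Type: text/cloud-config",
    "",
    "#cloud-config",
    "",
    "--BOUNDARY--",
    "",
])

subtype, payload = extract_scylla_part(example)
```

The subtype (yaml or json) tells the caller which parser to apply to the payload.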

Example of creating the multipart message with Python:

import json
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart

# Container for all parts of the user-data (multipart/mixed by default)
msg = MIMEMultipart()

# Part 1: Scylla Machine Image configuration, sent as x-scylla/json
scylla_image_configuration = dict(
    scylla_yaml=dict(
        cluster_name="test_cluster",
        listen_address="10.23.20.1",
        broadcast_rpc_address="10.23.20.1",
        seed_provider=[{
            "class_name": "org.apache.cassandra.locator.SimpleSeedProvider",
            "parameters": [{"seeds": "10.23.20.1"}]}],
    )
)
part = MIMEBase('x-scylla', 'json')
part.set_payload(json.dumps(scylla_image_configuration, indent=4, sort_keys=True))
part.add_header('Content-Disposition', 'attachment; filename="scylla_machine_image.json"')
msg.attach(part)

# Part 2: regular cloud-init configuration, sent as text/cloud-config
cloud_config = """
#cloud-config
cloud_final_modules:
- [scripts-user, always]
"""
part = MIMEBase('text', 'cloud-config')
part.set_payload(cloud_config)
part.add_header('Content-Disposition', 'attachment; filename="cloud-config.txt"')
msg.attach(part)

# Print the complete message; this text is what gets passed as user-data
print(msg)

Creating a Scylla cluster using the Machine Image

AWS - CloudFormation

Use the template aws/cloudformation/scylla.yaml. Currently, clusters of at most 10 nodes are supported.

Building scylla-machine-image package

Red Hat-like - RPM

Currently the only supported mode is:

dist/redhat/build_rpm.sh --target centos7 --cloud-provider aws

Build using Docker

docker run -it -v $PWD:/scylla-machine-image -w /scylla-machine-image  --rm centos:7.2.1511 bash -c './dist/redhat/build_rpm.sh -t centos7 -c aws'

Ubuntu - DEB

dist/debian/build_deb.sh

Build using Docker

docker run -it -v $PWD:/scylla-machine-image -w /scylla-machine-image  --rm ubuntu:20.04 bash -c './dist/debian/build_deb.sh'

Building docs

python3 -m venv .venv
source .venv/bin/activate
pip install sphinx sphinx-jsondomain sphinx-markdown-builder
make html
make markdown