This locust README may be used as a starting point for writing a Service Guide for your DC/OS Service.
In particular, the parts in ALL-CAPS ITALICS should be updated to reflect your service.
Many sections are left unfilled, as they depend on how your service works. For example, we leave empty sections for you to describe how users may Backup and Restore their data because any persistent service should have a backup option.
- Overview
- Quick Start
- Installing and Customizing
- Service Settings
- Service Name
- SERVICE-WIDE OPTIONS SPECIFIC TO YOUR PRODUCT INTEGRATION
- Node Settings
- Node Count
- CPU
- Memory
- Ports
- Storage Volumes
- Placement Constraints
- PER-NODE OPTIONS SPECIFIC TO YOUR PRODUCT INTEGRATION
- Service Settings
- Uninstalling
- Connecting Clients
- Managing
- Updating Configuration
- Adding a Node
- Resizing a Node
- Updating Placement Constraints
- SERVICE-WIDE OPTIONS SPECIFIC TO YOUR PRODUCT INTEGRATION
- PER-NODE OPTIONS SPECIFIC TO YOUR PRODUCT INTEGRATION
- Restarting a Node
- Replacing a Node
- MAINTENANCE OPERATIONS SPECIFIC TO YOUR PRODUCT INTEGRATION
- Updating Configuration
- Disaster Recovery
- BACKUP OPTIONS SPECIFIC TO YOUR PRODUCT INTEGRATION
- RESTORE OPTIONS SPECIFIC TO YOUR PRODUCT INTEGRATION
- Troubleshooting
- Limitations
- Removing a Node
- Updating Storage Volumes
- Rack-aware Replication
- CAVEATS SPECIFIC TO YOUR PRODUCT INTEGRATION
- Support
- Changelog
DC/OS SERVICENAME is an automated service that makes it easy to deploy and manage SERVICENAME on DC/OS.
BRIEF OVERVIEW OF YOUR PRODUCT
- Single command installation for rapid provisioning
- CLI for easy management
- Multiple SERVICENAME clusters sharing a single DC/OS cluster for multi-tenancy
- Multiple SERVICENAME instances sharing the same hosts for improved utilization
- Placement constraints for fine-grained instance placement
- Vertical and horizontal scaling for managing capacity
- Rolling software and configuration updates for runtime maintenance
- Integrated with Enterprise DC/OS Storage capabilities
- Integrated with Enterprise DC/OS Networking capabilities
- Integrated with Enterprise DC/OS Monitoring and Troubleshooting capabilities
- Integrated with Enterprise DC/OS Security capabilities
- OTHER BENEFITS OF YOUR PRODUCT WITH DC/OS
- Install DC/OS on your cluster. See the documentation for instructions.
- If you are using open source DC/OS, install a SERVICENAME cluster with the following command from the DC/OS CLI. If you are using Enterprise DC/OS, you may need to follow additional instructions. See the Installing and Customizing section for more information.
  dcos package install _PKGNAME_
  You can also install SERVICENAME from the DC/OS web interface.
- The service will now deploy with a default configuration. You can monitor its deployment from the Services tab of the DC/OS web interface.
- Connect a client to SERVICENAME.
  dcos _PKGNAME_ endpoints
  [ "_LIST_", "_OF_", "_ENDPOINTS_" ]
  dcos _PKGNAME_ endpoints _ENDPOINT_
  {
    "address": ["10.0.3.156:_PORT_", "10.0.3.84:_PORT_"],
    "dns": ["_POD_-0._PKGNAME_.mesos:_PORT_", "_POD_-1._PKGNAME_.mesos:_PORT_", "_POD_-2._PKGNAME_.mesos:_PORT_"]
  }
- SIMPLE EXAMPLE OF HOW TO CONNECT A CLIENT AND INTERACT WITH YOUR PRODUCT (E.G., WRITE DATA, READ DATA).
The default SERVICENAME installation provides reasonable defaults for trying out the service, but may not be sufficient for production use. You may require different configurations depending on the context of the deployment.
- If you are using Enterprise DC/OS, you may need to provision a service account before installing SERVICENAME. Only someone with superuser permission can create the service account.
  - strict security mode requires a service account.
  - In permissive security mode, a service account is optional.
  - disabled security mode does not require a service account.
- Your cluster must have at least NUMBER private nodes.
To start a basic test cluster, run the following command on the DC/OS CLI. Enterprise DC/OS users must follow additional instructions. More information about installing SERVICENAME on Enterprise DC/OS.
dcos package install _PKGNAME_
You can specify a custom configuration in an options.json file and pass it to dcos package install using the --options parameter.
$ dcos package install _PKGNAME_ --options=your-options.json
For more information about building the options.json file, see the DC/OS documentation for service configuration access.
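As a rough illustration of the workflow above, the following shell sketch writes a small options.json and shows where it would be passed to the installer. The key names (service.name, node.count, node.cpus, node.mem) and all values are assumptions chosen to mirror the settings described in this guide; the real schema for your package is whatever dcos package describe --config _PKGNAME_ reports.

```shell
# Write a hypothetical options file. The keys and values below are
# illustrative assumptions, not a confirmed schema for any package.
cat > your-options.json <<'EOF'
{
  "service": {
    "name": "_PKGNAME_-dev"
  },
  "node": {
    "count": 3,
    "cpus": 1.0,
    "mem": 2048
  }
}
EOF

# With the DC/OS CLI available, the file would then be passed to the installer:
# dcos package install _PKGNAME_ --options=your-options.json
```

Keeping the options file under version control makes it easier to reproduce an installation later.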
You can install SERVICENAME from the DC/OS web interface. If you install SERVICENAME from the web interface, you must install the SERVICENAME DC/OS CLI subcommands separately. From the DC/OS CLI, enter:
dcos package install _PKGNAME_ --cli
Choose ADVANCED INSTALLATION to perform a custom installation.
Each instance of SERVICENAME in a given DC/OS cluster must be configured with a different service name. You can configure the service name in the service section of the advanced installation section of the DC/OS web interface. The default service name (used in many examples here) is PKGNAME.
CREATE ONE OR MORE SECTIONS FOR ADDITIONAL SERVICE-WIDE CUSTOMIZATIONS THAT YOU EXPOSE.
E.G., THIS MAY INCLUDE OPTIONAL FEATURES THAT MAY BE ENABLED/DISABLED BY A USER.
Adjust the following settings to customize the amount of resources allocated to each node. SERVICENAME's SYSTEM REQUIREMENTS must be taken into consideration when adjusting these values. Reducing these values below those requirements may result in adverse performance and/or failures while using the service.
Each of the following settings can be customized under the node configuration section.
Customize the Node Count setting (default DEFAULT NODE COUNT) under the node configuration section. Consult SERVICENAME documentation for minimum node count requirements.
You can customize the amount of CPU allocated to each node. A value of 1.0 equates to one full CPU core on a machine. Change this value by editing the cpus value under the node configuration section. Setting this value too low will result in throttled tasks.
You can customize the amount of RAM allocated to each node. Change this value by editing the mem value (in MB) under the node configuration section. Setting this value too low will result in out-of-memory errors.
ANY CUSTOMIZATIONS RELATING TO MEMORY THAT SHOULD BE ADJUSTED AS WELL (E.G. HEAP SIZE)? IF SO, MENTION THEM HERE.
You can customize the ports exposed by the service via the service configuration. If you wish to install multiple instances of the service and have them colocate on the same machines, you must ensure that no ports are common between those instances. Customizing ports is only needed if you require multiple instances sharing a single machine. This customization is optional otherwise.
Each component's ports may be customized in the following configuration sections:
- LIST PORT OPTIONS AND WHERE THEY ARE LOCATED IN THE CONFIG HERE
The service supports two volume types:
- ROOT volumes are effectively an isolated directory on the root volume, sharing IO/spindles with the rest of the host system.
- MOUNT volumes are a dedicated device or partition on a separate volume, with dedicated IO/spindles.
Using MOUNT volumes requires additional configuration on each DC/OS agent system, so the service currently uses ROOT volumes by default. To ensure reliable and consistent performance in a production environment, you should configure MOUNT volumes on the machines that will run the service in your cluster and then configure the following as MOUNT volumes:
- LIST ANY VOLUMES THAT SHOULD USE DEDICATED SPINDLES IN A PRODUCTION ENVIRONMENT FOR YOUR SERVICE
Placement constraints allow you to customize where the service is deployed in the DC/OS cluster. Placement constraints may be configured SEPARATELY FOR EACH NODE TYPE? (IF YOUR SERVICE HAS MULTIPLE TYPES) in the following configuration sections:
- LIST EXPOSED PLACEMENT CONSTRAINT FIELDS AND WHERE THEY ARE LOCATED IN THE CONFIG HERE
Placement constraints support all Marathon operators with this syntax: field:OPERATOR[:parameter]. For example, if the reference lists [["hostname", "UNIQUE"]], use hostname:UNIQUE.
A common task is to specify a list of whitelisted systems to deploy to. To achieve this, use the following syntax for the placement constraint:
hostname:LIKE:10.0.0.159|10.0.1.202|10.0.3.3
You must include spare capacity in this list, so that if one of the whitelisted systems goes down, there is still enough room to repair your service without that system.
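The whitelist string above can be assembled from a list of hosts instead of being typed by hand. The following is a minimal shell sketch: the helper name build_whitelist and the node count of 2 are assumptions for illustration and are not part of the DC/OS CLI. It also warns when the list leaves no spare host.

```shell
# build_whitelist NODE_COUNT HOST...
# Hypothetical helper: prints a hostname:LIKE constraint for the given hosts
# and warns on stderr when there are not more hosts than nodes.
build_whitelist() {
  node_count="$1"
  shift
  if [ "$#" -le "$node_count" ]; then
    echo "warning: $# host(s) for $node_count node(s) leaves no spare capacity" >&2
  fi
  # In the subshell, "$*" joins the remaining arguments with '|',
  # the first character of IFS, without changing IFS for the caller.
  ( IFS='|'; printf 'hostname:LIKE:%s\n' "$*" )
}

# Example: two nodes and three whitelisted hosts, so one host is spare.
build_whitelist 2 10.0.0.159 10.0.1.202 10.0.3.3
```

The printed value is what you would paste into the placement constraint field.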
For an example of updating placement constraints, see Managing below.
CREATE ONE OR MORE SECTIONS FOR ADDITIONAL PER-NODE CUSTOMIZATIONS THAT YOU EXPOSE. E.G., CUSTOMIZATION OF EXPOSED CONFIG FILE OPTIONS.
E.G., IF YOUR SERVICE SUPPORTS ENABLING/DISABLING CERTAIN COMPONENTS, THIS MAY BE A GOOD PLACE TO PROVIDE TUTORIALS ON HOW TO CONFIGURE THEM SUCCESSFULLY
Follow these steps to uninstall the service.
- Uninstall the service. From the DC/OS CLI, enter dcos package uninstall.
- Clean up remaining reserved resources with the framework cleaner script, janitor.py. More information about the framework cleaner script.
To uninstall an instance named _PKGNAME_ (the default), run:
$ MY_SERVICE_NAME=_PKGNAME_
$ dcos package uninstall --app-id=$MY_SERVICE_NAME _PKGNAME_
$ dcos node ssh --master-proxy --leader "docker run mesosphere/janitor /janitor.py \
-r $MY_SERVICE_NAME-role \
-p $MY_SERVICE_NAME-principal \
-z dcos-service-$MY_SERVICE_NAME"
One of the benefits of running containerized services is that they can be placed anywhere in the cluster. Because they can be deployed anywhere on the cluster, clients need a way to find the service. This is where service discovery comes in.
Once the service is running, you may view information about its endpoints via either of the following methods:
- CLI:
  - List endpoint types: dcos _PKGNAME_ endpoints
  - View endpoints for an endpoint type: dcos _PKGNAME_ endpoints <endpoint>
- Web:
  - List endpoint types: <dcos-url>/service/_PKGNAME_/v1/endpoints
  - View endpoints for an endpoint type: <dcos-url>/service/_PKGNAME_/v1/endpoints/<endpoint>
Returned endpoints will include the following:
- .mesos hostnames for each instance that will follow them if they're moved within the DC/OS cluster.
- An HA-enabled VIP hostname for accessing any of the instances (optional).
- A direct IP address for accessing the service if .mesos hostnames are not resolvable.
In general, the .mesos endpoints will only work from within the same DC/OS cluster. From outside the cluster you can either use the direct IPs or set up a proxy service that acts as a frontend to your SERVICENAME instance. For development and testing purposes, you can use DC/OS Tunnel to access services from outside the cluster, but this option is not suitable for production use.
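For scripted access, the web endpoint paths listed earlier in this section can be built from the DC/OS URL and the service name. This is only a sketch: the helper name endpoints_url and the host https://dcos.example.com are assumptions for illustration.

```shell
# endpoints_url DCOS_URL SERVICE_NAME [ENDPOINT]
# Hypothetical helper: prints the endpoints URL for a service, or the URL
# for a single endpoint type when a third argument is given.
endpoints_url() {
  base="${1%/}/service/$2/v1/endpoints"   # strip one trailing slash from the DC/OS URL
  if [ -n "${3:-}" ]; then
    printf '%s/%s\n' "$base" "$3"
  else
    printf '%s\n' "$base"
  fi
}

# Examples with placeholder values:
endpoints_url https://dcos.example.com _PKGNAME_
endpoints_url https://dcos.example.com/ _PKGNAME_ _ENDPOINT_
```

The resulting URL can be fetched with any HTTP client. On clusters that enforce authentication, the request also needs a valid DC/OS authentication token.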
GIVEN A RELEVANT EXAMPLE CLIENT FOR YOUR SERVICE, PROVIDE INSTRUCTIONS FOR CONNECTING THAT CLIENT USING THE ENDPOINTS LISTED ABOVE. WE RECOMMEND USING THE .MESOS ENDPOINTS IN YOUR EXAMPLE AS THEY WILL FOLLOW TASKS IF THEY ARE MOVED WITHIN THE CLUSTER.
You can make changes to the service after it has been launched. Configuration management is handled by the scheduler process, which in turn handles deploying SERVICENAME itself.
Edit the runtime environment of the scheduler to make configuration changes. After you make a change, the scheduler restarts and automatically deploys any detected changes to the service, one node at a time. For example, a given change will first be applied to _NODEPOD_-0, then _NODEPOD_-1, and so on.
Nodes are configured with a "Readiness check" to ensure that the underlying service appears to be in a healthy state before continuing with applying a given change to the next node in the sequence. However, this basic check is not foolproof and reasonable care should be taken to ensure that a given configuration change will not negatively affect the behavior of the service.
Some changes, such as decreasing the number of nodes or changing volume requirements, are not supported after initial deployment. See Limitations.
To make configuration changes via scheduler environment updates, perform the following steps:
- Visit <dcos-url> to access the DC/OS web interface.
- Navigate to Services and click on the service to be configured (default PKGNAME).
- Click Edit in the upper right. On DC/OS 1.9.x, the Edit button is in a menu made up of three dots.
- Navigate to Environment (or Environment variables) and search for the option to be updated.
- Update the option value and click Review and run (or Deploy changes).
- The Scheduler process will be restarted with the new configuration and will validate any detected changes.
- If the detected changes pass validation, the relaunched Scheduler will deploy the changes by sequentially relaunching affected tasks as described above.
To see a full listing of available options, run dcos package describe --config _PKGNAME_ in the CLI, or browse the SERVICENAME install dialog in the DC/OS web interface.
The service deploys DEFAULT NODE COUNT nodes by default. You can customize this value at initial deployment or after the cluster is already running. Shrinking the cluster is not supported.
Modify the NODE_COUNT environment variable to update the node count. If you decrease this value, the scheduler will block the configuration change until the value is restored to its original value or larger.
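Because the scheduler rejects a smaller node count, it can be useful to check a planned value before editing NODE_COUNT. The following is a minimal sketch of such a pre-flight check; the function name check_node_count is hypothetical and the check runs locally without contacting the cluster.

```shell
# check_node_count CURRENT REQUESTED
# Hypothetical pre-flight check: succeeds only if the requested NODE_COUNT
# does not shrink the cluster.
check_node_count() {
  if [ "$2" -lt "$1" ]; then
    echo "NODE_COUNT $2 is below the current value $1; shrinking is not supported" >&2
    return 1
  fi
  echo "NODE_COUNT $1 -> $2 is allowed"
}

# Growing from 3 to 5 nodes passes; shrinking from 3 to 2 is rejected.
check_node_count 3 5
check_node_count 3 2 || echo "change rejected"
```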
The CPU and Memory requirements of each node can be increased or decreased as follows:
- CPU (1.0 = 1 core): NODE_CPUS
- Memory (in MB): NODE_MEM
MENTION ANY OTHER ENVVARS THAT SHOULD BE ADJUSTED ALONG WITH THE MEMORY ENVVAR HERE?
Note: Volume requirements (type and/or size) cannot be changed after initial deployment.
Placement constraints can be updated after initial deployment using the following procedure. See Service Settings above for more information on placement constraints.
Let's say we have the following deployment of our nodes:
- Placement constraint of: hostname:LIKE:10.0.10.3|10.0.10.8|10.0.10.26|10.0.10.28|10.0.10.84
- Tasks:
  10.0.10.3: _NODEPOD_-0
  10.0.10.8: _NODEPOD_-1
  10.0.10.26: _NODEPOD_-2
  10.0.10.28: empty
  10.0.10.84: empty
10.0.10.8 is being decommissioned and we should move away from it. Steps:
- Remove the decommissioned IP and add a new IP to the placement rule whitelist by editing NODE_PLACEMENT:
  hostname:LIKE:10.0.10.3|10.0.10.26|10.0.10.28|10.0.10.84|10.0.10.123
- Redeploy _NODEPOD_-1 from the decommissioned node to somewhere within the new whitelist:
  dcos _PKGNAME_ pods replace _NODEPOD_-1
- Wait for _NODEPOD_-1 to be up and healthy before continuing with any other replacement operations.
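The whitelist edit in the first step can be scripted so that the decommissioned host is dropped and the new host is appended in one pass. This is a sketch under stated assumptions: swap_host is a hypothetical helper, it only rewrites the constraint string, and it does not update NODE_PLACEMENT or contact the cluster.

```shell
# swap_host CONSTRAINT OLD_HOST NEW_HOST
# Hypothetical helper: removes OLD_HOST from a hostname:LIKE whitelist and
# appends NEW_HOST. The body runs in a subshell so the IFS change stays local.
swap_host() (
  prefix="hostname:LIKE:"
  hosts="${1#"$prefix"}"          # whitelist without the hostname:LIKE: prefix
  out=""
  IFS='|'                         # split the whitelist on the '|' separator
  for h in $hosts; do
    if [ "$h" != "$2" ]; then
      out="${out:+$out|}$h"       # keep every host except the old one
    fi
  done
  printf '%s%s\n' "$prefix" "${out:+$out|}$3"
)

# Example using the deployment described above.
swap_host 'hostname:LIKE:10.0.10.3|10.0.10.8|10.0.10.26|10.0.10.28|10.0.10.84' 10.0.10.8 10.0.10.123
```

The printed value is the new NODE_PLACEMENT setting; the pod still has to be moved with the pods replace command shown in the steps.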
ADD ONE OR MORE SECTIONS HERE TO DESCRIBE RE-CONFIGURATION OF HIGHLIGHTED SERVICE-WIDE OPTIONS EXPOSED BY YOUR PRODUCT INTEGRATION
ADD ONE OR MORE SECTIONS HERE TO DESCRIBE RE-CONFIGURATION OF HIGHLIGHTED NODE-SPECIFIC OPTIONS THAT YOUR SERVICE EXPOSES
This operation will restart a node while keeping it at its current location and with its current persistent volume data. This may be thought of as similar to restarting a system process, but it also deletes any data that is not on a persistent volume.
- Run dcos _PKGNAME_ pods restart _NODEPOD_-<NUM>, e.g. _NODEPOD_-2.
This operation will move a node to a new system and will discard the persistent volumes at the prior system to be rebuilt at the new system. Perform this operation if a given system is about to be offlined or has already been offlined.
Note: Nodes are not moved automatically. You must perform the following steps manually to move nodes to new systems. You can build your own automation to perform node replacement automatically according to your own preferences.
- ANY STEPS TO WIND DOWN A NODE BEFORE IT'S WIPED/DECOMMISSIONED GO HERE
- Run dcos _PKGNAME_ pods replace _NODEPOD_-<NUM> to halt the current instance (if still running) and launch a new instance elsewhere.
For example, let's say _NODEPOD_-3's host system has died and _NODEPOD_-3 needs to be moved.
- DETAILED INSTRUCTIONS FOR WINDING DOWN A NODE, IF NEEDED FOR YOUR SERVICE, GO HERE
- "NOW THAT THE NODE HAS BEEN DECOMMISSIONED," (IF NEEDED BY YOUR SERVICE) start _NODEPOD_-3 at a new location in the cluster.
  $ dcos _PKGNAME_ pods replace _NODEPOD_-3
INSTRUCTIONS FOR BACKING UP DATA FROM YOUR SERVICE.
INSTRUCTIONS FOR RESTORING BACKED UP DATA TO YOUR SERVICE.
Logs for the scheduler and all service nodes can be viewed from the DC/OS web interface.
- Scheduler logs are useful for determining why a node isn't being launched (this is under the purview of the Scheduler).
- Node logs are useful for examining problems in the service itself.
In all cases, logs are generally piped to files named stdout and/or stderr.
To view logs for a given node, perform the following steps:
- Visit <dcos-url> to access the DC/OS web interface.
- Navigate to Services and click on the service to be examined (default PKGNAME).
- In the list of tasks for the service, click on the task to be examined (scheduler is named after the service, nodes are each _NODEPOD_-#-node).
- In the task details, click on the Logs tab to go into the log viewer. By default, you will see stdout, but stderr is also useful. Use the pull-down in the upper right to select the file to be examined.
You can also access the logs via the Mesos UI:
- Visit <dcos-url>/mesos to view the Mesos UI.
- Click the Frameworks tab in the upper left to get a list of services running in the cluster.
- Navigate into the correct framework for your needs. The scheduler runs under marathon with a task name matching the service name (default PKGNAME). Service nodes run under a framework whose name matches the service name (default PKGNAME).
- You should now see two lists of tasks. Active Tasks are tasks currently running, and Completed Tasks are tasks that have exited. Click the Sandbox link for the task you wish to examine.
- The Sandbox view will list files named stdout and stderr. Click the file names to view the files in the browser, or click Download to download them to your system for local examination. Note that very old tasks will have their Sandbox automatically deleted to limit disk space usage.
INSTRUCTIONS FOR ACCESSING METRICS.
MANAGE CUSTOMER EXPECTATIONS BY DISCLOSING ANY FEATURES OF YOUR PRODUCT THAT ARE NOT SUPPORTED WITH DC/OS, FEATURES MISSING FROM THE DC/OS INTEGRATION, ETC.
Removing a node is not supported at this time.
Neither volume type nor volume size requirements may be changed after initial deployment.
Rack placement and awareness are not supported at this time.
- SERVICENAME: WHAT VERSION OF YOUR SERVICE IS INCLUDED IN THE PACKAGE?
- DC/OS: LIST VERSION(S) OF DC/OS THAT YOU'VE TESTED AND SUPPORT
Packages are versioned with an a.b.c-x.y.z format, where a.b.c is the version of the DC/OS integration and x.y.z indicates the version of SERVICENAME. For example, 1.5.0-3.2.1 indicates version 1.5.0 of the DC/OS integration and version 3.2.1 of SERVICENAME.
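When scripts need the two parts of a package version separately, the a.b.c-x.y.z format can be split on its first hyphen with plain shell parameter expansion. This is a minimal sketch that assumes neither version component itself contains a hyphen; the variable names are illustrative.

```shell
# Split a package version of the form a.b.c-x.y.z at the first hyphen.
pkg_version="1.5.0-3.2.1"
integration_version="${pkg_version%%-*}"   # text before the first hyphen
service_version="${pkg_version#*-}"        # text after the first hyphen

echo "DC/OS integration version: $integration_version"
echo "SERVICENAME version: $service_version"
```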