This is a series of tools for Chaos Toolkit (CTK) which can perform targeted experiments on applications deployed in Cloud Foundry.
- Block general network traffic
- Block all incoming traffic to the application
- Block all outgoing traffic from the application
- Block service traffic
- Auto-detection of bound services and support for manually specified non-bound services
- Block all outgoing traffic from the application to one or more bound services
- Block all incoming traffic form the application to one or more bound services
- Manipulate all network traffic from an application (including to its services)
- Latency
- Packet loss
- Packet duplication
- Packet corruption
- Impose bandwidth restrictions
- Application download bandwidth shaping (using queuing)
- Application upload bandwidth limiting (using policing)
- Perform network speedtest from within hosting containers
- Crash one or more random application instances
- Kill/start monit processes on hosting diego-cells
It is recommended that you run Monarch with Docker which you can get here. We have had some issues with cross-platform support for the underlying CLIs.
With docker up and running, run the following within the root of the git repository:
# FIRST Run
docker build -t monarch .
docker run -it \
--name monarch \
-v C:\Users\<username>\Documents\monarch\config:/monarch/config # and create the needed files from within.
monarch
# Subsequent Runs
docker start -ai monarch
# Rebuild Image (You will loose information not in an attached volume)
docker container rm monarch
yes | docker image prune
# goto FIRST Run ;)
Note that the config volume is optional and does not need to be mounted, however, even if you do not have any written already, you should mount the volumes to prevent data loss when you destroy the container during the inevitable upgrade process. Also, if you plan to run tests, make sure to mount the testing config volume as well!
From within the docker image, you may now use either the python shell to interact with monarch, or chaostollkit which is installed automatically when the image is built. You will need to login with cf-cli and bosh-cli before attempting to use monarch.
To be used from your experiment, this package must first be installed in the Python environment where
chaostoolkit already exists. This package requires at least
Python version 3.5 (3.6 if using the chaostoolkit interfaces directly), so translate python
as python3
or pyhton3.5
as appropriate for your OS.
From within the source, run:
sudo python setup.py install
Or to install for just your user:
python setup.py install --user
Now you should be able to import the package.
import monarch
print(monarch.__version__)
In order to run the script, it will require that you have the Cloud Foundry CLI installed and the BOSH CLI installed. You will also need to be logged in to the Cloud Foundry CLI as a user with permission to access all apps which are to be targeted and logged in as and you will need to be login as an admin with the BOSH CLI. This is because the script requires ssh access to the bosh vms to make its changes and also prevents applications from needing SSH enabled.
Once the CLIs are ready, create (or modify the existing) configuration file. This file is only necessary for CLI use as it is included within the experiments for Chaos Toolkit.
bosh
: Information about the bosh cli and environmentcmd
: The bosh-cli command.env
: The environment name for the cf deployment (-e env
).cf-dep
: The cloud foundry deployment in the bosh environment.cfdot-dc
: The diego-cell to use forcfdot
queries.
cf
: Information about the cf cli and environmentcmd
: The cf-cli command.
container-port-whitelist
: List of node ports which should be ignored. These are the external ports on the diego-cells.service-whitelist
: List of service types which should be ignored. These must be the names displayed in the cf-cli marketplace.quantum
: The quantum to use when configuring qdisc perturbance. The recommended6000
should work without issue.
Sample config.yml or cfg
values for Chaos Toolkit.
bosh:
cmd: bosh2 # bosh CLI to be used
env: bosh-lite # environment alias name or address
cf-dep: cf # Bosh deployment name
cfdot-dc: diego_cell/0
credentials: # Optional; CLI will need to be logged in already if not present
user: iamaperson
pswd: ideallysomethingsecure
cacert: | # include as needed
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
cf:
cmd: cf
credentials: # Optional; CLI will need to be logged in already if not present
user: iamaperson
pswd: hopefullysomethingdifferent
api: cf.example.com
skip_ssl_validation: true # add as needed (false by default)
container-port-whitelist:
- 22
- 2222
host-port-whitelist: []
service-whitelist:
- logger
quantum: 6000
#services: # custom service definitions, not needed for bound services
# - name: google
# host: google.com
# ports:
# - ['tcp', 80]
# - ['tcp', 443]
# - ['icmp', 'all']
There are two ways you can call these scripts. The first is the Python Shell which will allow you to manually block
services or applications and then unblock them at your leisure. The second is through the actions
and probes
which
should be called by Chaos Toolkit.
If you have not installed the monarch
package, then make sure you run Chaos Toolkit from this directory (the root of
this repository) using python -m chaostoolkit run exp.json
or else the monarch
module will not be found. Otherwise
just use chaos run exp.json
from any directory.
Currently, the Chaos Toolkit interface does not support saving information about what was targeted, which should be okay
for the time being as we have yet to observe Cloud Foundry moving app instances as a result of any of these actions.
Though it is a good reason to be cautious of its use as it simply re-queries again when unblocking, so if something did
move, it will not remove the old rule in the location the app is no longer at. If you need to manually verify that all
of the rules have been removed, you can go through each diego-cell in the Cloud Foundry deployment and run
iptables -L | grep DROP
to see if any rules are lingering. (This script should be the only source of DROP
rules).
The following is a sample, Chaos-Toolkit experiment file to block all traffic to the application.
{
"version": "0.1.0",
"title": "Blocking spring-music makes it unreachable.",
"description": "This is a testing experiment to verify the script's block traffic function works.",
"tags": ["cloudfoundry", "bosh", "springboot"],
"configuration": {
"TODO": "Some of this needs to be part of the application configuration since the user of this would not know what the cli commands are for instance.",
"bosh": {
"cmd": "bosh2",
"env": "tt-stg02",
"cf-dep": "cf-da0ba81cb255ad93a508",
"cfdot-dc": "diego_cell/0"
},
"cf": {
"cmd": "cf"
},
"container-port-whitelist": [22, 2222],
"host-port-whitelist": [],
"service-whitelist": ["T-Logger"],
"quantum": 6000
},
"steady-state-hypothesis": {
"title": "We can access the application and other neighboring applications (This should fail because we block all traffic)",
"probes": [
{
"type": "probe",
"name": "spring-music-responds",
"tolerance": 200,
"provider": {
"type": "http",
"url": "http://spring-music-interested-bonobo.apps.tt-stg02.cf.t-mobile.com/"
}
},
{
"type": "probe",
"name": "spring-music2-responds",
"tolerance": 200,
"provider": {
"type": "http",
"url": "http://spring-music2-lean-sable.apps.tt-stg02.cf.t-mobile.com/"
}
}
]
},
"method": [
{
"type": "action",
"name": "block-traffic",
"provider": {
"type": "python",
"module": "monarch.pcf.actions",
"func": "block_traffic",
"arguments": {
"org": "sys-tmo",
"space": "test",
"appname": "spring-music"
}
}
}
],
"rollbacks": [
{
"type": "action",
"name": "unblock-traffic",
"provider": {
"type": "python",
"module": "monarch.pcf.actions",
"func": "unblock_traffic",
"arguments": {
"org": "sys-tmo",
"space": "test",
"appname": "spring-music"
}
}
}
]
}
For now, there is no CLI interface, instead use an interactive python shell session. See bleow.
Example session:
from monarch.pcf.config import Config
from monarch.pcf.app import App
Config().load_yaml('config/tt-stg02.yml')
app = App.discover('sys-tmo', 'ce-service-registry', 'spring-music')
app.block()
app.unblock()
app.crash_random_instance(2) # will require that you rediscover the app once CF brings a new container up
app = App.discover('sys-tmo', 'ce-service-registry', 'spring-music')
app.block_services('musicdb')
app.unblock_services()
Unit tests are written with pytest and can be run with ./setup.py test
. Before running the tests, you will need to add
tests/config/app_test.yml
which is the same as the above configuration with the following appended:
# ...
testing:
org: coolkids
space: ce-testing
appname: spring-music
push-app: true
db-market-name: p-mysql
db-plan: 100mb
db-instance-name: musicdb
You will also need to include the credential sections for the bosh and cf cli configs. If push-app is true, it will
expect the org
and space
to be pre-existing, but deploy spring-music from scratch (meaning db-instance-name
and
appname
should not already exist). It will perform cleanup after tests are done leaving the space in the state it was
originally.
These tests can be run from within docker using
Monarch is open-sourced under the terms of section 7 of the Apache 2.0 license and is released AS-IS WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND.