/perftest

Automate (OpenVPN) performance tests on Amazon EC2.

INTRODUCTION
------------

This is a small Python application that attempts to automate deployments of 
a large number of (temporary) virtual machines. Its main emphasis is 
setting up a large number of (OpenVPN) client computers for performance testing 
purposes. However, the codebase is general-purpose enough to be used for any 
purpose requiring a large number of temporary VMs.

Currently the Amazon EC2 provider is supported. Support for a "local" 
provider has been implemented, but it is broken at the moment due to 
architectural changes. Fixing it is trivial, though.

To fetch the latest source code, go to

<https://github.com/mattock/perftest>

Enhancement requests, bug reports, patches and such can be submitted there as 
well.


ARCHITECTURE
------------

The application currently consists of the following components:

1) Provider/poller/queuer thread(s)

This type of thread may create the temporary VMs. It may also poll for VMs that 
have reached a state where they can be configured. Whenever a VM becomes ready, 
it is placed into a queue, from where configurer threads can find it.

There is usually only one of these threads.

2) Configurer threads

These threads take IPs of active VMs from the queue generated by the provider 
thread. They then launch one Fabric process per VM. Fabric takes care of 
configuring the VMs. When it exits, the configurer thread marks that IP as done 
and removes it from the queue.

There are usually many of these threads, as old versions of Fabric (pre-1.3) did 
not support multithreaded/multiprocess operation.
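
The interplay between the provider thread and the configurer threads can be 
sketched roughly as follows. This is a minimal illustration, not the actual 
code in this repository: all function names are made up, the VMs are assumed to 
be ready already, and the Fabric task name "configure" is hypothetical. The 
"dry_run_command" helper is a stand-in that lets the sketch run without Fabric 
or any VMs.

```python
import queue
import subprocess
import sys
import threading


def provider(ips, ready_queue):
    """Stand-in for the provider/poller/queuer thread.

    A real provider would create VMs and poll until each one is ready;
    here the IPs are assumed to be ready already and are queued directly.
    """
    for ip in ips:
        ready_queue.put(ip)


def configurer(ready_queue, build_command, results, lock):
    """Take IPs from the queue and run one external process per VM."""
    while True:
        ip = ready_queue.get()
        if ip is None:  # sentinel: no more VMs will arrive
            break
        returncode = subprocess.call(build_command(ip))
        with lock:
            results[ip] = returncode  # record the exit status for this IP


def run(ips, build_command, num_configurers=4):
    """Start one provider and several configurers; return {ip: exit status}."""
    ready_queue = queue.Queue()
    results = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=configurer,
                         args=(ready_queue, build_command, results, lock))
        for _ in range(num_configurers)
    ]
    for worker in workers:
        worker.start()
    producer = threading.Thread(target=provider, args=(ips, ready_queue))
    producer.start()
    producer.join()
    for _ in workers:
        ready_queue.put(None)  # one sentinel per configurer
    for worker in workers:
        worker.join()
    return results


def fab_command(ip):
    # "fab -H <host> <task>" is Fabric 1.x usage; the task name
    # "configure" is hypothetical.
    return ["fab", "-H", ip, "configure"]


def dry_run_command(ip):
    # Harmless stand-in: starts a Python process that exits immediately.
    return [sys.executable, "-c", "pass"]
```

Calling run(ips, dry_run_command) exercises the queueing logic locally, while 
run(ips, fab_command) would launch one "fab" process per IP. Running Fabric as a 
separate process per VM sidesteps the lack of thread safety in old Fabric 
versions.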

3) Main program

The main program (start.py) is responsible for reading configuration files and 
parsing command-line arguments, and for launching the appropriate number of 
threads.

4) Fabfile

This program uses a standard fabfile understood by Fabric's command-line tool 
"fab". Use of a recent Fabric version (1.2.2+) is strongly recommended.

5) Configuration files

Some aspects of this tool are controlled by configuration files in the "config" 
directory:

- ec2.conf: various (sensitive) Amazon EC2-specific settings
- cron.conf: cron-specific settings, used to generate a valid crontab
- tests.conf: test control data (not the actual test scripts)
- ssh.conf: ssh configuration details (saves typing)

6) Resources

The "resources" directory contains all files Fabric uploads to the clients. 
These files can be static (e.g. OpenVPN certificates) or dynamically generated 
(e.g. the crontab file used for timing the tests).

7) Analysis scripts

The "analyze.sh" script drives the awk scripts, which currently process iperf 
and dstat logs and generate the required output (e.g. Trac/Mediawiki tables, 
CSV).

TEST PROCESS
------------

At the moment the tests require quite a lot of manual work, despite all 
the automation that is implemented:

1) Launch and configure a server instance (start.py or manually)
2) Launch and configure a number of client instances (start.py)
3) Launch "iperf -s" on the server
4) Launch "dstat.sh" on the server
5) Wait for the tests to run
6) Fetch log files from server and clients (start.py/scp)
7) Move the log files to a new subdirectory in logs/
8) Open dstat.csv in a spreadsheet, creating a new spreadsheet for each test
   segment, so that each spreadsheet contains only those time periods where
   the server was under load(*). Make sure that the language for all cells is
   set to English (USA), so that "." is used as the decimal separator.
9) Run analyze.sh with correct arguments
10) Paste the analysis files (logs/logname/analysis-*) to wherever you wish

(*) This is a crude and time-consuming method, but algorithmically detecting 
the test case boundaries is error-prone and easily skews the results.
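
If the start and end of a test segment are already known (picked by hand, not 
detected automatically), the spreadsheet step could in principle be replaced by 
a small script. The sketch below is not part of this tool and rests on loud 
assumptions: the log is a simplified CSV with a hypothetical "epoch" timestamp 
column, whereas real dstat CSV output has preamble lines and different column 
names that would need handling first.

```python
import csv
import io

# Hypothetical, simplified log: real dstat CSV output has extra preamble
# lines and different column names, which would need handling first.
SAMPLE_CSV = """epoch,cpu_usr,net_recv
100,2.0,1000
160,85.5,900000
220,88.1,910000
280,3.2,1200
"""


def rows_in_window(csv_text, start, end, time_column="epoch"):
    """Return rows whose timestamp lies in the closed interval [start, end]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if start <= float(row[time_column]) <= end]


def mean(rows, column):
    """Average a numeric column; float() expects "." as decimal separator."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)
```

For example, rows_in_window(SAMPLE_CSV, 160, 220) keeps only the two rows 
recorded while the (made-up) server was under load, and mean() can then be 
applied to any numeric column of that segment.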

DEPENDENCIES
------------

This application depends on the following external libraries and tools:

Boto (http://code.google.com/p/boto)

Python interface to Amazon Web Services. Not necessary if not using the Amazon 
EC2 provider/poller/queuer. Tested on version 1.9 on Ubuntu 11.04.

Fabric (http://fabfile.org)

Fabric is a Python (2.5 or higher) library and command-line tool for 
streamlining the use of SSH for application deployment or systems administration 
tasks. Tested on version 0.9.3 on Ubuntu 11.04.


LIMITATIONS
-----------

At the moment the configurer threads choke on input requests (as standard Fabric 
would). Also, Fabric output is even more confusing than usual, given that 
multiple threads write to the same terminal at the same time. There is no clean 
way to fix this, except by removing non-fatal logging and/or writing to logfiles 
instead of stdout.


USAGE
-----

Edit the configuration files in the "config" directory to your liking. The 
"tests.conf" file requires some explanation. Each section header must be 
accompanied by a test script with the same name in "resources". For example, if 
a test is called "test1", you must create a test script called 
"resources/test1" or Fabric will bail out.
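
To catch a missing test script before any VMs are launched, the section names 
can be checked against the "resources" directory up front. The following is a 
minimal sketch, not code from this repository. It assumes "tests.conf" uses an 
INI-style layout readable by Python's configparser module; the sample contents, 
addresses and function names are made up for illustration.

```python
import configparser
import os

# Hypothetical contents of config/tests.conf; an INI-style layout is assumed.
SAMPLE_TESTS_CONF = """
[test1]
time = 10
server = 192.0.2.10

[test2]
time = 25
server = 192.0.2.10
"""


def load_tests(conf_text):
    """Return a list of (name, minutes, server) tuples, one per section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return [
        (name, parser.getint(name, "time"), parser.get(name, "server"))
        for name in parser.sections()
    ]


def missing_scripts(tests, resources_dir="resources"):
    """Return names of tests that have no matching script in resources_dir."""
    return [
        name for name, _minutes, _server in tests
        if not os.path.isfile(os.path.join(resources_dir, name))
    ]
```

An empty list from missing_scripts(load_tests(text)) means that every section 
has a matching script.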

The "time" variable defines how many minutes after the launch of start.py 
the test is triggered (using cron). The "server" variable determines the 
server against which the test is performed.
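
As an illustration of how a minute offset can be turned into a cron entry, 
consider the sketch below. It is not the generator used by this tool: the 
function name, the launch time and the script path on the client are 
hypothetical, and the clocks and time zones of the launching machine and the 
clients are assumed to match.

```python
from datetime import datetime, timedelta


def crontab_line(launch_time, offset_minutes, command):
    """Build a crontab entry that fires offset_minutes after launch_time."""
    run_at = launch_time + timedelta(minutes=offset_minutes)
    # Standard crontab fields: minute hour day-of-month month day-of-week.
    return "%d %d %d %d * %s" % (
        run_at.minute, run_at.hour, run_at.day, run_at.month, command)


# Hypothetical launch time and script path on the client.
launch = datetime(2011, 6, 1, 23, 55)
line = crontab_line(launch, 10, "/root/test1")
# line is "5 0 2 6 * /root/test1": the offset rolls over midnight correctly.
```

Using datetime arithmetic rather than adding to the minute field directly 
handles offsets that cross an hour, day or month boundary.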

You almost certainly want to edit the fabfile.py, which drives the client 
deployment, as well as write custom test scripts.

In addition, you need to use the appropriate command-line flags; to see them 
type

$ python start.py -h


TODO
----

- Reimplement full "local" provider
- Modify for new Fabric versions (1.3+) with multiprocessing support
- Make log analysis less painful