SDSC "img-storage" roll
Overview
This roll bundles the packages for managing VM images in a Rocks cluster.
It includes:
- RabbitMQ (http://www.rabbitmq.com/) - a messaging broker implementing the AMQP protocol and providing message exchange between cluster components.
- Pika - a Python library for communicating with RabbitMQ (see the minimal sketch after this list).
- python-daemon, lockfile - Python libraries that should be provided by the base roll
- rocks-command-imagestorage - a set of Rocks commands
- img-storage-nas - the NAS daemon
- img-storage-vm - the Client node daemon
- img-storage - the common library containing the code for all daemons
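As a minimal illustration of how a component could talk to the broker through Pika, the sketch below publishes one message to a queue. The host, queue name, and message body are made up for the example and are not taken from the roll's code:

    import pika

    # Connect to the local RabbitMQ broker (default AMQP port 5672).
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host='localhost'))
    channel = connection.channel()

    # 'img_status' is a hypothetical queue name, used here for illustration.
    channel.queue_declare(queue='img_status')

    # Publish to the default exchange; the routing key selects the queue.
    channel.basic_publish(exchange='', routing_key='img_status',
                          body='zvol mapped')

    connection.close()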
For more information, see the user guide at: http://img-storage.readthedocs.org/en/latest/
Requirements
To build/install this roll you must have root access to a Rocks development machine (e.g., a frontend or development appliance).
If your Rocks development machine does not have Internet access, you must
download the appropriate img-storage source file(s) using a machine that does
have Internet access and copy them into the src/<package> directories on your
Rocks development machine.
This roll requires the full OS roll to be installed on the machine. The KVM, rabbitmq, and zfs-linux rolls are also required.
Building
To build the img-storage-roll, execute these instructions on a Rocks development machine (e.g., a frontend or development appliance):
% make default 2>&1 | tee build.log
% grep "RPM build error" build.log
If nothing is returned from the grep command, the roll should have been
created as img-storage-*.iso. If you built the roll on a Rocks frontend,
proceed to the installation step. If you built the roll on a Rocks development
appliance, you need to copy the roll to your Rocks frontend before continuing
with the installation.
Installation
To install, execute these instructions on a Rocks frontend:
% rocks add roll *.iso
% rocks enable roll img-storage
% cd /export/rocks/install
% rocks create distro
% rocks run roll img-storage | bash
Note: image sync
We use ZFS zvols to store VM images. The default mechanism for sending them between nodes is SSH. We encountered a bug in HPN-SSH when sending data while a VM was running on the same node, which caused a buffer overflow and terminated the data transfer process. As a workaround, the send and receive scripts use bbcp (https://www.slac.stanford.edu/~abh/bbcp/) to transfer the data if it is found on the NAS, and fall back to SSH otherwise. The scripts look for bbcp in PATH and then in /opt/bbcp/bin/bbcp (the default location on the COMET production cluster).
You can find these scripts at src/img-storage-nas/bin/snapshot_*.sh.
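The selection logic amounts to the following (a simplified Python 3 sketch of what the shell scripts do; the real scripts are shell and also build the actual zfs send/receive pipeline):

    import os
    import shutil

    # Default bbcp location on the COMET production cluster.
    COMET_BBCP = '/opt/bbcp/bin/bbcp'

    def find_bbcp():
        """Return a path to bbcp, or None if it is not available."""
        found = shutil.which('bbcp')        # 1. search PATH
        if found:
            return found
        if os.access(COMET_BBCP, os.X_OK):  # 2. COMET default location
            return COMET_BBCP
        return None

    bbcp = find_bbcp()
    if bbcp:
        print('sending zvol snapshot stream with %s' % bbcp)
    else:
        print('bbcp not found, falling back to ssh')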
The RabbitMQ roll makes the service quit if it has no connection to the broker for some time; this allows connection problems to be handled better. Refer to the RabbitMQ roll docs for more info.
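For instance, a Pika-based daemon can let the connection error escape its consume loop and exit, leaving the restart to its service script. This is an illustrative sketch with a hypothetical queue name, not the roll's actual code:

    import sys
    import pika

    try:
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
        channel.queue_declare(queue='img_status')  # hypothetical queue name

        # Blocks while consuming; raises if the broker becomes unreachable.
        for method, properties, body in channel.consume('img_status'):
            channel.basic_ack(method.delivery_tag)
    except pika.exceptions.AMQPConnectionError:
        # Exit on lost connection; the service script can restart the daemon.
        sys.exit(1)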