Swift-S3 Sync is a way to share data between on-premises OpenStack Swift deployments and Amazon S3 (or S3-clones). The project initially allowed for propagating any changes from Swift to S3 -- PUT, DELETE, or POST -- in an asynchronous fashion. Since then, it has evolved to support a limited set of data policy options for expressing the lifecycle of the data, as well as transparent access to data stored in S3.
Notable features:
- asynchronously propagates object operations to Amazon S3, Google Cloud Storage¹, S3-clones, and other Swift Clusters
- allows for an "archival" mode after a set time period
- on-line access to archived data through the Swift interface
¹Google Cloud Storage requires interoperability access to be enabled.
swift-s3-sync runs as a standalone process, intended to be used on Swift container nodes. The container database provides the list of changes to the objects in Swift (whether it was a metadata update, new object, or a deletion).
To provide on-line access to archived Swift objects, there is a Swift middleware component. If a Swift container was configured to be archived, the middleware will query the destination store for contents on a GET request, as well as splice the results of LIST requests between the two stores.
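For example, reads of an archived object still go through the normal Swift API, and the middleware fetches the data from the remote store behind the scenes. Using python-swiftclient with the test credentials and the archive-s3 container described later in this README (shown purely as an illustration):
swift -A http://localhost:<mapped 8080>/auth/v1.0 -U test:tester -K testing download archive-s3 <file>
swift -A http://localhost:<mapped 8080>/auth/v1.0 -U test:tester -K testing list archive-s3
The first command returns the object contents even after they have been archived, and the second splices the local and remote listings into a single result.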
There is no explicit coordination between the swift-s3-sync daemons. Implicitly, they coordinate through their progress in the container database.
Each daemon looks up the number of container nodes in the system (with the assumption that each node has a running daemon). Each daemon initially handles only the objects assigned to it, and afterward verifies that the remaining objects have been processed by the other daemons. This means that for each operation, there are as many requests issued against the remote store as there are container databases for the container. For example, with a three-replica policy, an object PUT results in three HEAD requests (but only one PUT against the remote store in the common case).
swift-s3-sync depends on:
- container-crawler library
- botocore (unfortunately, we had to use our own fork, as a number of patches were difficult to merge upstream)
- boto
- eventlet
Until we can merge the botocore patches upstream, you will also have to install botocore from our fork (do this before installing swift-s3-sync):
pip install -e git://github.com/swiftstack/botocore.git@1.4.32.5#egg=botocore
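To confirm that the fork is what actually got installed (rather than a stock botocore release from PyPI), check the version that pip reports; it should correspond to the fork tag in the URL above:
pip show botocore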
Build the package to be installed on the nodes with:
python ./setup.py build sdist
Install the tarball with:
pip install swift-s3-sync-<version>.tar.gz
You will also need to install the container-crawler library from Git:
pip install -e git://github.com/swiftstack/container-crawler.git@0.0.9#egg=container-crawler
After that, you should have the swift-s3-sync executable available in /usr/local/bin.
swift-s3-sync has to be invoked with a configuration file that specifies which containers to watch, where their contents should be placed, and a number of global settings. A sample configuration file is in the repository.
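The invocation then looks roughly like the sketch below. The --config flag name is an assumption rather than something documented in this README, so consult the sample configuration file and the executable's help output for the exact option:
swift-s3-sync --config <path to swift-s3-sync config file>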
To configure the Swift Proxy servers to use swift-s3-sync to redirect requests for archived objects, you have to add the following to the proxy pipeline:
[filter:swift_s3_shunt]
use = egg:swift-s3-sync#cloud-shunt
conf_file = <Path to swift-s3-sync config file>
This middleware should be in the pipeline before the DLO/SLO middleware.
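For example, with a tempauth-based deployment like the test container used later in this README, the resulting pipeline might look something like the sketch below. The exact set of other middlewares depends on your deployment; the important part is that swift_s3_shunt appears before dlo and slo:
[pipeline:main]
pipeline = catch_errors gatekeeper healthcheck proxy-logging cache tempauth swift_s3_shunt dlo slo proxy-logging proxy-server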
To try this out, you can build the docker container used for integration tests:
docker build -t swift-s3-sync test/container
It uses S3Proxy, backed by the filesystem, and a Swift all-in-one docker container as the base.
The container must map the swift-s3-sync source tree when launching:
docker run -P -d -v `pwd`:/swift-s3-sync swift-s3-sync
After this, we can check the port mappings with:
docker port <container name>
Port 8080 is the Swift Proxy server, whereas 10080 is the S3Proxy.
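You can also query a single mapping at a time, which is convenient when scripting the swift and s3cmd commands below:
docker port <container name> 8080   # host port mapped to the Swift proxy
docker port <container name> 10080  # host port mapped to S3Proxy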
Tests pre-configure multiple policies. Specifically, you can create containers sync-s3 and archive-s3 to observe how swift-s3-sync works. Using python-swiftclient, that would look something like this:
swift -A http://localhost:<mapped 8080>/auth/v1.0 -U test:tester -K testing post sync-s3
swift -A http://localhost:<mapped 8080>/auth/v1.0 -U test:tester -K testing post archive-s3
swift -A http://localhost:<mapped 8080>/auth/v1.0 -U test:tester -K testing upload sync-s3 <file>
swift -A http://localhost:<mapped 8080>/auth/v1.0 -U test:tester -K testing upload archive-s3 <file>
After this, we can examine the state of our fake S3 content:
s3cmd ls -r s3://s3-sync-stest/
You should see two objects in the bucket. For s3cmd, you can use the following configuration:
access_key=s3-sync-test
secret_key=s3-sync-test
host_base=localhost:<mapped 10080>
host_bucket=localhost:<mapped 10080>
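If you put that configuration in a file (named s3cfg here purely for illustration), s3cmd can be pointed at it with the -c option:
s3cmd -c s3cfg ls -r s3://s3-sync-stest/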
All commands below assume you're running them in the swift-s3-sync directory.
It is recommended to set up a virtualenv when working on this project. You can set it up as follows:
virtualenv venv
After that, dependencies can be installed with:
./venv/bin/pip install -r requirements.txt
The test dependencies can similarly be installed with:
./venv/bin/pip install -r requirements-test.txt
To run unit tests, you can simply run nose:
./venv/bin/nosetests
The unit tests will require swift and container-crawler to be in your PYTHONPATH. Typically, I run them with the following convoluted line:
PYTHONPATH=~/swift:~/container-crawler ./venv/bin/nosetests
This assumes that all of the dependencies are in your home directory. You can
adjust this if they live in other places.
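If you do not already have those checkouts, something along these lines puts them where the example above expects them (openstack/swift is the upstream Swift repository; container-crawler is the same repository referenced in the install instructions):
git clone https://github.com/openstack/swift.git ~/swift
git clone https://github.com/swiftstack/container-crawler.git ~/container-crawler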
For integration tests, we need access to a Swift cluster and some sort of an S3 provider. Currently, the tests use a Docker container to provide Swift and are configured to talk to S3Proxy.
To build the test container, run:
docker build -t cloud-sync/test test/container
Once this completes, you will have a docker image tagged cloud-sync/test.
Start the container with:
docker run -P -d -v <swift-s3-sync checkout>:/swift-s3-sync cloud-sync/test
The container will be started in the background (-d) and will expose ports 8080 and 10080 (-P) to connect to Swift and S3Proxy, respectively. It is based on the bouncestorage/swift-aio image. The -v option maps the current source tree into the container, so that it operates on your current state.
NOTE: the services do not restart on code changes. You can either manually stop/start the swift-s3-sync daemon (and the Swift proxy if you're working on the shunt), or stop/start the container.
The cloud sync configuration for the tests is defined in test/container/swift-s3-sync.conf. In particular, there are mappings for S3 sync and archive policies and the same for Swift. The S3 mappings point to S3Proxy running on the host machine, listening on port 10080.
Once you have S3Proxy and the Docker container running, run the tests with:
./venv/bin/nosetests test/integration
By default, the tests will look for the first running container that has been started from an image named cloud-sync/test. You can override that behavior by specifying the test container to use with the TEST_CONTAINER environment variable.
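For example, to run the integration tests against a specific container:
TEST_CONTAINER=<container name> ./venv/bin/nosetests test/integration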
The tests create and destroy the containers and buckets configured in the swift-s3-sync.conf file. If you need to examine the state of a container afterward, consider making the tearDownClass method a no-op, e.g. by commenting out its body (TODO: add a way to keep state).
If you would like to examine the logs from each of the services, all logs are in /var/log (e.g. /var/log/swift-s3-sync.log).
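Since the services run inside the Docker container, one way to follow a log from the host is with docker exec, for example:
docker exec <container name> tail -f /var/log/swift-s3-sync.log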