Blueflood

Introduction

Blueflood is a multi-tenant distributed metric processing system created by engineers at Rackspace. It is used in production by the Cloud Monitoring team to process metrics generated by their monitoring systems. Blueflood is capable of ingesting, rolling up and serving metrics at a massive scale.

This presentation, given to the SF Metrics Meetup group in February 2014, is a good video introduction to Blueflood: http://vimeo.com/87210602

Getting Started

The latest code is always available on GitHub.

git clone https://github.com/rackerlabs/blueflood.git
cd blueflood

You can run the entire suite of tests using Maven:

mvn test integration-test

Building

Build an 'uber jar' using Maven:

mvn package -P all-modules

The uber jar will be found in ${BLUEFLOOD_DIR}/blueflood-all/target/blueflood-all-${VERSION}-jar-with-dependencies.jar. This jar contains all the dependencies necessary to run Blueflood with a very simple classpath.

Running

The best place to start is the 10-minute guide. In a nutshell, you start the service like this:

java -cp /path/to/uber.jar \
-Dblueflood.config=file:///path/to/blueflood.conf \
-Dlog4j.configuration=file:///path/to/log4j.properties \
com.rackspacecloud.blueflood.service.BluefloodServiceStarter

Every configuration option is defined in Configuration.java. Any of them can be overridden on the command line by passing a JVM system property:

-DCONFIG_OPTION=NEW_VALUE
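
To illustrate what a command-line override means in practice, here is a minimal sketch of the precedence rule: a value passed with -D is visible as a JVM system property and wins over the value from the configuration file. The class and method names are hypothetical and this is not Blueflood's actual Configuration code; the port value is only an example.

```java
import java.util.Properties;

public class ConfigOverrideSketch {
    // Hypothetical helper, not part of Blueflood.
    // A JVM system property (set with -DKEY=VALUE) takes precedence over the
    // value loaded from the config file, which takes precedence over the fallback.
    static String effectiveValue(Properties fileProps, String key, String fallback) {
        String fromCommandLine = System.getProperty(key);
        if (fromCommandLine != null) {
            return fromCommandLine;
        }
        return fileProps.getProperty(key, fallback);
    }

    public static void main(String[] args) {
        // Stand-in for values that would normally be read from blueflood.conf.
        Properties fileProps = new Properties();
        fileProps.setProperty("GRAPHITE_PORT", "2003"); // example value only

        // Prints the file value unless the JVM was started with -DGRAPHITE_PORT=...
        System.out.println(effectiveValue(fileProps, "GRAPHITE_PORT", "unset"));
    }
}
```

Running this class with java -DGRAPHITE_PORT=2004 ConfigOverrideSketch would print the overridden value instead of the one from the file.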

Development

We anticipate different use cases for Blueflood. For example, at Rackspace it made more sense to create a Thrift layer for ingestion and query. We have chosen not to release that layer because it contains a lot of code that is specific to our infrastructure and other backend systems.

We decided to release Blueflood with reference HTTP-based ingestion and query layers. These layers may be replaced by code that works better with your enterprise.

Custom Ingestion

Several things must be done to properly ingest data:

Full resolution data must be written via AstyanaxWriter.insertFull().
A ScheduleContext object must be notified, via update(), of each metric's shard and collection time.
Shard state must be periodically pushed to the database for every shard for which metrics have been collected. This can be done by getting the dirty slot information from the ShardStateManager associated with a particular ScheduleContext object.

HttpMetricsIngestionServer is an example of how to set up a multi-threaded staged ingestion pipeline.
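
The three ingestion steps can be sketched in order as follows. This is only an outline of the control flow: the interfaces below are hypothetical stand-ins written for this example and do not reproduce the real signatures of AstyanaxWriter, ScheduleContext, or ShardStateManager, so consult the source for the actual APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IngestionFlowSketch {
    // Hypothetical stand-ins; NOT Blueflood's real types or signatures.
    interface FullResWriter { void insertFull(List<String> metricNames); }
    interface ScheduleTracker { void update(long collectionTimeMillis, int shard); }
    interface ShardStatePusher { void pushDirtySlots(); }

    // Steps 1 and 2: write full-resolution data, then record the shard and
    // collection time so rollups can be scheduled for that slot.
    static void ingest(List<String> metricNames, long collectionTimeMillis, int shard,
                       FullResWriter writer, ScheduleTracker tracker) {
        writer.insertFull(metricNames);
        tracker.update(collectionTimeMillis, shard);
    }

    // Runs the three steps once against recording stubs and returns the call order.
    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        FullResWriter writer = names -> calls.add("insertFull");
        ScheduleTracker tracker = (time, shard) -> calls.add("update");
        ShardStatePusher pusher = () -> calls.add("pushDirtySlots");

        ingest(Arrays.asList("example.metric"), System.currentTimeMillis(), 0, writer, tracker);

        // Step 3: a real service would do this periodically (for example on a
        // timer), not once per batch; it is called directly here for brevity.
        pusher.pushDirtySlots();
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```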

Custom Querying

Thankfully, querying is easier than ingestion. Whatever query service you create should have a handler that extends RollupHandler, which provides a basic wrapper around the low-level read operations provided by AstyanaxReader.

Operations

Blueflood exposes a large number of internal performance metrics over JMX. Blueflood respects the standard JMX JVM settings:

com.sun.management.jmxremote.authenticate
com.sun.management.jmxremote.ssl
java.rmi.server.hostname
com.sun.management.jmxremote.port

You can use any tool that supports JMX to get internal performance metrics out of Blueflood.
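
As one example of such a tool, the following sketch uses the standard javax.management.remote API to connect to a remote JVM and list the names of its registered MBeans. It assumes the target JVM was started with com.sun.management.jmxremote.port set and with authentication and SSL disabled; the class name is hypothetical, and the host and port are supplied by you. Which MBeans Blueflood registers is not shown here.

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxListSketch {
    // Builds the standard RMI-based JMX service URL for a host and port.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java JmxListSketch <host> <jmx-port>");
            return;
        }
        JMXServiceURL url = new JMXServiceURL(serviceUrl(args[0], Integer.parseInt(args[1])));

        // Assumes no authentication and no SSL on the remote JMX endpoint.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Passing null for both arguments selects every registered MBean.
            Set<ObjectName> names = connection.queryNames(null, null);
            for (ObjectName name : names) {
                System.out.println(name);
            }
        }
    }
}
```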

Additionally, internal performance metrics can be pushed directly to a Graphite service by specifying the following in your Blueflood configuration:

GRAPHITE_HOST
GRAPHITE_PORT
GRAPHITE_PREFIX

Contributing

We welcome bug reports and contributions. If you would like to contribute code, fork this project and send us a pull request. If you would like to contribute documentation, you should get familiar with our wiki.

We have also set up a Google Group to answer questions.

If you prefer IRC, most of the Blueflood developers are in #blueflood on Freenode.

If you prefer HipChat, here is the link: https://www.hipchat.com/gQPx7fG8u

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
