Chef is a powerful tool for maintaining and describing the software and configurations that let a machine provide its services.
cluster_chef is
- a clean, expressive way to describe how machines and roles are assembled into a working cluster.
- our collection of industrial-strength, cloud-ready recipes for Hadoop, Cassandra, HBase, Elasticsearch and more.
- a set of conventions and helpers that make provisioning cloud machines easier.
Here’s a basic 3-node hadoop cluster:
```ruby
ClusterChef.cluster 'demohadoop' do
  merge!('defaults')

  facet 'master' do
    instances 1
    role "nfs_server"
    role "hadoop_master"
    role "hadoop_worker"
    role "hadoop_initial_bootstrap"
  end

  facet 'worker' do
    instances 2
    role "nfs_client"
    role "hadoop_worker"

    server 0 do
      chef_node_name 'demohadoop_worker_zero'
    end
  end

  cloud :ec2 do
    image_name "lucid"
    flavor "c1.medium"
    availability_zones ['us-east-1d']
    security_group :logmuncher do
      authorize_group "webnode"
    end
  end
end
```
This defines a cluster (group of machines that serve some common purpose) with two facets, or unique configurations of machines within the cluster. (For another example, a webserver farm might have a loadbalancer facet, a database facet, and a webnode facet).
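For instance, that webserver farm could be described in the same DSL. This is a hypothetical sketch; the cluster, facet and role names are invented for illustration:

```ruby
# Hypothetical webserver farm: three facets, each a distinct machine configuration.
ClusterChef.cluster 'webfarm' do
  facet 'loadbalancer' do
    instances 1
    role "loadbalancer"       # invented role names; define them yourself
  end
  facet 'database' do
    instances 1
    role "database_server"
  end
  facet 'webnode' do
    instances 4
    role "web_server"
  end
end
```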
In the demohadoop example, the master serves out a home directory over NFS, and runs the processes that distribute jobs to hadoop workers. In this small cluster, the master also has workers itself, and a utility role that helps initialize it out of the box.
There are 2 workers; they use the home directory served out by the master, and run the hadoop worker processes.
Lastly, we define what machines to use for this cluster. Instead of having to look up and type in an image ID, we just say we want the Ubuntu ‘Lucid’ distribution on a c1.medium machine. Cluster_chef understands that this means we need the 32-bit image in the us-east-1 region, and makes the cloud instance request accordingly. It also creates a ‘logmuncher’ security group, opening it so all the ‘webnode’ machines can push their server logs onto the HDFS.
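You can picture the image lookup as a table keyed by image name, instance bitness and region. A toy sketch of the idea follows; it is not cluster_chef’s actual resolution code, and the AMI id is made up:

```ruby
# c1.medium is a 32-bit instance type, so the 32-bit 'lucid' AMI for the
# instance's region gets chosen. Table contents are invented.
BITNESS = { 'c1.medium' => 32, 'm1.large' => 64 }
IMAGES  = { ['lucid', 32, 'us-east-1'] => 'ami-0000000' }

def image_for(image_name, flavor, zone)
  region = zone.sub(/[a-z]$/, '')   # 'us-east-1d' => 'us-east-1'
  IMAGES[[image_name, BITNESS[flavor], region]]
end

image_for('lucid', 'c1.medium', 'us-east-1d')  # => 'ami-0000000'
```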
The following commands launch each machine and, once it’s ready, ssh in to install chef and converge its software.
```
knife cluster launch demohadoop master --bootstrap
knife cluster launch demohadoop worker --bootstrap
```
You can now also launch the entire cluster at once:
```
knife cluster launch demohadoop --bootstrap
```
The cluster launch operation is idempotent: nodes that are running won’t be started!
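Conceptually, the launch pass skips anything that is already up. Here is a minimal self-contained sketch of that behavior, not cluster_chef’s actual implementation:

```ruby
# Idempotent launch in miniature: only machines that aren't already
# running get a create request.
Server = Struct.new(:name, :state)

fleet = [
  Server.new('demohadoop-master-0', :running),
  Server.new('demohadoop-worker-0', :stopped),
  Server.new('demohadoop-worker-1', nil),
]

fleet.reject { |svr| svr.state == :running }.each do |svr|
  puts "launching #{svr.name}"   # the real thing issues a cloud API request here
end
```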
Some general principles of how we use chef.
- Chef server is never the repository of truth — it only mirrors the truth.
  - a file is tangible and immediate to access
- Specifically, we want truth to live in the git repo, and be enforced by the chef server. There is no truth but git, and chef is its messenger.
  - this means that everything is versioned, documented and exchangeable.
- Systems, services and significant modifications to a cluster should be obvious from the `clusters` file. I don’t want to have to bounce around nine different files to find out which thing installed a `redis::server`.
  - basically, the existence of anything that opens a port should be obvious when I look at the cluster file.
- Roles define systems, clusters assemble systems into a machine (see the sketch after this list).
  - For example, a resque worker queue has a redis, a webserver and some config files — your cluster should invoke a `whatever_queue` role, and the `whatever_queue` role should include recipes for the component services.
  - the existence of anything that opens a port or runs as a service should be obvious when I look at the roles file.
- include_recipe considered harmful: do NOT use include_recipe for anything that a) provides a service, b) launches a daemon or c) is interesting in any way (so: `include_recipe java` yes; `include_recipe iptables` no). You should note the dependency in the metadata.rb instead. This seems weird, but the breaking behavior is purposeful: it makes you explicitly state all dependencies.
- It’s nice when machines are in full control of their destiny.
  - initial setup (elastic IP, attaching a drive) is often best enforced externally
  - but machines should be able to independently assert things, like load balancer registration, that might change at any point in their lifetime.
- It’s even nicer, though, to have full idempotency from the command line: I can at any time push truth from the git repo to the chef server and know that it will take hold.
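To make the role principle concrete, here is a hypothetical role file for the `whatever_queue` example above; the recipe names are invented:

```ruby
# roles/whatever_queue.rb -- assembles the component services of the
# resque worker queue system. Recipe names are illustrative.
name        'whatever_queue'
description 'resque worker queue: redis, a webserver, and its config files'
run_list(
  'recipe[redis::server]',
  'recipe[nginx]',
  'recipe[whatever_queue::config]'
)
```

On the cookbook side, per the include_recipe rule, the same dependencies would be stated explicitly in metadata.rb (`depends 'redis'`, `depends 'nginx'`).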
This assumes you have installed chef, have a working chef server, and have an AWS account. If you can run knife and use the web browser to see your EC2 console, you can start here. If not, see the instructions below.
Install these gems (you may have several installed already):
```
sudo gem install chef fog highline configliere net-ssh-multi formatador terminal-table gorillib extlib
```
(and maybe right_aws and broham?)
This assumes the same setup as described in the knife quickstart walkthrough, but to keep things convenient we’ll use these shortcuts to refer to the important paths:
```
CHEF_REPO_DIR=$HOME/PATH/TO/chef-repo        # has cookbooks/ and site-cookbooks/
DOT_CHEF_DIR=$HOME/PATH/TO/chef-repo/.chef   # has your knife.rb, USERNAME.pem etc
```
I recommend adding cluster_chef as a git submodule; you can instead clone it elsewhere.
```
cd $CHEF_REPO_DIR
git submodule add git@github.com:infochimps/cluster_chef.git cluster_chef
```
In your `$DOT_CHEF_DIR/knife.rb`, modify the cookbook path (to include cluster_chef/cookbooks and cluster_chef/site-cookbooks) and add settings for `cluster_chef_path`, `cluster_path` and `keypair_path`. Here’s mine:
```ruby
current_dir  = File.dirname(__FILE__)
organization = 'CHEF_ORGANIZATION'
username     = 'CHEF_USERNAME'

# The full path to your cluster_chef installation
cluster_chef_path File.expand_path("#{current_dir}/../cluster_chef")
# The list of paths holding clusters
cluster_path      [ File.expand_path("#{current_dir}/../clusters") ]
# The directory holding your cloud keypairs
keypair_path      File.expand_path(current_dir)

log_level                :info
log_location             STDOUT
node_name                username
client_key               "#{keypair_path}/#{username}.pem"
validation_client_name   "#{organization}-validator"
validation_key           "#{keypair_path}/#{organization}-validator.pem"
chef_server_url          "https://api.opscode.com/organizations/#{organization}"
cache_type               'BasicFile'
cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )

# The first things have lowest priority (so, site-cookbooks gets to win)
cookbook_path [
  "#{cluster_chef_path}/cookbooks",
  "#{current_dir}/../cookbooks",
  "#{current_dir}/../site-cookbooks",
]

# If you primarily use AWS cloud services:
knife[:ssh_address_attribute] = 'cloud.public_hostname'
knife[:ssh_user]              = 'ubuntu'

# Configure bootstrapping
knife[:bootstrap_runs_chef_client] = true
bootstrap_chef_version   "~> 0.10.0"

# AWS access credentials
knife[:aws_access_key_id]     = "XXXXXXXXXXX"
knife[:aws_secret_access_key] = "XXXXXXXXXXXXX"
```
To send all the cookbooks and roles to the chef server, visit your chef repo directory and run:
```
cd $CHEF_REPO_DIR
mkdir -p $CHEF_REPO_DIR/site-cookbooks
knife cookbook upload --all
for foo in roles/*.rb ; do knife role from file $foo & sleep 1 ; done
```
You should see all the cookbooks defined in cluster_chef/cookbooks (ant, apt, …) listed among those it uploads.
Link the cluster_chef knife plugins into your `.chef` directory:

```
mkdir -p $DOT_CHEF_DIR/plugins
ln -s $CHEF_REPO_DIR/cluster_chef/lib/cluster_chef/knife $DOT_CHEF_DIR/plugins/
ln -s $CHEF_REPO_DIR/cluster_chef/lib/cluster_chef/knife/bootstrap $DOT_CHEF_DIR/
```
If you type `knife cluster launch`, you should see that it has found the new scripts:
```
** CLUSTER COMMANDS **
knife cluster launch CLUSTER_NAME [FACET_NAME] (options)
knife cluster show CLUSTER_NAME [FACET_NAME] (options)
```
Let’s create a cluster called ‘demosimple’. It’s, well, a simple demo cluster.
Create a directory for your clusters; copy the demosimple cluster and its associated roles from cluster_chef:
```
mkdir -p $CHEF_REPO_DIR/clusters
cp cluster_chef/clusters/{defaults,demosimple}.rb ./clusters/
cp cluster_chef/roles/{big_package,nfs_*,ssh,base_role,chef_client}.rb ./roles/
for foo in roles/*.rb ; do knife role from file $foo ; done
```
Symlink in the cookbooks you’ll need, and upload them to your chef server:
```
cd $CHEF_REPO_DIR/cookbooks
ln -s ../cluster_chef/site-cookbooks/{nfs,big_package,cluster_chef,cluster_service_discovery,firewall,motd} .
knife cookbook upload nfs big_package cluster_chef cluster_service_discovery firewall motd
```
Make a cloud keypair, the secure key you’ll use to communicate with the Amazon AWS cloud. cluster_chef expects a keypair named after its cluster — this is a best practice that helps keep your environments distinct (see the sketch after these steps).
- Log in to the AWS console and create a new keypair named `demosimple`. Your browser will download the private key file.
- Move the private key file you just downloaded to your .chef dir, and make it unsnoopable, or ssh will complain:
```
mv ~/downloads/demosimple.pem $DOT_CHEF_DIR/demosimple.pem
chmod 600 $DOT_CHEF_DIR/*.pem
```
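The keypair-named-after-cluster convention means the private key file is predictable from the cluster name. A sketch of the idea, not the actual lookup code:

```ruby
# Given a cluster name and your keypair_path, the private key file is
# simply CLUSTER_NAME.pem in that directory.
def keypair_file(cluster_name, keypair_path)
  File.join(keypair_path, "#{cluster_name}.pem")
end

keypair_file('demosimple', "#{ENV['HOME']}/chef-repo/.chef")
# => ".../chef-repo/.chef/demosimple.pem"
```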
Hooray! You’re ready to launch a cluster:
```
knife cluster launch demosimple homebase --bootstrap
```
It will kick off a node and then bootstrap it. You’ll see it install a whole bunch of things. Yay.
If you already have a working chef installation you can skip this section.
To get started with knife and chef, follow the Chef Quickstart. We use the hosted chef service and are very happy with it, but there are instructions on the wiki for setting up your own chef server too. Stop when you get to “Bootstrap the Ubuntu system” — cluster_chef is going to make that much easier.
If you can use the normal knife bootstrap commands to launch a machine, you can skip this step.
Steps:
- Sign up for an AWS account
- Follow the Knife with AWS quickstart on the opscode wiki.
Right now cluster_chef works well with AWS. If you’re interested in modifying it to work with other cloud providers, see here or get in touch.
Suppose you are using the `git` resource to deploy a recipe (`george`, for the sake of example). If `/var/chef/cache/revision_deploys/var/www/george` exists then nothing will get deployed, even if `/var/www/george/{release_sha}` is empty or screwy. If git deploy is acting up in any way, nuke that cache from orbit — it’s the only way to be sure.
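If you want the nuking scripted, a plain-Ruby one-liner does it; the path is taken from the example above, so adjust it for your app:

```ruby
require 'fileutils'
# Remove the cached revision marker so the next chef run deploys fresh.
FileUtils.rm_rf '/var/chef/cache/revision_deploys/var/www/george'
```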
Once the master runs to completion with all daemons started, remove the hadoop_initial_bootstrap recipe from its run_list. (Note that you may have to edit the run_list on the machine itself, depending on how you bootstrapped the node.)
For problems starting the NFS server on Ubuntu Maverick systems, read, understand and then run `/tmp/fix_nfs_on_maverick_amis.sh`. See this thread for more.
On older versions of chef that don’t have a plugin mechanism for new commands, we have to do surgery on knife itself: we’ll just symlink the new commands into chef’s lib/chef/knife directory, and symlink the bootstrap templates into chef’s lib/chef/knife/bootstrap directory. Set the path to your cluster_chef directory and run the following:
```
sudo ln -s $CLUSTER_CHEF_DIR/lib/cluster_chef/knife/*.rb $(dirname `gem which chef`)/chef/knife/
sudo ln -s $CLUSTER_CHEF_DIR/lib/cluster_chef/knife/bootstrap/* $(dirname `gem which chef`)/chef/knife/bootstrap/
```