A quick way to get an MCollective network built for testing or evaluating MCollective.
The network consists of a single node that acts as a middleware server using Redis and a configurable number of nodes under management. On a 32GB machine I have no problem running 26 machines using this repository.
It consists of a Vagrantfile and a few very simple Puppet modules that do the simplest possible thing to get an MCollective setup going.
This setup uses Redis for the middleware, discovery and registration, providing a very lightweight and fast setup. It is optimized for a demo environment; in production you're likely to use middleware like ActiveMQ or RabbitMQ.
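For reference, pointing MCollective at Redis only takes a few lines in /etc/mcollective/server.cfg. The Puppet modules in this repository manage this for you and the exact setting names depend on the version of the Redis connector plugin, so treat the following as an illustrative sketch only:
# sketch only - the real server.cfg is written by the Puppet modules and
# the plugin.redis.* keys may differ between connector versions
connector = redis
plugin.redis.host = middleware.example.net
plugin.redis.port = 6379
plugin.redis.db = 1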
This will set up the latest development MCollective along with the following plugins:
- Package Agent
- Service Agent
- Puppet Agent
- File Manager Agent
- NRPE Agent
- Process Agent
- Net Test Agent
- Request auditing enabled and logging to /var/log/mcollective-audit.log
Assuming you have a working Vagrant setup on your system it should be really simple to get going:
$ git clone git://github.com/ripienaar/mcollective-vagrant.git
$ cd mcollective-vagrant
You should now edit the Vagrantfile and adjust the constants at the top to your tastes:
$ vi Vagrantfile
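The constants at the top of the Vagrantfile look roughly like the sketch below; check the file you cloned for the real names and defaults, but INSTANCES is the one that controls how many managed nodes get built:
# illustrative sketch - see the Vagrantfile itself for the actual constants
INSTANCES = 5   # number of managed nodes to build in addition to the middleware box
Once you are happy with the settings, bring the environment up: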
$ vagrant up
What follows is a whirlwind tour of MCollective where you can run these commands on this Vagrant setup to gain a sense of what it is about.
MCollective is a framework that you can use to solve your own orchestration problems with; it provides addressing, networking, authentication, authorization and auditing for you, leaving you to focus on just the problem you wish to solve.
The commands you'll see are all built on top of the MCollective framework, and it makes it easy to build your own similar commands for your needs. Puppet Labs provides a number of mature plugins for common needs, which you will see here.
At the end you will see a hint of the underlying structure of the RPC system and how you can interact with it from scripts and other systems. Links to further reading are at the end of the tour.
This is the most basic MCollective test: it broadcasts a message and shows you who replies. You should see the middleware machine and as many node machines as you set the INSTANCES variable to:
$ vagrant ssh middleware
$ mco ping
node0.example.net time=25.18 ms
middleware.example.net time=26.50 ms
---- ping statistics ----
2 replies max: 26.50 min: 25.18 avg: 25.84
MCollective uses your configuration management system for addressing; as these machines are built with Puppet you can use Puppet classes and facts to target nodes.
Just the nodes with the roles::middleware class:
$ mco ping -W roles::middleware
...and the nodes with the roles::node class:
$ mco ping -W roles::node
...and nodes that are middleware nodes with redis installed - matching based on regular expressions:
$ mco ping -W "roles::middleware /redis/"
There are a number of facts on each machine, but as the nodes are identical there is not much variance, so we added a fact called cluster that is set to some random value:
$ mco facts cluster
Report for fact: cluster
alfa found 6 times
bravo found 5 times
Finished processing 11 / 11 hosts in 49.67 ms
You can now combine this fact with Puppet classes to pick a subset of your nodes; this is an AND search:
$ mco ping -W "roles::node cluster=alfa"
...but you can do more complex things:
$ mco ping -S "(roles::node or roles::middleware) and cluster=alfa"
These arguments apply to most MCollective commands, so any of the commands you'll see below can be limited using these discovery filters.
There are numerous other addressing methods, called discovery in MCollective; please read the MCollective CLI Usage documentation to see other ways.
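One simple example is addressing a specific node directly by its identity with the -I filter:
$ mco ping -I node0.example.net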
MCollective has a lot of information about your nodes; you can ask it to show you what it knows with the inventory command:
$ mco inventory middleware.example.net
Inventory for middleware.example.net:
Server Statistics:
Version: 2.3.1
Start Time: Mon Feb 25 13:02:28 +0100 2013
Config File: /etc/mcollective/server.cfg
Collectives: mcollective
Main Collective: mcollective
Process ID: 3934
Total Messages: 4
Messages Passed Filters: 3
Messages Filtered: 1
Expired Messages: 0
Replies Sent: 2
Total Processor Time: 1.46 seconds
System Time: 0.36 seconds
Agents:
discovery filemgr nettest
nrpe package process
puppet rpcutil service
urltest
Data Plugins:
agent fstat nettest
nrpe package process
puppet resource service
Configuration Management Classes:
redis redis::config
redis::install redis::service
.
.
Facts:
architecture => x86_64
.
.
.
This is useful when debugging discovery issues or just to obtain information about a specific node.
You can also use this command to build quick reports of your infrastructure, and remember that the discovery filters you saw earlier can be used with this kind of report to produce them for subsets of nodes:
$ mco inventory --script /etc/mcollective/inventory.mc
Node Report Mon Feb 25 13:50:01 +0100
Hostname: Distribution:
-------------------------------------------------------------------------
node0.example.net Operating System: CentOS
Release: 6.3
Architecture: x86_64
Processors: 1
.
.
.
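The inventory script itself is a small Ruby DSL. A minimal sketch along the lines of the shipped /etc/mcollective/inventory.mc might look like the following; the fields and formatting here are assumptions for illustration, not the exact script this repository installs:
# minimal inventory report sketch - run it with: mco inventory --script myreport.mc
inventory do
    format "%-25s %-15s %-10s"

    fields { [ identity, facts["operatingsystem"], facts["architecture"] ] }
end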
See Node Reports for more information on creating your own reports.
MCollective commands try to show you only the most appropriate information. This means that if you restart a service using MCollective it will not show you every OK; it only shows you the cases where it could not complete your request:
$ mco service restart nrpe
Do you really want to operate on services unfiltered? (y/n): y
* [ ============================================================> ] 11 / 11
middleware.example.net: Could not restart service 'nrpe': Could not start Service[nrpe]: Execution of '/sbin/service nrpe start' returned 1:
Finished processing 11 / 11 hosts in 1016.29 ms
Here you can see it discovered 11 nodes and acted on all 11, but 1 of the 11 failed, and it only shows you that failure.
When asking for the status, though, it assumes you actually want to see the information, so it shows it all, with a short summary of the most important details at the bottom to help you digest it.
$ mco service status nrpe
* [ ============================================================> ] 11 / 11
node2.example.net: running
node4.example.net: running
node6.example.net: running
node0.example.net: running
node5.example.net: running
node9.example.net: running
node3.example.net: running
node8.example.net: running
middleware.example.net: stopped
node7.example.net: running
node1.example.net: running
Summary of Service Status:
running = 10
stopped = 1
Finished processing 11 / 11 hosts in 174.97 ms
The progress bar is usually shown; it tracks progress as nodes complete the requested task. You can disable it using the --np or --no-progress arguments.
This is a key concept to understand in MCollective; please see this blog post for rationale and background.
$ mco package status mcollective
* [ ============================================================> ] 2 / 2
node0.example.net: mcollective-2.2.3-1.el6.noarch
middleware.example.net: mcollective-2.2.3-1.el6.noarch
Summary of Arch:
noarch = 2
Summary of Ensure:
2.2.3-1.el6 = 2
Finished processing 2 / 2 hosts in 523.95 ms
You can also use this to install, update and upgrade packages on the systems; see mco package --help for more information.
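For example, combined with the discovery filters from earlier, updating a package on a subset of nodes might look like this (zsh is just a placeholder package name):
$ mco package update zsh -W cluster=alfa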
More information about the Package agent: GitHub
The package and service applications behave almost identically, so I won't show the full output, but you can stop, start, restart and obtain the status of any service.
$ mco service status mcollective
.
.
See mco service --help for more information.
The package and service managers use the Puppet provider system to do their work so they support any OS Puppet does.
More information about the Service agent: GitHub
You can easily test if machines are able to reach another host using the nettest agent:
$ mco nettest ping 192.168.2.10
Do you really want to perform network tests unfiltered? (y/n): y
* [ ============================================================> ] 11 / 11
middleware.example.net time = 0.254
node0.example.net time = 0.41
Summary of RTT:
Min: 0.254ms Max: 1.316ms Average: 0.577ms
Finished processing 2 / 2 hosts in 49.83 ms
Similarly you can also test if a TCP connection can be made:
$ mco nettest connect 192.168.2.10 8140
This command is best used with a discovery filter. Imagine you suspect a machine in some VLAN is down; you can ask other machines in that cluster to test its availability:
$ mco nettest ping 192.168.2.10 -W cluster=alfa --limit=20%
This will ask 20% of the machines in cluster=alfa to check whether they can reach the node in question.
More information about the nettest plugin: GitHub
This demo is set up with NRPE-based monitoring and MCollective is integrated with NRPE, so you can easily obtain real-time monitoring results:
$ mco nrpe check_load
* [ ============================================================> ] 11 / 11
Summary of Exit Code:
OK : 11
WARNING : 0
CRITICAL : 0
UNKNOWN : 0
Finished processing 11 / 11 hosts in 123.82 ms
Checks that are installed on this Vagrant setup are check_load, check_disks and check_swap.
More information about the nrpe plugin: GitHub
You can quickly find out which nodes have processes matching some query, much like the Unix pgrep command:
$ mco process list ruby
* [ ============================================================> ] 2 / 2
middleware.example.net
PID USER VSZ COMMAND
2487 puppet 140.492 MB /usr/bin/ruby /usr/bin/puppet master
6451 root 160.797 MB ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid
node0.example.net
PID USER VSZ COMMAND
6890 root 161.145 MB ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid
Summary of The Process List:
Matched hosts: 2
Matched Processes: 3
Resident Size: 39.132 MB
Virtual Size: 462.434 MB
Finished processing 2 / 2 hosts in 82.51 ms
The fields shown are configurable - see the process agent
More information about the process plugin: GitHub
You can test the reachability of a website quite easily:
$ mco urltest http://www.devco.net/
* [ ============================================================> ] 2 / 2
Tester Location DNS Connect Pre-xfer Start-xfer Total Bytes Fetched
middleware: 5.0057 0.0031 0.1872 0.1871 5.4010 101859
node0: 5.0066 0.0043 0.1906 0.1905 5.4051 101859
Summary:
DNS lookup time: min: 5.0057 max: 5.0066 avg: 5.0062 sdev: 0.0006
TCP connect time: min: 0.0031 max: 0.0043 avg: 0.0037 sdev: 0.0009
Time to first byte: min: 0.1872 max: 0.1906 avg: 0.1889 sdev: 0.0024
HTTP Responce time: min: 0.1871 max: 0.1905 avg: 0.1888 sdev: 0.0024
Total time taken: min: 5.4010 max: 5.4051 avg: 5.4030 sdev: 0.0029
Here you can see DNS lookup time is a problem; my Vagrant machines tend to have DNS issues.
So far everything you have seen has been purpose-specific command line applications built to behave in a familiar way for their purpose. Every MCollective command, though, is simply performing RPC requests against the network, which provides these RPC endpoints.
The last command can be run by interacting with the RPC layer directly:
$ mco rpc urltest perftest url=http://www.devco.net/
Here you are using the rpc application to interact with the urltest agent, calling its perftest action and supplying the url argument.
The output will be familiar, but now it shows the raw data for every node; the basic behaviour and output format remain the same.
You can interact with any agent this way; to get a list of available agents run the mco plugin doc command:
$ mco plugin doc
Please specify a plugin. Available plugins are:
Agents:
filemgr File Manager
nettest Agent to do network tests from a mcollective host
nrpe Agent to query NRPE commands via MCollective
package Install and uninstall software packages
process Agent To Manage Processes
puppet Run Puppet agent, get its status, and enable/disable it
rpcutil General helpful actions that expose stats and internals to SimpleRPC clients
service Start and stop system services
urltest Agent that connects to a URL and returns some statistics
And you can ask MCollective to show you available actions and arguments for each:
$ mco plugin doc agent/urltest
This will produce auto-generated help for the agent showing the available actions etc.
And finally you can easily write a small script to perform the same url test action:
#!/usr/bin/ruby
require 'mcollective'
include MCollective::RPC

# create a SimpleRPC client for the urltest agent, this also performs discovery
tester = rpcclient("urltest")

# call the perftest action on all discovered nodes, print each reply and the summary stats
printrpc tester.perftest(:url => "http://www.devco.net/")
printrpcstats
If you put this in a script and run it you should see familiar output.
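You can also apply the same discovery filters from a script before making the request; a short sketch using the client filter methods and the cluster fact from earlier:
# filters must be set before the action is called
tester.fact_filter "cluster", "alfa"
tester.class_filter "roles::node"
printrpc tester.perftest(:url => "http://www.devco.net/")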
After you've run a few of the commands above, take a look at the file /var/log/mcollective-audit.log, which is an audit log of all actions taken on a machine; an example is shown below:
2013-02-25T14:17:30.950082+0100: reqid=5d6f52b519ce5e91b18f46ac7a6d2633: reqtime=1361798250 caller=user=vagrant@middleware.example.net agent=urltest action=perftest data={:url=>"http://www.devco.net/", :process_results=>true}
2013-02-25T14:21:01.818629+0100: reqid=9adb441fedc55675855f749b01f67730: reqtime=1361798461 caller=user=vagrant@middleware.example.net agent=service action=restart data={:service=>"nrpe", :process_results=>true}
2013-02-25T16:23:47.398020+0100: reqid=fbf93dc4e1605aba9fb7d4e5a6e3b993: reqtime=1361805827 caller=user=vagrant@middleware.example.net agent=nettest action=ping data={:process_results=>true, :fqdn=>"192.168.2.10"}
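These entries are produced by the standard Logfile audit plugin. The settings that enable it in server.cfg look roughly like the sketch below; the Puppet modules in this repository take care of this for you:
# enable auditing of SimpleRPC requests via the Logfile audit plugin
rpcaudit = 1
rpcauditprovider = Logfile
plugin.rpcaudit.logfile = /var/log/mcollective-audit.log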
You can provide your own audit plugins; more information here.
Having seen some of the utility that MCollective provides, you can now visit some of the other documentation to learn how to get going and what it is about:
- Introduction
- Terminology
- Using the CLI tools
- Security Overview
- Getting Started
- Writing Agents
- Writing Clients
The Puppet modules used to install the boxen are in the deploy/modules directory. The mcollective module has a files/lib directory and everything in there is recursively copied to the nodes, so if you want to test a plugin you're working on just copy it in there and run vagrant provision.
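For example, to try out an agent you have locally - the source path and plugin name below are placeholders, and the layout under files/lib is assumed to mirror the MCollective libdir:
$ cp -r ~/work/mcollective-myagent/agent deploy/modules/mcollective/files/lib/
$ vagrant provision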
The EPEL and Puppet Labs repositories are on the machines and all of the plugins mentioned in the first section of this document are installed from there. Some plugins, like the Redis ones, aren't yet available from Puppet Labs, so for now they are deployed from the module lib dir.
There is a package repo in deploy/packages with some dependencies and this repo is added to all the nodes, so if you drop a new package in there just run createrepo in that directory and it will be available to all the machines.
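For example, with a locally built RPM (the file name is just a placeholder):
$ cp mypackage-1.0-1.el6.x86_64.rpm deploy/packages/
$ cd deploy/packages && createrepo .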
I'd love to see the various things like the Puppet setup done using proper modules from the Forge, so PRs would be appreciated.
R.I.Pienaar / rip@devco.net / @ripienaar / http://devco.net/