
nhm's cluster analyzer tools



NHMCA Version 0.1 Alpha

INTRODUCTION

NHMCA is a Perl program with a number of plugins that can create graphs and
tables of information about cluster utilization.  Data can currently be read
in from Moab, TORQUE, and collectl.  Cluster parameters and default run
parameters are stored in an XML file, and recorded data is stored in an SQLite
database.  Examples of the kinds of things that NHMCA can be used for:

- Graph many different aspects of cluster utilization by job or per-node
  resource utilization using dynamically configurable line charts and heatmaps.
  Graphs can be created on a per-job, per-user, or per-group basis.
- Create reports of jobs run over a specific date range and binned by
  arbitrary classification filters.
- Monitor I/O behavior on storage clusters (e.g. Lustre, Gluster, or Ceph).

PREREQUISITES

You will need several extra Perl modules to use NHMCA:

XML::Simple
GD
DBI
DateTime
DateTime::Format

Depending on the database specified in the configuration file you may need to
install additional modules, such as the DBI driver for that database.  NHMCA
is known to work with both SQLite 3 and MySQL 5.0.  Other SQL databases,
including PostgreSQL, may not work out of the box.
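
Before running NHMCA it can help to confirm which of the modules listed above
are already installed.  The following is a minimal sketch, not part of NHMCA
itself: the helper name missing_modules is made up for illustration, and the
module names are copied verbatim from the list above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Module names copied from the PREREQUISITES list above.
my @required = qw(XML::Simple GD DBI DateTime DateTime::Format);

# Hypothetical helper: return the names from @_ that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn Foo::Bar into Foo/Bar.pm so that require accepts it as a path.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(@required);
if (@missing) {
    print "Missing: $_\n" for @missing;
} else {
    print "All listed modules are installed.\n";
}
```

Any name reported as missing can then be installed through your distribution's
packages or from CPAN.  The same check can be extended with the DBI driver for
the database you plan to use.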

INSTALLATION:

Take one of the example settings files from the examples directory and modify 
it to suit your own cluster(s).  

The configuration file has multiple sections:

1) parameters - Default command-line options.
2) cluster - Declaration of a cluster to monitor (repeatable).
3) stats - Definition of the statistics to generate for job usage reports.
4) graphs - Group of declarations of the graphs to generate.

Please note that clusters must currently have homogeneous nodes.  Each group
of common nodes in a heterogeneous cluster will need to be treated as a separate
cluster for the moment.  This will likely be changed in a later release.

The configuration file to use can be specified on the command line when nhmca
is executed.  It may also be copied to ~/.nhmca/settings.xml or
/etc/nhmca/settings.xml, where it will be read by default.
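
The lookup described above can be sketched as follows.  This is an
illustration only, not NHMCA's actual code: the function name find_settings is
hypothetical, $ARGV[0] merely stands in for whatever command-line option
supplies the path, and trying the per-user file before the system-wide one is
an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: an explicit path wins; otherwise return the first
# default location that is readable.  The order of the defaults is an
# assumption, not documented NHMCA behavior.
sub find_settings {
    my ($cli_path, @defaults) = @_;
    return $cli_path if defined $cli_path;
    for my $candidate (@defaults) {
        return $candidate if -r $candidate;
    }
    return;    # nothing found
}

my $home     = defined $ENV{HOME} ? $ENV{HOME} : '';
my @defaults = ("$home/.nhmca/settings.xml", '/etc/nhmca/settings.xml');

# $ARGV[0] is a stand-in for the path given on the command line, if any.
my $settings = find_settings($ARGV[0], @defaults);
print defined $settings ? "Using $settings\n" : "No settings file found\n";
```

Run nhmca.pl --help to find the real option for naming the configuration file.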

EXECUTION:

Run nhmca.pl --help to see a list of the available command-line options.

TODO:

- Fix various bugs and clean up code.
- Clean up and standardize XML configuration file format.
- Write real documentation.
- Refactor chart drawing code.  Investigate various HTML5/JS frontend options.
- Support heterogeneous cluster configurations.
- Investigate faster ways to generate graphs and parse collectl data.
- Create a daemon and data collector for performance counter data using perf.
- Look into faster/better schemes for storing vast amounts of per-process and
  performance counter data.
- Associate job, collectl, and performance counter data for broad performance
  analysis studies for every user-spawned executable running on clusters.