hive-extras

Hive-related things that don't belong in the source repo and aren't SOPs.

Resource Monitoring

The monitoring subdirectory contains code that watches cloud resources, for example to make sure we aren't leaving resources around that cost us money unnecessarily.

AWS

The monitoring/aws subdirectory contains code that uses the AWS Lambda service to monitor our Hive Team Cluster. There is currently one function: periodicHiveLambdaFunction, which monitors instance usage and emails a report daily at 6pm ET.
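The report's core logic might look roughly like the sketch below. This is not the code in monitoring/aws/lambda/. It is a minimal illustration that assumes the function calls boto3's EC2 `describe_instances` for each configured region, filtered to running instances, and groups the results by instance type. The helper names `summarize_running` and `fetch_running` are hypothetical, and `sample` is a hand-written reply that mirrors the shape of a `describe_instances` response.

```python
from collections import Counter


def summarize_running(response: dict) -> Counter:
    """Count running instances by type in a describe_instances-shaped reply.

    Hypothetical helper for illustration; the real lambda may report differently.
    """
    counts: Counter = Counter()
    # describe_instances nests instances under Reservations -> Instances.
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") == "running":
                counts[instance.get("InstanceType", "unknown")] += 1
    return counts


def fetch_running(region: str) -> dict:
    """Collect running instances in one region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so summarize_running works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instances")
    reservations = []
    # Walk every page so large accounts are not silently truncated.
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        reservations.extend(page["Reservations"])
    return {"Reservations": reservations}


# Hand-written sample in the describe_instances shape (no AWS call needed).
sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceType": "m5.xlarge", "State": {"Name": "running"}},
            {"InstanceType": "m5.xlarge", "State": {"Name": "running"}},
        ]},
        {"Instances": [{"InstanceType": "t3.micro", "State": {"Name": "stopped"}}]},
    ]
}
print(summarize_running(sample))  # Counter({'m5.xlarge': 2})
```

Keeping the counting separate from the AWS call means the summary logic can be exercised locally without credentials; in a real run you would feed it `fetch_running(region)` for each entry in the "regions" list.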

Installation

Prerequisites:

  • Ansible CLI, e.g. sudo yum install ansible
  • Python3
  • Python modules listed in monitoring/aws/requirements.txt. Install them with python3 -m pip install --user -r monitoring/aws/requirements.txt
  • Authentication to the Hive team's AWS account, e.g. via a credentials file or environment variables.
    • Your AWS user must be a member of the lambda-admin group, or have equivalent permissions.
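To check the group requirement from a terminal, something like the sketch below could work. It is a hypothetical helper, not part of this repo, and it assumes your credentials belong to an IAM user: IAM's `get_user` with no `UserName` describes the user that signed the request, which does not work for assumed-role credentials. It only checks direct membership in lambda-admin, so it cannot confirm "equivalent permissions" granted some other way.

```python
def has_group(response: dict, group_name: str = "lambda-admin") -> bool:
    """Return True if an IAM list_groups_for_user reply includes group_name."""
    return any(g.get("GroupName") == group_name for g in response.get("Groups", []))


def my_groups() -> dict:
    """List the IAM groups of the user behind the current credentials.

    Needs boto3 and IAM-user credentials; hypothetical helper for illustration.
    """
    import boto3  # lazy import so has_group can be used without boto3

    iam = boto3.client("iam")
    # With no UserName, get_user returns the user that owns the access key.
    user = iam.get_user()["User"]["UserName"]
    return iam.list_groups_for_user(UserName=user)
```

Usage, with working credentials: has_group(my_groups()) returns True when you are in lambda-admin.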

To install, simply execute the playbook's yaml file, e.g.:

./monitoring/aws/upload-lambda.yaml

Live Testing

  • Log into the Hive team's AWS console
  • Navigate to the Code tab for the lambda function you wish to test:
    • periodicHiveLambdaFunction (daily running instance report)
  • Create or load a test configuration using the drop-down next to the orange "Test" button. The JSON payload may vary depending on the function you're running; at the time of this writing, periodicHiveLambdaFunction takes a configuration like the sample below. Set the "recipients" list so the emails come only to you:
    {
      "regions": [
        "us-east-1",
        "us-east-2"
      ],
      "recipients": [
        "me@redhat.com"
      ],
      "fromemail": "openshift-hive-team@redhat.com",
      "emailregion": "us-east-1"
    }
  • Punch the orange "Test" button.
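The same test can be run from a terminal instead of the console. The sketch below builds a payload shaped like the sample configuration and invokes the function synchronously with boto3's Lambda `invoke`. The helper names are hypothetical, and the function's region (us-east-1 here) is an assumption; adjust it to wherever periodicHiveLambdaFunction actually lives. Like the console test, this really sends email, so keep "recipients" limited to yourself.

```python
import json


def build_test_payload(recipients, regions=("us-east-1", "us-east-2")) -> dict:
    """Build a test event with the same keys as the sample configuration."""
    return {
        "regions": list(regions),
        "recipients": list(recipients),
        "fromemail": "openshift-hive-team@redhat.com",
        "emailregion": "us-east-1",
    }


def invoke(payload: dict, function_name: str = "periodicHiveLambdaFunction") -> object:
    """Invoke the function and return its decoded JSON result.

    Needs boto3 and credentials; the region below is an assumption.
    """
    import boto3  # lazy import so build_test_payload works without boto3

    client = boto3.client("lambda", region_name="us-east-1")  # assumed region
    resp = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps(payload).encode(),
    )
    # resp["Payload"] is a stream holding the function's JSON return value.
    return json.loads(resp["Payload"].read())
```

For example: invoke(build_test_payload(["me@redhat.com"])).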

Debugging

Finding the various pieces of these jobs and schedules in the AWS console isn't intuitive (at least to me), so hopefully this helps. First, log into the Hive team's AWS console; then these links should work:

Running Instances:

  • job code link: This should match the respective Python script in monitoring/aws/lambda/.
  • test pane link: See Live Testing above for how to use this.
  • schedule link: Use the "Event schedule" tab to see the cron spec and upcoming runs. Use the "Targets" tab and click Constant under the Input column to see the current configuration.
    • Note: We've had problems before when a second "Target" with no "Constant" configuration mysteriously appeared. Deleting the schedule and rerunning the ansible uploader resolved it.
  • logs link
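When reading the cron spec on the "Event schedule" tab, keep in mind that EventBridge rule cron expressions are evaluated in UTC. Assuming this schedule is such a rule (the "Event schedule" and "Targets" tabs suggest it, but that is an inference), a fixed UTC hour can only match 6pm ET for part of the year. This standard-library sketch shows which UTC hour 6pm ET falls on for a given date; the function name is hypothetical and it needs the system time zone database.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def utc_hour_for_6pm_et(day: date) -> int:
    """Return the UTC hour that corresponds to 18:00 America/New_York on day."""
    local = datetime.combine(day, time(18, 0), tzinfo=ZoneInfo("America/New_York"))
    return local.astimezone(timezone.utc).hour


print(utc_hour_for_6pm_et(date(2024, 1, 15)))  # 23 (EST, UTC-5)
print(utc_hour_for_6pm_et(date(2024, 7, 15)))  # 22 (EDT, UTC-4)
```

So a hypothetical spec such as cron(0 22 * * ? *) would fire at 6pm ET during daylight saving time and at 5pm ET the rest of the year.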