QUADS automates the future scheduling, end-to-end provisioning and delivery of bare-metal servers and networks.
- Visit the QUADS blog
- Please read our contributing guide and use Gerrit Review to submit patches.
- QUADS (quick and dirty scheduler)
- What does it do?
- Design
- Requirements
- Setup Overview
- QUADS Workflow
- QUADS Switch and Host Setup
- Installing QUADS
- QUADS Usage Documentation
- QUADS Reporting
- Common Administration Tasks
- Creating a New Cloud Assignment and Schedule
- Managing Faulty Hosts
- Managing Retired Hosts
- Extending the Schedule of an Existing Cloud
- Extending the Schedule of a Specific Host
- Shrinking the Schedule of an Existing Cloud
- Shrinking the Schedule of a Specific Host
- Terminating a Schedule
- Adding Hosts to an existing Cloud
- Removing a Schedule
- Removing a Schedule across a large set of hosts
- Removing a Host from QUADS
- Modify a Host Interface
- Remove a Host Interface
- Using the QUADS JSON API
- Additional Tools and Commands
- Interacting with MongoDB
- Using JIRA with QUADS
- Backing up QUADS
- Restoring QUADS DB from Backup
- Troubleshooting Validation Failures
- Contact QUADS Developers
- QUADS Talks and Media
- Create and manage unlimited future scheduling for automated slicing & dicing of systems and network infrastructure
- Drive automated systems provisioning and network switch changes to deliver isolated, multi-tenant bare-metal environments
- Automated network and provisioning validation prior to delivering sets of machines/networks to tenants
- Automated allocation of optional, publicly routable VLANs
- Generates/maintains user-configurable instackenv.json to accommodate OpenStack deployment.
- Generates/maintains user-configurable ocpinventory.json for OpenShift on Baremetal Deployments
- Automatically generate/maintain documentation to illustrate current status, published to a Wordpress instance
- Current system details, infrastructure fleet inventory
- Current system group ownership (cloud), workloads and assignments
- Total duration and time remaining in system assignments
- Dynamic provisioning & system/network validation status per assignment
- Currently allocated/free optional publicly routable VLAN status
- Granular Ansible facts inventory per server via ansible-cmdb (to be re-introduced in 1.1+)
- Query scheduling data to determine future availability
- Generates a per-month visualization map for per-machine allocations to assignments.
- RT (or similar ticketing system) integration.
- IRC bot and email notifications for new provisioning tasks and ones nearing completion
- Control PDU sockets for connected bare-metal systems for power action (to be re-introduced in 1.2)
- Main components:
Python3, Cherrypy, Mongoengine, MongoDB, Jinja2
- Installation via Docker compose, RPM (Fedora or EL8+) or Github sources
- We use badfish for managing bare-metal IPMI
- We use Foreman for the systems provisioning backend.
- We use Wordpress for auto-generating wiki and documentation.
- A typical container-based QUADS deployment might look like this:
- In QUADS 1.1+ we are using Python3, Cherrypy and Jinja2 with MongoDB as the database backend.
- The scheduling functionality can be used standalone, but you'll want a provisioning backend like Foreman to take full advantage of QUADS scheduling, automation and provisioning capabilities.
- To utilize the automatic wiki/docs generation we use Wordpress but anything that accepts markdown via an API should work.
- Switch/VLAN automation is done on Juniper Switches in Q-in-Q VLANs, but command sets can easily be extended to support other network switch models.
- We use badfish for Dell systems to manage boot order to accommodate OpenStack deployments via Ironic/Triple-O as well as to control power actions via the Redfish API.
- Documentation for setting up and using QUADS is available in detail within this repository.
- Below is a high-level overview of a greenfield setup; some of this may already exist in your environment.
Step | Documentation | Details |
---|---|---|
General Architecture Overview | docs | Architecture overview |
Install and Setup Foreman/Satellite | docs | Not covered here |
Setup Foreman/Satellite Validation Templates | examples | Templates for internal interface configs |
Prepare Host and Network Environment | docs | Covers Juniper Environments, IPMI, Foreman |
Install QUADS | docs | RPM, Docker or Github Source |
Install MongoDB | docs | May not be available via your distribution due to licensing changes |
Install Wiki | docs | For RPM or Github Source only |
Configure your QUADS Move Command | docs | Configure your provisioning and move actions |
Configure QUADS Crons | docs | Tell QUADS how to manage your infrastructure |
Add Clouds and Hosts | docs | Configure your hosts and environments in QUADS |
Host Metadata Model and Search | docs | Host metadata info and filtering |
Using JIRA with QUADS | docs | Optional JIRA tools and library for QUADS |
You can read about QUADS architecture, provisioning, visuals and workflow in our documentation examples and screenshots
- To ensure you have set up your network switches and bare-metal hosts properly please follow our Switch and Host Setup Docs
- We offer Docker compose, RPM packages or a Git clone installation (for non RPM-based distributions, BSD UNIX, etc).
- It's recommended to use the Docker method as it requires less setup
- Clone the QUADS Github repository
git clone --single-branch --branch master https://github.com/redhat-performance/quads /opt/docker/quads
- Read through the QUADS YAML configuration file for other settings you may want.
- Make a copy of it and place it on the local filesystem of the Docker host outside the git checkout
mkdir -p /opt/quads/conf
cp /opt/docker/quads/conf/quads.yml /opt/quads/conf/quads.yml
- Make any changes required to your `/opt/quads/conf/quads.yml`
vi /opt/quads/conf/quads.yml
- Run docker-compose to bring up a full QUADS stack
docker-compose -f /opt/docker/quads/docker/docker-compose-production.yml up -d
- Access the QUADS Wiki via browser at `http://localhost` or `http://quads-container-host` to set up your Wiki environment.
- Run commands against containerized QUADS via docker exec
docker exec quads bin/quads-cli --define-cloud cloud01 --description cloud01
- Container Layout
Container | Purpose | Source Image | Name |
---|---|---|---|
quads | quads server | Official Python3 Image | python:3 |
quads_db | quads database | Official Mongodb Image | mongo:4.0.4-xenial |
nginx | wiki proxy | Official Nginx Image | nginx:1.15.7-alpine |
wiki | quads wiki | Official WP Image | wordpress:5.2.2-php7.2-fpm-alpine |
wiki_db | wiki database | Official MariaDB Image | mariadb |
We find it useful to create an alias on your Docker host for executing quads-cli commands inside the quads container.
- On your docker host:
echo 'alias quads="docker exec -it quads bin/quads-cli"' >> ~/.bashrc
- e.g. creating an environment and adding hosts
quads --define-cloud cloud01 --description "spare pool"
quads --add-host host01 --default-cloud cloud01 --host-type general
This method requires you to satisfy all of your Python3 and library dependencies yourself and isn't recommended; however, it is probably the only way to run QUADS on some platforms like FreeBSD. Substitute package names and methods appropriately.
- Clone the git repository (substitute paths below as needed)
git clone https://github.com/redhat-performance/quads /opt/quads
- Install pre-requisite Python packages
dnf install python3-requests python3-wordpress-xmlrpc python3-pexpect python3-paramiko ipmitool python3-cherrypy python3-mongoengine mongodb mongodb-server python3-jinja2 python3-passlib python3-PyYAML python3-GitPython
- Install a webserver (Apache, nginx, etc)
dnf install httpd
- Create logging directory (you can edit this in `conf/quads.yml` via the `log:` parameter).
mkdir -p /opt/quads/log
- Create your visualization web directory (you can configure this in `conf/quads.yml` via `visual_web_dir`)
mkdir -p /var/www/html/visual
- Populate the web visualization images in your webserver directory
cp -p /opt/quads/images/{button*,texture*}.png /var/www/html/visual/
- Read through the QUADS YAML configuration file for other settings you may want.
vi /opt/quads/conf/quads.yml
- Enable and start the QUADS systemd service (daemon)
- Note: You can change the QUADS `quads_base_url` listening port in `conf/quads.yml` and use the `--port` option
cp /opt/quads/systemd/quads-server.service /etc/systemd/system/quads-server.service
systemctl daemon-reload
systemctl enable quads-server.service
systemctl start quads-server.service
- Note: You can use QUADS on non-systemd based Linux or UNIX distributions but you'll need to run `/opt/quads/bin/quads-server` via an alternative init process or similar functionality.
- We build RPM packages for Fedora and CentOS/RHEL 8
- On Fedora 30 and above you'll need to manually install mongodb first, see installing mongodb for QUADS
- On Fedora 30 and above it is necessary to install `python3-wordpress-xmlrpc` as it is not included anymore
wget https://funcamp.net/w/python3-wordpress-xmlrpc-2.3-13.fc29.noarch.rpm
rpm -ivh --nodeps python3-wordpress-xmlrpc-2.3-13.fc29.noarch.rpm
- This package is also available via pip
pip install python-wordpress-xmlrpc
- On RHEL/CentOS 8 you'll need to install MongoDB first via
dnf install mongodb mongodb-server
- On RHEL/CentOS 8 you'll also need to obtain the `python3-paramiko` RPM package from somewhere as it's been removed from EL8 in favor of `libssh`
- Once you have mongodb installed and running you can install/upgrade QUADS via RPM.
dnf copr enable quadsdev/python3-quads -y
dnf install quads -y
- Note: If you want the latest development RPM based on the
master
branch instead:
dnf install quads-dev -y
- Read through the QUADS YAML configuration file
vi /opt/quads/conf/quads.yml
- Enable and Start dependent services
- haveged is a replacement entropy service for VMs. It's optional, so turn it off if you want to use `/dev/random`
- It solves certain performance issues known to occur with lack of entropy when running QUADS in a VM.
systemctl enable httpd
systemctl enable haveged
systemctl start haveged
systemctl start httpd
systemctl start mongod
- Enable and start the QUADS systemd service (daemon)
- Note: You can change the QUADS `quads_base_url` listening port in `conf/quads.yml` and use the `--port` option
systemctl enable quads-server
systemctl start quads-server
- Source quads binaries in your $PATH (or login with another shell)
source /etc/profile.d/quads.sh
- Now you're ready to go.
quads-cli --help
- For full functionality with Foreman you'll also need to have hammer cli installed and set up on your QUADS host.
- Note: RPM installations will have `quads-cli` and tools in your system $PATH but you will need to log in to a new shell to pick them up. We typically place this as an alias in `/root/.bashrc`.
echo 'alias quads="quads-cli"' >> /root/.bashrc
- There is also a Wordpress Wiki VM QUADS component that we use as a place to automate documentation via a Markdown to Python RPC API, but any Markdown-friendly documentation platform could suffice. Note that the container deployment sets this up for you.
- You'll then simply need to create an `infrastructure` page and an `assignments` page and note their `page id` for use in automation. This is set in `conf/quads.yml`
- We also provide the `krusze` theme, which does a great job of rendering Markdown-based tables, and the `JP Markdown` plugin, which is required to upload Markdown to the Wordpress XMLRPC Python API. The `Classic Editor` plugin is also useful. All themes and plugins can be activated from settings.
- It's advised to set the following parameter in your `wp-config.php` file to limit the number of page revisions that are kept in the database.
- Before the first reference to `ABSPATH` in `wp-config.php` add:
define('WP_POST_REVISIONS', 100);
- You can always clear out your old page revisions via the `wp-cli` utility as well. QUADS regenerates all content as it changes, so there is no need to keep old revisions of pages unless you want to.
yum install wp-cli -y
su - wordpress -s /bin/bash
wp post delete --force $(wp post list --post_type='revision' --format=ids)
- QUADS relies on calling an external script, trigger or workflow to enact the actual provisioning of machines. You can look at and modify our move-and-rebuild-hosts tool to suit your environment for this purpose. Read more about this in the move-host-command section below.
- Note: RPM installations will have `quads-cli` and tools in your system $PATH but you will need to log in to a new shell to pick them up.
- QUADS is a passive service and does not do anything you do not tell it to do. We control QUADS with cron; please copy our example cron commands and adjust them as needed.
- Below are the major components run out of cron that make everything work.
Service Command | Category | Purpose |
---|---|---|
quads-cli --move-hosts | provisioning | checks for hosts to move/reclaim as scheduled |
validate_env.py | validation | checks clouds pending to be released for all enabled validation checks |
regenerate_wiki.py | documentation | keeps your infra wiki updated based on current state of environment |
simple_table_web.py | visualization | keeps your systems availability and usage visualization up to date |
make_instackenv_json.py | openstack | keeps optional openstack triple-o installation files up-to-date |
- Define the various cloud environments
- These are the isolated environments QUADS will use and provision into for you.
quads-cli --define-cloud cloud01 --description "Primary Cloud Environment"
quads-cli --define-cloud cloud02 --description "02 Cloud Environment"
quads-cli --define-cloud cloud03 --description "03 Cloud Environment"
- Define the hosts in the environment (Foreman Example)
- Note the `--host-type` parameter: this is a mandatory, free-form label that can be anything. It will be used later for `post-config` automation and categorization.
- If you don't want systems to be reprovisioned when they move into a cloud environment append `--no-wipe` to the define command.
- We are excluding anything starting with mgmt- and including servers with the name r630.
for h in $(hammer host list --per-page 1000 | egrep -v "mgmt|c08-h30"| grep r630 | awk '{ print $3 }') ; do quads-cli --define-host $h --default-cloud cloud01 --host-type general; done
- The command without Foreman would be simply:
quads-cli --define-host <hostname> --default-cloud cloud01 --host-type general
- Define the host interfaces. These are the internal interfaces you want QUADS to manage for VLAN automation.
- Do this for every interface you want QUADS to manage per host (we are working on auto-discovery of this step).
- The variable `default_pxe_interface` in quads.yml sets the default value of `pxe_boot=True` for that interface, while any other interface defaults to `False` unless specified via `--pxe-boot` or `--no-pxe-boot`. This can be modified later via `--mod-interface`.
quads-cli --add-interface em1 --interface-mac 52:54:00:d9:5d:df --interface-switch-ip 10.12.22.201 --interface-port xe-0/0/1:0 --interface-vendor "Intel" --interface-speed 1000 --host <hostname>
quads-cli --add-interface em2 --interface-mac 52:54:00:d9:5d:e0 --interface-switch-ip 10.12.22.201 --interface-port xe-0/0/1:1 --interface-vendor "Intel" --interface-speed 1000 --pxe-boot --host <hostname>
quads-cli --add-interface em3 --interface-mac 52:54:00:d9:5d:e1 --interface-switch-ip 10.12.22.201 --interface-port xe-0/0/1:2 --interface-vendor "Intel" --interface-speed 1000 --host <hostname>
quads-cli --add-interface em4 --interface-mac 52:54:00:d9:5d:d1 --interface-switch-ip 10.12.22.201 --interface-port xe-0/0/1:3 --interface-vendor "Intel" --interface-speed 1000 --host <hostname>
- To list the hosts:
quads-cli --ls-hosts
You will now see the full list of hosts.
c08-h21-r630.example.com
c08-h22-r630.example.com
c08-h23-r630.example.com
c08-h24-r630.example.com
c08-h25-r630.example.com
c08-h26-r630.example.com
c08-h27-r630.example.com
c08-h28-r630.example.com
c08-h29-r630.example.com
c09-h01-r630.example.com
c09-h02-r630.example.com
c09-h03-r630.example.com
- To list a host's interface and switch information:
quads-cli --ls-interface --host c08-h21-r630.example.com
{"name": "em1", "mac_address": "52:54:00:d9:5d:df", "switch_ip": "10.12.22.201", "switch_port": "xe-0/0/1:0"}
{"name": "em2", "mac_address": "52:54:00:d9:5d:e0", "switch_ip": "10.12.22.201", "switch_port": "xe-0/0/1:1"}
{"name": "em3", "mac_address": "52:54:00:d9:5d:e1", "switch_ip": "10.12.22.201", "switch_port": "xe-0/0/1:2"}
{"name": "em4", "mac_address": "52:54:00:d9:5d:d1", "switch_ip": "10.12.22.201", "switch_port": "xe-0/0/1:3"}
- To see the current system allocations:
quads-cli --summary
cloud01 : 45 (Primary Cloud Environment)
cloud02 : 0 (02 Cloud Environment)
cloud03 : 0 (03 Cloud Environment)
- For a more detailed summary of current system allocations use `--detail`
quads-cli --summary --detail
cloud01 (quads): 45 (Primary Cloud Environment) - 451
cloud02 (jdoe): 0 (02 Cloud Environment) - 462
cloud03 (jhoffa): 0 (03 Cloud Environment) - 367
NOTE:
The format here is based on the following:
{cloud_name} ({owner}): {count} ({description}) - {ticket_number}
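If you want to consume the detailed summary from a script, the format above can be split back into its fields. The following is a minimal sketch, assuming each output line follows that format exactly; the regular expression and the `parse_summary_line` helper are illustrative and not part of QUADS.

```python
import re

# Mirrors: {cloud_name} ({owner}): {count} ({description}) - {ticket_number}
SUMMARY_LINE = re.compile(
    r"^(?P<cloud_name>\S+) \((?P<owner>[^)]*)\): (?P<count>\d+) "
    r"\((?P<description>.*)\) - (?P<ticket_number>\S+)$"
)


def parse_summary_line(line):
    """Split one line of detailed summary output into a dict of fields."""
    match = SUMMARY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    fields = match.groupdict()
    fields["count"] = int(fields["count"])  # host count as an integer
    return fields


sample = "cloud02 (jdoe): 0 (02 Cloud Environment) - 462"
print(parse_summary_line(sample))
```

The ticket number is kept as a string since ticket identifiers are not guaranteed to be numeric in every ticketing system.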
- Define a custom schedule for a host
- Example: assign host `c08-h21` to the workload/cloud `cloud02`
quads-cli --add-schedule --host c08-h21-r630.example.com --schedule-start "2016-07-11 08:00" --schedule-end "2016-07-12 08:00" --schedule-cloud cloud02
- List the schedule for a specific host:
quads-cli --ls-schedule --host c08-h21-r630.example.com
You'll see the schedule output below
Default cloud: cloud01
Current cloud: cloud02
Defined schedules:
0:
start: 2016-07-11 08:00
end: 2016-07-12 08:00
cloud: cloud02
- Move any hosts that need to be re-allocated based on the current schedule
quads-cli --move-hosts
You should see output like the following from a move operation
INFO: Moving c08-h21-r630.example.com from cloud01 to cloud02
c08-h21-r630.example.com cloud01 cloud02
In QUADS, a `move-command` is the actionable call that provisions and moves a set of systems from one cloud environment to another. Via cron, QUADS routinely queries the existing schedules, and when it comes time for a set of systems to move to a new environment or be reclaimed and moved back to the spare pool it will run the appropriate variation of your `move-command`.
In the above example the default move command simply called `/bin/echo` for illustration purposes. In order for this to do something more meaningful you should invoke a script with the `--move-command` option, which should be the path to a valid command or provisioning script/workflow.
- Define your move command by pointing QUADS to an external command, trigger or script.
- This expects three arguments: `hostname current-cloud new-cloud`.
- Runs against all hosts according to the QUADS schedule.
quads-cli --move-hosts --move-command quads/tools/move_and_rebuild_hosts.py
- You can modify the default settings via the `default_move_command` setting in quads-cli.
- You can look at the move-and-rebuild-hosts script as an example. It's useful to note that with `quads/tools/move_and_rebuild_hosts.py` passing a fourth argument will result in only the network automation running, and the actual host provisioning will be skipped. You should review this script and adapt it to your needs; we try to make variables for everything but some assumptions are made to fit our running environments.
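To make the three-argument contract concrete, here is a minimal sketch of what a custom move command could look like. It is not the move-and-rebuild-hosts script: the `parse_move_args` and `main` names are hypothetical, and the body only prints what it would do where a real script would trigger provisioning and switch changes.

```python
def parse_move_args(argv):
    """Validate the positional arguments: hostname current-cloud new-cloud."""
    if len(argv) != 3:
        raise ValueError("expected: hostname current-cloud new-cloud")
    hostname, current_cloud, new_cloud = argv
    return {
        "hostname": hostname,
        "current_cloud": current_cloud,
        "new_cloud": new_cloud,
    }


def main(argv):
    move = parse_move_args(argv)
    # Placeholder: site-specific provisioning and network changes go here.
    print(
        "Would move {hostname} from {current_cloud} to {new_cloud}".format(**move)
    )
    return 0


# Demonstration call; a real script would use main(sys.argv[1:]) instead.
main(["c08-h21-r630.example.com", "cloud01", "cloud02"])
```

Returning a non-zero value from `main` on failure is a common convention for scripts of this kind; whether QUADS acts on the exit status is not covered here.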
As of QUADS `1.1.6` we now have the `--report-detailed` command, which will list all upcoming future assignments that are scheduled.
You can also specify custom start and end dates via `--schedule-start "YYYY-MM-DD HH:MM"` and `--schedule-end "YYYY-MM-DD HH:MM"`
quads-cli --report-detailed
Example Output
Owner | Ticket| Cloud| Description| Systems| Scheduled| Duration|
tcruise | 1034| cloud20| Openshift| 6| 2022-02-06| 14|
cwalken | 1031| cloud19| Openstack| 6| 2022-02-06| 14|
bhicks | 1029| cloud18| Openstack-B| 4| 2022-02-06| 14|
nreeves | 1028| cloud11| Openshift-P| 2| 2022-02-06| 14|
gcarlin | 1026| cloud08| Ceph| 4| 2022-02-06| 14|
Generate a report with a list of server types with total count of systems and their current and future availability plus an average build time delta overall
quads-cli --report-available
Example output
Quads report for 2019-12-01 to 2019-12-31:
Percentage Utilized: 60%
Average build delta: 0:00:26.703556
Server Type | Total| Free| Scheduled| 2 weeks| 4 weeks
r620 | 5| 0| 100%| 0| 0
1029p | 3| 3| 0%| 3| 3
Additionally, you can pass `--schedule-start` and `--schedule-end` dates for reports in the past. The "2 weeks" and "4 weeks" free columns are counted from the first Sunday following the day the command was run, or from the current day at 22:01 if it was run on a Sunday.
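The Sunday rule for the "2 weeks" and "4 weeks" columns can be sketched as follows. This only illustrates the rule as described above and is not the QUADS implementation; the `report_start` name is hypothetical, and using 22:01 for a following Sunday (not just for the run-on-Sunday case) is an assumption.

```python
from datetime import datetime, timedelta


def report_start(now):
    """Return the day from which the 2 and 4 week windows are counted."""
    # datetime.weekday(): Monday is 0 and Sunday is 6.
    days_until_sunday = 6 - now.weekday()  # 0 when run on a Sunday
    sunday = now + timedelta(days=days_until_sunday)
    # Assumption: 22:01 applies to a following Sunday as well.
    return sunday.replace(hour=22, minute=1, second=0, microsecond=0)


run_time = datetime(2019, 12, 4, 10, 30)  # a Wednesday
start = report_start(run_time)
print(start, start + timedelta(weeks=2), start + timedelta(weeks=4))
```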
Generate a report detailing systems and scheduling utilization over the course of months or years.
quads-cli --report-scheduled --months 6
Example Output
Month | Scheduled| Systems| % Utilized|
2022-02 | 1| 1268| 42%|
2022-01 | 9| 1268| 66%|
2022-02 | 1| 1268| 42%|
2021-09 | 10| 1226| 83%|
2021-08 | 14| 1215| 77%|
2021-07 | 3| 1215| 87%|
Generate statistics on the number of assigned clouds in QUADS over a period of months in the past, starting today, or for a specific year.
quads-cli --report-scheduled --months 6
Example output
Month | Scheduled| Systems| % Utilized|
2019-12 | 0| 8| 58%|
2019-11 | 2| 8| 62%|
2019-10 | 15| 8| 20%|
2019-09 | 0| 0| 0%|
2019-08 | 0| 0| 0%|
Additionally, you can pass `--year` instead for a report for every month in that year.
Creating a new schedule and assigning machines is currently done through the QUADS CLI. There are a few options you'll want to utilize. Mandatory options are in bold and optional are in italics.
- description (this will appear on the assignments dynamic wiki)
- cloud-owner (for associating ownership and usage notifications)
- force (needed for re-using an existing cloud)
- cc-users (Add additional people to the notifications)
- cloud-ticket (RT ticket used for the work, also appears in the assignments dynamic wiki)
- wipe (whether to reprovision machines going into this cloud, default is 1 or wipe)
This pertains to the internal interfaces that QUADS will manage for you to move sets of hosts between environments based on a schedule. For setting up optional publicly routable VLANs please see the QUADS public vlan setup steps.
- VLAN design (optional, will default to `qinq: 0` below)
- `qinq: 0` (default) qinq VLAN separation by interface: primary, secondary and beyond QUADS-managed interfaces all match the same VLAN membership across other hosts in the same cloud allocation. Each interface per host is in its own VLAN, and these match across the rest of your allocated hosts by interface (all nic1, all nic2, all nic3, all nic4 etc).
- `qinq: 1` all QUADS-managed interfaces in the same qinq VLAN. For this to take effect you need to pass the optional argument of `--qinq 1` to the `--define-cloud` command.
- You can use the command `quads-cli --ls-qinq` to view your current assignment VLAN configuration:
quads-cli --ls-qinq
cloud01: 0 (Isolated)
cloud02: 1 (Combined)
cloud03: 0 (Isolated)
cloud04: 1 (Combined)
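The difference between the isolated (`qinq: 0`) and combined (`qinq: 1`) layouts can be pictured with a small sketch. The VLAN numbers here are invented purely for illustration and do not reflect how QUADS derives VLAN IDs; `vlan_membership` and `base_vlan` are hypothetical names.

```python
def vlan_membership(hosts, interfaces, qinq, base_vlan=1100):
    """Map (host, interface) pairs to an illustrative VLAN ID."""
    if qinq not in (0, 1):
        raise ValueError("qinq must be 0 or 1")
    membership = {}
    for host in hosts:
        for index, interface in enumerate(interfaces):
            # qinq 0: one VLAN per interface position, shared across hosts.
            # qinq 1: every managed interface lands in the same VLAN.
            vlan = base_vlan + index if qinq == 0 else base_vlan
            membership[(host, interface)] = vlan
    return membership


hosts = ["host01", "host02"]
nics = ["em1", "em2", "em3", "em4"]
isolated = vlan_membership(hosts, nics, qinq=0)
combined = vlan_membership(hosts, nics, qinq=1)
print(sorted(set(isolated.values())), sorted(set(combined.values())))
```

With `qinq=0`, `em1` on every host shares one VLAN and `em2` shares another; with `qinq=1` all interfaces on all hosts share a single VLAN.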
If you need to associate a public vlan (routable) with your cloud, quads currently supports associating your last NIC per host with one of your defined public VLANs (see the QUADS public vlan setup steps).
To define your cloud with a public VLAN, use the following syntax:
quads-cli --define-cloud cloud03 [ other define-cloud options ] --vlan 601
If you need to clear the vlan association with your cloud, you can pass any string to the `--vlan` argument in `--mod-cloud`
quads-cli --mod-cloud cloud03 --vlan none
quads-cli --define-cloud cloud03 --description "Messaging AMQ" --force --cloud-owner epresley --cc-users "jdoe jhoffa" --cloud-ticket 423625 --qinq 1
- Note: in QUADS `1.1.4` you can change any of these values selectively via the `--mod-cloud` command described below.
- Now that you've defined your new cloud you'll want to allocate machines and a schedule.
- We're going to find the first 20 Dell r620s and assign them as an example.
quads-cli --cloud-only cloud01 | grep r620 | head -20 > /tmp/RT423624
- Now we'll allocate all of these hosts with a schedule. By default our system times use UTC.
for h in $(cat /tmp/RT423624) ; do quads-cli --host $h --add-schedule --schedule-start "2016-10-17 00:00" --schedule-end "2016-11-14 17:00" --schedule-cloud cloud03 ; done
- NOTE: If you are using JIRA integration features with QUADS 1.1.5 and higher you can utilize `--host-list` along with a list of hosts and it will take care of updating your `--cloud-ticket` in JIRA for you in one swoop.
quads-cli --add-schedule --host-list /tmp/hosts --schedule-start "2021-04-20 22:00" --schedule-end "2021-05-02 22:00" --schedule-cloud cloud20
That's it. At this point your hosts will be queued for provision and move operations; we check once a minute if there are any pending provisioning tasks. To check manually:
quads-cli --move-hosts --dry-run
After your hosts are provisioned and moved you should see them populate under the cloud list.
quads-cli --cloud-only cloud03
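The per-host scheduling loop shown earlier in this section can also be driven from a script. The sketch below only builds the `quads-cli --add-schedule` argument lists using the flags shown above and checks the date format first; the `add_schedule_command` helper is hypothetical, the host names are placeholders, and nothing is executed.

```python
import shlex
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M"  # matches "YYYY-MM-DD HH:MM"


def add_schedule_command(host, start, end, cloud):
    """Build the argument list for scheduling one host into a cloud."""
    start_dt = datetime.strptime(start, DATE_FORMAT)  # raises on a bad format
    end_dt = datetime.strptime(end, DATE_FORMAT)
    if end_dt <= start_dt:
        raise ValueError("schedule end must be after schedule start")
    return [
        "quads-cli", "--host", host, "--add-schedule",
        "--schedule-start", start, "--schedule-end", end,
        "--schedule-cloud", cloud,
    ]


hosts = ["host01.example.com", "host02.example.com"]  # placeholder hosts
commands = [
    add_schedule_command(h, "2016-10-17 00:00", "2016-11-14 17:00", "cloud03")
    for h in hosts
]
for command in commands:
    print(shlex.join(command))  # shell-quoted, ready to review or paste
```

Each list could be handed to `subprocess.run` once you are satisfied with the printed commands. `shlex.join` requires Python 3.8 or newer.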
Starting with `1.1.4` QUADS can manage broken or faulty hosts for you and ensure they are omitted from being added to a future schedule or listed as available. Prior to `1.1.4` this was managed via the Foreman host parameter `broken_state` (true/false).
- Listing all broken systems.
# quads-cli --ls-broken
f18-h22-000-r620.stage.example.com
- Marking a system as faulty
# quads-cli --mark-broken --host f18-h23-000-r620.example.com
Host f18-h23-000-r620.example.com is now marked as broken
- Marking a system as repaired or no longer faulty.
# quads-cli --mark-repaired --host f18-h23-000-r620.example.com
Host f18-h23-000-r620.example.com is now marked as repaired.
- Hosts marked as faulty will be omitted from `--ls-available`
- Hosts marked as faulty are not able to be scheduled until they are marked as repaired again.
- If you previously used the `broken_state` Foreman host parameter to manage your broken or out-of-service systems within your fleet you'll want to migrate to the new method, in which the QUADS database handles this for you, for versions `1.1.4` and higher.
- You can use the following command to query Foreman and convert `broken_state` host parameters and status into QUADS:
for h in $(hammer host list --per-page 1000 --search params.broken_state=true | grep $(egrep ^domain /opt/quads/conf/quads.yml | awk '{ print $NF }') | awk '{ print $3 }') ; do quads-cli --mark-broken --host $h ; done
- With QUADS `1.1.5` and higher we now have the `--retire`, `--unretire` and `--ls-retire` features to manage decommissioning or reviving hosts.
- Hosts marked as retired will still retain their scheduling history and data, but will not show as available unless filtered.
- To list retired hosts:
quads-cli --ls-retire
- To retire a host:
quads-cli --retire --host host01.example.com
- To unretire a host:
quads-cli --unretire --host host01.example.com
- NOTE: If upgrading from QUADS `1.1.4.1` or earlier you will need the following settings applied before using `--retire`.
- This can be referenced in the changelog/release notes as well.
$ cd /opt/quads
$ python
>>> from quads.model import Host
>>> hosts = Host.objects()
>>> for host in hosts:
...     if not host.retired:
...         host.update(retired=False)
Occasionally you'll want to extend the lifetime of a particular assignment. QUADS lets you do this with one command but you'll want to double-check things first. In this example we'll be extending the assignment end date for cloud02.
In QUADS version `1.1.4` or higher, or the current `master` branch, you can extend a cloud environment with a simple command.
quads-cli --extend --cloud cloud02 --weeks 2 --check
This will check whether or not the environment can be extended without conflicts.
To go ahead and extend it remove the --check
quads-cli --extend --cloud cloud02 --weeks 2
You might also want to extend the lifetime of a specific host. In this example we'll be extending the assignment end date for host01.
quads-cli --extend --host host01 --weeks 2 --check
This will check whether or not the host schedule can be extended without conflicts.
To go ahead and extend it remove the --check
quads-cli --extend --host host01 --weeks 2
Occasionally you'll want to shrink the lifetime of a particular assignment. In this example we'll be shrinking the assignment end date for cloud02
quads-cli --shrink --cloud cloud02 --weeks 2 --check
This will check whether or not the environment can be shrunk without conflicts.
To go ahead and shrink it remove the --check
quads-cli --shrink --cloud cloud02 --weeks 2
You might also want to shrink the lifetime of a specific host. In this example we'll be shrinking the assignment end date for host01.
quads-cli --shrink --host host01 --weeks 2 --check
This will check whether or not the host schedule can be shrunk without conflicts.
To go ahead and shrink it remove the --check
quads-cli --shrink --host host01 --weeks 2
If you would like to terminate the lifetime of a schedule at either a host or cloud level, you can pass the `--now` argument instead of `--weeks`, which will set the schedule's end date to now.
In this example we'll be terminating the assignment for cloud02.
quads-cli --shrink --cloud cloud02 --now --check
This will check whether or not the environment can be terminated without conflicts.
To go ahead and terminate it remove the --check
quads-cli --shrink --cloud cloud02 --now
QUADS also supports adding new machines into an existing workload (cloud).
- Search Availability Pool for Free Servers
- Let's look for any free servers from `2016-12-05 08:00` until `2016-12-15 08:00`
quads-cli --ls-available --schedule-start "2016-12-05 08:00" --schedule-end "2016-12-15 08:00"
c03-h11-r620.rdu.openstack.example.com
c03-h13-r620.rdu.openstack.example.com
c03-h14-r620.rdu.openstack.example.com
c03-h15-r620.rdu.openstack.example.com
- Move New Hosts into Existing Cloud
Above we see all the free servers during our timeframe. Let's move them into cloud10.
quads-cli --host c03-h11-r620.rdu.openstack.example.com --add-schedule --schedule-start "2016-12-05 08:00" --schedule-end "2016-12-15 08:00" --schedule-cloud cloud10
quads-cli --host c03-h13-r620.rdu.openstack.example.com --add-schedule --schedule-start "2016-12-05 08:00" --schedule-end "2016-12-15 08:00" --schedule-cloud cloud10
quads-cli --host c03-h14-r620.rdu.openstack.example.com --add-schedule --schedule-start "2016-12-05 08:00" --schedule-end "2016-12-15 08:00" --schedule-cloud cloud10
quads-cli --host c03-h15-r620.rdu.openstack.example.com --add-schedule --schedule-start "2016-12-05 08:00" --schedule-end "2016-12-15 08:00" --schedule-cloud cloud10
You can remove an existing schedule across a set of hosts using the `--rm-schedule` flag against the schedule ID for each particular machine of that assignment.
- Example: removing the schedule for three machines in cloud
- Obtain the schedule ID via `quads-cli --ls-schedule --host`
- These machines happen to have the same cloud assignment as schedule ID 2.
quads-cli --rm-schedule 2 --host c08-h01-r930.rdu.openstack.example.com
quads-cli --rm-schedule 2 --host c08-h02-r930.rdu.openstack.example.com
quads-cli --rm-schedule 2 --host c08-h03-r930.rdu.openstack.example.com
You should search for either the start or end dates to select the right schedule ID to remove when performing schedule removals across a large set of hosts.
- If you are using QUADS in any serious capacity always pick this option.
- Example: removing schedule by searching for start date.
- Often machine schedule IDs are different for the same schedule across a set of machines; searching by date ensures you remove the right one.
for host in $(cat /tmp/452851); do quads-cli --rm-schedule $(quads-cli --ls-schedule --host $host | grep cloud08 | grep "start=2017-08-06" | tail -1 | awk -F\| '{ print $1 }') --host $host ; echo Done. ; done
To remove a host entirely from QUADS management you can use the --rm-host
command.
quads-cli --rm-host f03-h30-000-r720xd.rdu2.example.com
Removed: {'host': 'f03-h30-000-r720xd.rdu2.example.com'}
To modify a host interface you can use the --mod-interface
command.
quads-cli --mod-interface em1 --host f03-h30-000-r720xd.rdu2.example.com --no-pxe-boot
Interface successfully updated
To remove a host interface you can use the --rm-interface
command.
quads-cli --rm-interface em1 --host f03-h30-000-r720xd.rdu2.example.com
Resource properly removed
- All QUADS actions under the covers use the JSON API v2
- This is an optional local systemd service that you can start and interact with; it listens on localhost
TCP/8080
- The tool
/opt/quads/quads/tools/verify_switchconf.py
can be used to both validate and correct network switch configs.
- This can be run at a cloud level (and with 1.1.5+ also at the per-host level).
- It's advised to run it first without
--change
to see if it would fix something.
- This will also check/correct optional routable VLANs if those are in use.
- To validate a cloud's network config:
/opt/quads/quads/tools/verify_switchconf.py --cloud cloud10
- To validate and fix a cloud's network config use
--change
/opt/quads/quads/tools/verify_switchconf.py --cloud cloud10 --change
- To validate a single host's network switch configuration:
/opt/quads/quads/tools/verify_switchconf.py --host host01.example.com
- To validate and fix a single host's network config use
--change
/opt/quads/quads/tools/verify_switchconf.py --host host01.example.com --change
- To straddle clouds and place a single host into a cloud it does not belong in (rare use case):
/opt/quads/quads/tools/verify_switchconf.py --host host01.example.com --cloud cloud10
Note, if host01.example.com is not in cloud10, but rather cloud20, you will see the following output:
WARNING - Both --cloud and --host have been specified.
WARNING -
WARNING - Host: host01.example.com
WARNING - Cloud: cloud10
WARNING -
WARNING - However, host01.example.com is a member of cloud20
WARNING -
WARNING - !!!!! Be certain this is what you want to do. !!!!!
WARNING -
- With the
modify_switch_conf.py
tool you can set each individual network interface to a specific VLAN ID.
- Passing the
--change
argument will make the changes effective in the switch. Without it, the tool only verifies that the configuration is set as desired.
/opt/quads/quads/tools/modify_switch_conf.py --host host01.example.com --nic1 1400 --nic2 1401 --nic3 1400 --nic4 1402 --nic5 1400
- All
--nic*
arguments are optional, so this can also be done individually for each NIC.
- To make it easy to figure out which VLAN corresponds to which generic
em
interface in the QUADS --ls-interfaces
information, we now include the following tool:
/opt/quads/quads/tools/ls_switch_conf.py --cloud cloud32
INFO - Cloud qinq: 1
INFO - Interface em1 appears to be a member of VLAN 1410
INFO - Interface em2 appears to be a member of VLAN 1410
This tool also accepts the --all
argument, which will list details for all hosts in the cloud.
Additionally, you can achieve the same with the following shell one-liner, setting cloud=XX
for the cloud and adjusting $(seq 1 4)
for the interface range available on the host.
cloud=32 ; origin=1100 ; offset=$(expr $(expr $cloud - 1) \* 10); vl=$(expr $origin + $offset) ;for i in $(seq 1 4) ; do vlan=$(expr $vl + $i - 1) ; echo "em$i is interface VLAN $vlan in cloud$cloud" ; done
em1 is interface VLAN 1410 in cloud32
em2 is interface VLAN 1411 in cloud32
em3 is interface VLAN 1412 in cloud32
em4 is interface VLAN 1413 in cloud32
- You can then use this information to map specific interfaces into other VLAN/clouds as required for more one-off or ad-hoc requirements beyond the standard VLAN modes that QUADS currently supports.
- Note that this would be an example for the default
Q-in-Q 0 (isolated)
VLAN configuration. The Q-in-Q 1 (combined)
configuration would simply be VLAN 1410
for all interfaces above.
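The arithmetic in the shell one-liner can also be written as a small Python function, which makes the two Q-in-Q modes easier to compare. This is a minimal sketch: the function name interface_vlans and its parameters are hypothetical, and the origin of 1100 and the step of 10 VLANs per cloud are taken from the one-liner above.

```python
def interface_vlans(cloud, nics=4, origin=1100, qinq=0):
    """Map em1..emN to VLAN IDs for a given cloud number.

    qinq=0 (isolated): each interface gets its own consecutive VLAN.
    qinq=1 (combined): every interface shares the cloud's first VLAN.
    """
    # Same arithmetic as the one-liner: origin + (cloud - 1) * 10
    base = origin + (cloud - 1) * 10
    vlans = {}
    for index in range(1, nics + 1):
        vlans[f"em{index}"] = base if qinq == 1 else base + index - 1
    return vlans


for name, vlan in interface_vlans(32).items():
    print(f"{name} is interface VLAN {vlan} in cloud32")
```

For cloud32 this gives VLANs 1410 through 1413 in isolated mode, and calling interface_vlans(32, qinq=1) places every interface in VLAN 1410, which agrees with the ls_switch_conf.py output shown earlier.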
- You can redefine or change any aspect of an already-defined cloud starting in
1.1.4
with the --mod-cloud
command.
- This can be done on a per-parameter or combined basis:
quads-cli --mod-cloud cloud02 --cloud-owner jhoffa
quads-cli --mod-cloud cloud04 --cc-users "tpetty fmercury"
quads-cli --mod-cloud cloud06 --vlan 604 --wipe
quads-cli --mod-cloud cloud50 --no-wipe
quads-cli --mod-cloud cloud50 --vlan none
- Because QUADS knows about all future schedules you can display what your environment will look like at any point in time using the
--date
command.
- Looking into a specific environment by date
quads-cli --cloud-only cloud08 --date "2019-06-04 22:00"
f16-h01-000-1029u.rdu2.example.com
f16-h02-000-1029u.rdu2.example.com
f16-h03-000-1029u.rdu2.example.com
f16-h05-000-1029u.rdu2.example.com
f16-h06-000-1029u.rdu2.example.com
- Looking at all schedules by date
quads-cli --ls-schedule --date "2020-06-04 22:00"
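The idea behind the --date lookups above can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not QUADS code: the record layout, the function name hosts_in_cloud_at and the sample hosts are hypothetical, and treating the end time as exclusive is an assumption.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M"  # same layout as the --date argument


def hosts_in_cloud_at(schedules, cloud, when):
    """Return the hosts whose schedule places them in `cloud` at time `when`."""
    moment = datetime.strptime(when, DATE_FORMAT)
    active = []
    for entry in schedules:
        start = datetime.strptime(entry["start"], DATE_FORMAT)
        end = datetime.strptime(entry["end"], DATE_FORMAT)
        # Assumption: a schedule covers its start time but not its end time.
        if entry["cloud"] == cloud and start <= moment < end:
            active.append(entry["host"])
    return sorted(active)


# Hypothetical schedule records, for illustration only.
schedules = [
    {"host": "host01.example.com", "cloud": "cloud08",
     "start": "2019-06-01 22:00", "end": "2019-06-15 22:00"},
    {"host": "host02.example.com", "cloud": "cloud08",
     "start": "2019-06-10 22:00", "end": "2019-06-24 22:00"},
]

print(hosts_in_cloud_at(schedules, "cloud08", "2019-06-04 22:00"))
```

Because every schedule carries explicit start and end times, the same check works for past, present and future dates.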
- You can see what's in progress or set to provision via the
--dry-run
sub-flag of --move-hosts
quads-cli --move-hosts --dry-run
INFO: Moving b10-h27-r620.rdu.openstack.example.com from cloud01 to cloud03
INFO: Moving c02-h18-r620.rdu.openstack.example.com from cloud01 to cloud03
INFO: Moving c02-h19-r620.rdu.openstack.example.com from cloud01 to cloud03
INFO: Moving c02-h21-r620.rdu.openstack.example.com from cloud01 to cloud03
INFO: Moving c02-h25-r620.rdu.openstack.example.com from cloud01 to cloud03
INFO: Moving c02-h26-r620.rdu.openstack.example.com from cloud01 to cloud03
- You can use
quads-cli --find-free-cloud
to suggest a cloud environment to use that does not have any future hosts scheduled to use it.
quads-cli --find-free-cloud
cloud12
cloud16
cloud17
cloud18
- The
--ls-available
functionality lets you search for available hosts in the future based on a date range or other criteria.
- Find based on a date range:
quads-cli --ls-available --schedule-start "2019-12-05 08:00" --schedule-end "2019-12-15 08:00"
- Find based on starting now with an end range:
quads-cli --ls-available --schedule-end "2019-06-02 22:00"
- In QUADS
1.1.4
and higher you can now filter your availability search based on hardware capabilities or model type.
- Using this feature requires importing hardware metadata
- Example below using
--filter "model==1029U-TRTP"
quads-cli --ls-available --schedule-start "2020-08-02 22:00" --schedule-end "2020-08-16 22:00" --filter "model==1029U-TRTP"
- Listing retired hosts can now use the
--filter
feature:
quads-cli --ls-hosts --filter "retired==True"
- Listing specific hosts from a certain cloud:
quads-cli --cloud-only cloud13 --filter "model==FC640"
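Conceptually, an availability search with a date range and a model filter comes down to an interval-overlap test plus an attribute match. The sketch below illustrates that idea only. It is not how QUADS implements --ls-available, and the function name available_hosts, the record layout and the sample data are all hypothetical.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M"  # same layout as --schedule-start / --schedule-end


def available_hosts(hosts, schedules, start, end, model=None):
    """Return names of hosts with no schedule overlapping [start, end)."""
    window_start = datetime.strptime(start, DATE_FORMAT)
    window_end = datetime.strptime(end, DATE_FORMAT)
    busy = set()
    for entry in schedules:
        entry_start = datetime.strptime(entry["start"], DATE_FORMAT)
        entry_end = datetime.strptime(entry["end"], DATE_FORMAT)
        # Two half-open intervals overlap when each starts before the other ends.
        if entry_start < window_end and window_start < entry_end:
            busy.add(entry["host"])
    return [
        host["name"]
        for host in hosts
        if host["name"] not in busy and (model is None or host["model"] == model)
    ]


# Hypothetical inventory and schedules, for illustration only.
hosts = [
    {"name": "host01.example.com", "model": "1029U-TRTP"},
    {"name": "host02.example.com", "model": "1029U-TRTP"},
    {"name": "host03.example.com", "model": "FC640"},
]
schedules = [
    {"host": "host01.example.com",
     "start": "2020-08-10 22:00", "end": "2020-08-24 22:00"},
]

print(available_hosts(hosts, schedules, "2020-08-02 22:00", "2020-08-16 22:00",
                      model="1029U-TRTP"))
```

Here host01 is excluded because its schedule overlaps the requested window, and host03 is excluded by the model filter.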
- We now have a Flask-based
--ls-available
web interface available on quadshost:5001
if your firewall rules are open for TCP/5001.
- Available in QUADS
1.1.4
or above as a tech preview (when we migrate fully to Flask this will be supplanted with a full UI).
- This is provided via the
quads-web
systemd service or you can run it manually via cd /opt/quads/web ; python3 main.py
- You will need to seed the
models
data for your systems using the new host metadata feature
- This is not available in containers as it's a tech preview but will be featured once our move from CherryPy to Flask is completed later.
- Control + click can select more than one model
- Not selecting a model assumes a search for anything available.
- You can utilize the new metadata model and
--filter
command in 1.1.4
and above along with --ls-hosts
to search for a system by MAC address.
quads-cli --ls-hosts --filter "interfaces.mac_address==ac:1f:6b:2d:19:48"
- You can list what systems are connected to a switch by querying the
ip_address
(soon to be switch_ip
in 1.1.7) information from the interfaces datasource.
quads-cli --ls-hosts --filter "interfaces.ip_address==10.1.34.210"
- In some scenarios you may wish to interrogate or modify values within MongoDB. You should be careful doing this and have good backups in place. Generally, we will try to implement data, object and document modification needs through quads-cli so you don't need to do this, but sometimes it's useful for troubleshooting or other reasons.
- For more information see Interacting with MongoDB
- We utilize the JIRA ticketing system internally for R&D infrastructure requests managed by QUADS.
- We do provide some best-effort tooling and a JIRA library to bridge automation gaps.
- For more information see Using JIRA with QUADS
- We do not implement backups for QUADS for you, but it's really easy to do on your own via mongodump
- Refer to our docs on installing mongodb tools
- Implement
mongodump
to back up your database. We recommend using a git repository as it will take care of revisioning and updates for you.
- Below is an example script we use for this purpose. It assumes you already have a git repository set up that you can push to with ssh access.
#!/bin/bash
# script to call mongodump and dump quads db, push to git.
backup_database() {
    # dump all databases into the backup directory
    mongodump --out /opt/quads/backups/
}
sync_git() {
    # stop here if the backup directory is missing so git never runs elsewhere
    cd /opt/quads/backups || exit 1
    git add quads/*
    git add admin/*
    git commit -m "$(date) content commit"
    git push
}
backup_database
sync_git
- If you have a valid mongodump directory structure you can restore the QUADS database via the following command.
- This will drop the current database and replace it with your mongodump copy
- First, cd to the parent directory of where your mongodump backup is kept
[root@host-04 rdu2-quads-backup-mongo]# ls
admin mongodump mongodump-quads.sh quads README.md
quads
is the directory containing our database dump files
- Use mongorestore to drop the current quads database and replace with your backup
mongorestore --drop -d quads quads
- You will see some messages and all should be good.
2019-05-05T01:23:01.257+0100 building a list of collections to restore from quads dir
2019-05-05T01:23:01.270+0100 reading metadata for quads.vlan from quads/vlan.metadata.json
2019-05-05T01:23:01.282+0100 reading metadata for quads.host from quads/host.metadata.json
2019-05-05T01:23:01.288+0100 reading metadata for quads.counters from quads/counters.metadata.json
2019-05-05T01:23:01.294+0100 reading metadata for quads.schedule from quads/schedule.metadata.json
2019-05-05T01:23:01.329+0100 restoring quads.vlan from quads/vlan.bson
2019-05-05T01:23:01.361+0100 restoring quads.host from quads/host.bson
2019-05-05T01:23:01.396+0100 restoring quads.counters from quads/counters.bson
2019-05-05T01:23:01.426+0100 restoring quads.schedule from quads/schedule.bson
2019-05-05T01:23:01.434+0100 restoring indexes for collection quads.vlan from metadata
2019-05-05T01:23:01.434+0100 restoring indexes for collection quads.host from metadata
2019-05-05T01:23:01.524+0100 finished restoring quads.host (494 documents)
2019-05-05T01:23:01.549+0100 finished restoring quads.vlan (148 documents)
2019-05-05T01:23:01.549+0100 reading metadata for quads.notification from quads/notification.metadata.json
2019-05-05T01:23:01.567+0100 reading metadata for quads.cloud_history from quads/cloud_history.metadata.json
2019-05-05T01:23:01.568+0100 no indexes to restore
2019-05-05T01:23:01.568+0100 finished restoring quads.counters (334 documents)
2019-05-05T01:23:01.602+0100 restoring quads.notification from quads/notification.bson
2019-05-05T01:23:01.643+0100 restoring quads.cloud_history from quads/cloud_history.bson
2019-05-05T01:23:01.659+0100 reading metadata for quads.cloud from quads/cloud.metadata.json
2019-05-05T01:23:01.661+0100 no indexes to restore
2019-05-05T01:23:01.661+0100 finished restoring quads.notification (41 documents)
2019-05-05T01:23:01.699+0100 restoring quads.cloud from quads/cloud.bson
2019-05-05T01:23:01.717+0100 restoring indexes for collection quads.cloud_history from metadata
2019-05-05T01:23:01.718+0100 no indexes to restore
2019-05-05T01:23:01.718+0100 finished restoring quads.schedule (433 documents)
2019-05-05T01:23:01.742+0100 restoring indexes for collection quads.cloud from metadata
2019-05-05T01:23:01.743+0100 finished restoring quads.cloud_history (94 documents)
2019-05-05T01:23:01.792+0100 finished restoring quads.cloud (32 documents)
2019-05-05T01:23:01.792+0100 done
A useful part of QUADS is the functionality for automated systems/network validation. Below you'll find some steps to help understand why systems/networks might not pass validation so you can address any issues.
There are two main validation tests that occur before a cloud environment is automatically released:
- Foreman Systems Validation ensures that no target systems in your environment are marked for build.
- VLAN Network Validation ensures that all the backend interfaces in your isolated VLANs are reachable via fping
All of these validations are run from /opt/quads/quads/tools/validate_env.py
and we also ship a few useful tools to help you figure out validation failures.
/opt/quads/quads/tools/validate_env.py
is run from cron, see our example cron entry
You should run through each of these steps manually to determine what systems/networks might need attention if automated validation does not pass in a reasonable timeframe. Typically, admin_cc:
will receive email notifications of trouble hosts as well.
- General Availability can be checked via a simple
fping
command, this should be run first.
quads-cli --cloud-only cloud23 > /tmp/cloud23
fping -u -f /tmp/cloud23
- Foreman Systems Validation can be run via the hammer cli command provided by
gem install hammer_cli_foreman_admin hammer_cli
for host in $(quads-cli --cloud-only cloud15) ; do echo $host $(hammer host info --name $host | grep -i build); done
No systems should be left marked for build.
- NOTE Automated validation will not start until 2 hours after the assignment is scheduled to go out. Until this point
/opt/quads/quads/tools/validate_env.py
will not attempt to validate any systems, even if it is run manually.
- This can be set via the
validation_grace_period:
setting in /opt/quads/conf/quads.yml
- /opt/quads/quads/tools/validate_env.py
now has a --debug
option which tells you what's happening during validation.
- This will test the backend network connectivity part and the entire set of checks.
- Successful Validation looks like this:
/opt/quads/quads/tools/validate_env.py --debug
Validating cloud23
Using selector: EpollSelector
:Initializing Foreman object:
GET: /status
GET: /hosts?search=build=true
Command executed successfully: fping -u f12-h01-000-1029u.rdu2.scalelab.example.com f12-h02-000-1029u.rdu2.scalelab.example.com f12-h03-000-1029u.rdu2.scalelab.example.com
Command executed successfully: fping -u 172.16.38.126 172.20.38.126 172.16.36.206
Command executed successfully: fping -u 172.17.38.126 172.21.38.126 172.17.36.206
Command executed successfully: fping -u 172.18.38.126 172.22.38.126 172.18.36.206
Command executed successfully: fping -u 172.19.38.126 172.23.38.126 172.19.36.206
Subject: Validation check succeeded for cloud23
From: RDU2 Scale Lab <quads@example.com>
To: dev-null@example.com
Cc: wfoster@example.com, kambiz@example.com, jtaleric@example.com,
abond@example.com, grafuls@example.com, natashba@example.com
Reply-To: dev-null@example.com
User-Agent: Rufus Postman 1.0.99
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0
A post allocation check previously failed for:
cloud: cloud23
owner: ipinto
ticket: 498569
has successfully passed the verification test(s)! The owner
should receive a notification that the environment is ready
for use.
DevOps Team
cloud23 / ipinto / 498569
- Unsuccessful Validation looks like this:
/opt/quads/quads/tools/validate_env.py --debug
Validating cloud23
Using selector: EpollSelector
:Initializing Foreman object:
GET: /status
GET: /hosts?search=build=true
There was something wrong with your request
ICMP Host Unreachable from 10.1.38.126 for ICMP Echo sent to f12-h14-000-1029u.rdu2.scalelab.example.com (10.1.38.43)
ICMP Host Unreachable from 10.1.38.126 for ICMP Echo sent to f12-h14-000-1029u.rdu2.scalelab.example.com (10.1.38.43)
ICMP Host Unreachable from 10.1.38.126 for ICMP Echo sent to f12-h14-000-1029u.rdu2.scalelab.example.com (10.1.38.43)
ICMP Host Unreachable from 10.1.38.126 for ICMP Echo sent to f12-h14-000-1029u.rdu2.scalelab.example.com (10.1.38.43)
- In
QUADS 1.1.6+
you can skip past network validation via:
/opt/quads/quads/tools/validate_env.py --skip-network
- In older versions of QUADS you will want to consult the documentation for interacting with MongoDB for how to override this check.
- If you know your systems are built you can force
validate_env.py
to move into the network portions of the validation by toggling the provisioned
attribute in MongoDB for your cloud object.
db.cloud.update({"name": "cloud23"}, {$set:{'provisioned':true}})
- More information on manual intervention and overrides via MongoDB can be found here
- If you want to validate only a certain cloud you can do so by specifying the cloud's name with the
--cloud
flag.
/opt/quads/quads/tools/validate_env.py --cloud cloud01
You might have noticed that we configure our Foreman templates to drop 172.{16,17,18,19}.x
internal VLAN interfaces which correspond to the internal, QUADS-managed multi-tenant interfaces across a set of hosts in a cloud assignment.
The first two octets here can be substituted by the first two octets of your systems' public network in order to determine from validate_env.py --debug
which host internal interfaces have issues or are unreachable.
- Above, we can run the
host
command to determine what these machines map to by substituting 10.1
for the first two octets:
# for host in 10.1.37.231 10.1.38.150; do host $host; done
231.37.1.10.in-addr.arpa domain name pointer e17-h26-b04-fc640.example.com.
150.38.1.10.in-addr.arpa domain name pointer e17-h26-b03-fc640.example.com.
- Below you can see the code that maintains this mapping and assumptions:
This mapping feeds into our VLAN network validation code
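As a quick aid when reading validate_env.py --debug output, the octet substitution described above can be sketched as follows. This is not the QUADS code referred to above, only an illustration: the function name public_address is hypothetical and the 10.1 default prefix is taken from the example in this section.

```python
def public_address(internal_ip, public_prefix="10.1"):
    """Swap the first two octets of an internal VLAN address for the public prefix."""
    octets = internal_ip.split(".")
    prefix = public_prefix.split(".")
    if len(octets) != 4 or len(prefix) != 2:
        raise ValueError("expected a dotted-quad address and a two-octet prefix")
    # Keep the last two octets, which identify the host on both networks.
    return ".".join(prefix + octets[2:])


# Internal addresses as they appear in the fping lines of the debug output.
for internal in ("172.16.38.126", "172.17.36.206"):
    print(internal, "->", public_address(internal))
```

The resulting public address can then be passed to the host command, as in the example above, to find out which machine owns the unreachable interface.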
Besides GitHub we're also on IRC via irc.libera.chat. You can click here to join in your browser.