cortx-hare

Hare is responsible for monitoring the distributed health of CORTX and maintaining consensus.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).


Hare User Guide

What does Hare do?

  1. Configures Motr object store.
  2. Starts/stops Motr services.
  3. Notifies Motr of service and device faults.

Hare's implementation uses the Consul key-value store and Consul's health-checking mechanisms.

Installation

Building from source

  • Download hare.

    git clone https://github.com/Seagate/cortx-hare.git hare
    cd hare
  • Install Python (≥ 3.6), libraries and header files needed to compile Python extensions.

    sudo yum -y install python3 python3-devel
  • Install puppet-agent (≥ 6.13.0)

    If facter is already installed on the system, check its version:

    facter -v

    The supported version is facter >= 3.14.8. If the installed version is older than 3.14.8, remove it first:

    yum erase -y $(rpm -q --whatprovides $(readlink -f /usr/bin/facter)) || rm -fv /usr/bin/facter

    Now install puppet-agent

    sudo yum localinstall -y https://yum.puppetlabs.com/puppet/el/7/x86_64/puppet-agent-7.0.0-1.el7.x86_64.rpm

    Create a symlink to the facter binary if /usr/bin/facter does not exist:

    [ ! -f /usr/bin/facter ] && sudo ln -s /opt/puppetlabs/bin/facter /usr/bin/facter
  • Install Consul.

    sudo yum -y install yum-utils
    sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
    sudo yum -y install consul-1.9.1
  • Install py-utils.

    Please refer to the instructions for installing cortx-py-utils from sources.

  • Build and Install Motr.

    • Follow the Motr quick start guide to build Motr from source. After compiling the Motr sources, run the steps below from the Motr source directory so that Hare is built against those sources.

      sudo scripts/install-motr-service --link
      
      export M0_SRC_DIR=$PWD
      cd -
  • Build and install Hare.

    make
    sudo make install
  • Create hare group.

    sudo groupadd --force hare
  • Add current user to hare group.

    sudo usermod --append --groups hare $USER

    Log out and log back in.
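
The facter version check from the puppet-agent step above can be scripted. The following is a minimal sketch, assuming GNU sort with -V (version sort) is available. The helper names version_ge and check_facter are hypothetical and are not part of Hare or Puppet.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Hare or Puppet.

# version_ge A B: succeed if dotted version A is >= B.
# Relies on GNU `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_facter [MIN]: report whether the installed facter is at least MIN
# (default 3.14.8, the minimum stated in this guide).
check_facter() {
    local min=${1:-3.14.8} current
    if ! command -v facter >/dev/null 2>&1; then
        echo "facter is not installed"
        return 1
    fi
    # Keep only the first word in case the version is followed by extra text.
    current=$(facter -v | awk '{print $1; exit}')
    if version_ge "$current" "$min"; then
        echo "facter $current is supported"
    else
        echo "facter $current is older than $min; reinstall puppet-agent"
        return 1
    fi
}
```

Calling check_facter without arguments prints a one-line verdict and returns a non-zero status when facter is missing or too old.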

Build and install rpms from source

NOTE: If you have built Motr and Hare from sources, you do not need to generate the RPM packages described below. On a multi-node setup, however, building and installing RPMs is sometimes more convenient.

  • Build & Install Motr RPMs

  • Build hare RPMs

    • Download hare source as mentioned above
      cd hare
      make rpm
      sudo rpm -ivh ~/rpmbuild/RPMS/x86_64/cortx-hare-*.rpm

Quick start

☑️ Checklist

Before starting the cluster as <user> on the <origin> machine, ensure that:

#  Check                                                           Where
1  passwordless sudo works for <user>                              all machines
2  <user> can ssh from <origin> to other machines                  <origin>
3  cortx-hare and cortx-s3server RPMs are installed                all machines
4  /opt/seagate/cortx/hare/bin is in <user>'s PATH                 all machines
5  <user> is a member of hare group                                all machines
6  CDF exists and corresponds to the actual cluster configuration  <origin>
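
Most of the checklist can be verified on the local machine with a short script. The sketch below is an illustration under stated assumptions, not a Hare tool: the function names path_contains, in_group and preflight are hypothetical, the ssh check (item 2) is left out, and the script only tests that the CDF file exists, not that it matches the actual cluster configuration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for the local machine; not part of Hare.

HARE_BIN=/opt/seagate/cortx/hare/bin

# path_contains DIR [PATH_STRING]: succeed if DIR is an entry of PATH_STRING
# (defaults to the current $PATH).
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# in_group GROUP [GROUP_LIST]: succeed if GROUP appears in the space-separated
# GROUP_LIST (defaults to the current user's groups as printed by `id -nG`).
in_group() {
    case " ${2-$(id -nG)} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# preflight CDF: run the checks that can be verified locally and report failures.
preflight() {
    local cdf=$1 rc=0
    sudo -n true 2>/dev/null \
        || { echo "FAIL: passwordless sudo does not work"; rc=1; }
    rpm -q cortx-hare cortx-s3server >/dev/null 2>&1 \
        || { echo "FAIL: cortx-hare or cortx-s3server RPM is not installed"; rc=1; }
    path_contains "$HARE_BIN" \
        || { echo "FAIL: $HARE_BIN is not in PATH"; rc=1; }
    in_group hare \
        || { echo "FAIL: $(id -un) is not a member of the hare group"; rc=1; }
    [ -f "$cdf" ] \
        || { echo "FAIL: CDF '$cdf' does not exist"; rc=1; }
    if [ "$rc" -eq 0 ]; then echo "All local checks passed"; fi
    return "$rc"
}
```

Run preflight ~/CDF.yaml on each machine; on machines other than <origin> the CDF check can be ignored.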

Prepare the CDF

If you are starting the cluster for the first time, you will need a cluster description file (CDF).

See cfgen --help-schema for the description of CDF format.

You can make a copy of /opt/seagate/cortx/hare/share/cfgen/examples/singlenode.yaml (single-node setup) or /opt/seagate/cortx/hare/share/cfgen/examples/ldr1-cluster.yaml (dual-node setup) and edit it as necessary.

cp /opt/seagate/cortx/hare/share/cfgen/examples/singlenode.yaml ~/CDF.yaml
vi ~/CDF.yaml

You will probably need to modify host, data_iface, and io_disks values.

data_iface

  • Make sure that the data_iface value refers to an existing network interface (it should be present in the output of the ip a command).

  • This network interface must be configured for LNet. If you can see its IP address in the output of sudo lctl list_nids command, you are all set. Otherwise, configure LNet by executing this code snippet on each node:

    IFACE=eth1  # XXX `data_iface` value from the CDF
    sudo tee /etc/modprobe.d/lnet.conf <<< \
        "options lnet networks=tcp($IFACE) config_on_load=1"

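The LNet snippet above writes the configuration unconditionally. As a hedged variant, the sketch below first confirms that the interface exists on the node and only then writes /etc/modprobe.d/lnet.conf. The function names lnet_conf_line and configure_lnet are hypothetical; the options line itself is the one used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Hare or LNet.

# lnet_conf_line IFACE: print the modprobe options line for IFACE.
lnet_conf_line() {
    printf 'options lnet networks=tcp(%s) config_on_load=1\n' "$1"
}

# configure_lnet IFACE: write /etc/modprobe.d/lnet.conf, but only if IFACE
# is a network interface known to this node.
configure_lnet() {
    local iface=$1
    if ! ip link show dev "$iface" >/dev/null 2>&1; then
        echo "Interface '$iface' not found; check data_iface in the CDF" >&2
        return 1
    fi
    lnet_conf_line "$iface" | sudo tee /etc/modprobe.d/lnet.conf >/dev/null
}
```

For example, configure_lnet eth1 writes the file for eth1, where eth1 stands for the data_iface value from the CDF.
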
io_disks

  • Devices specified in io_disks section must exist.

  • Sometimes it is convenient to use loop devices instead of actual disks:

    sudo mkdir -p /var/motr
    for i in {0..9}; do
        sudo dd if=/dev/zero of=/var/motr/disk$i.img bs=1M seek=9999 count=1
        sudo losetup /dev/loop$i /var/motr/disk$i.img
    done
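
Before bootstrapping, it can help to confirm that every device listed in io_disks is really a block device. The sketch below assumes the device paths are passed by hand as arguments (it does not parse the CDF), and missing_devices is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of Hare.

# missing_devices DEV...: print every argument that is not a block device.
# Returns non-zero if at least one device is missing.
missing_devices() {
    local dev rc=0
    for dev in "$@"; do
        if [ ! -b "$dev" ]; then
            echo "$dev"
            rc=1
        fi
    done
    return "$rc"
}
```

With the loop devices created above, missing_devices /dev/loop{0..9} should print nothing.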

Hare we go

  • Start the cluster.

    hctl bootstrap --mkfs ~/CDF.yaml
  • Run I/O test.

    /opt/seagate/cortx/hare/libexec/m0crate-io-conf >/tmp/m0crate-io.yaml
    dd if=/dev/urandom of=/tmp/128M bs=1M count=128
    sudo m0crate -S /tmp/m0crate-io.yaml

    Note that the command above works only when m0crate is in the default system PATH, which is the case when the setup was installed from RPMs. If Motr was built from source, run the m0crate utility by its full path inside the Motr source directory (say, MOTR_SRC): ./MOTR_SRC/motr/m0crate/m0crate

  • Stop the cluster.

    hctl shutdown
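
The I/O test step depends on where m0crate lives. The sketch below is one way to handle both cases described above: it prefers m0crate from PATH and falls back to the binary inside a Motr source directory. The function names find_m0crate and run_io_test are hypothetical; the commands inside run_io_test are the ones from the I/O test step.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Hare or Motr.

# find_m0crate [MOTR_SRC]: print the path of the m0crate binary, looking in
# PATH first and then in MOTR_SRC/motr/m0crate/m0crate.
find_m0crate() {
    local motr_src=${1:-}
    if command -v m0crate >/dev/null 2>&1; then
        command -v m0crate
    elif [ -n "$motr_src" ] && [ -x "$motr_src/motr/m0crate/m0crate" ]; then
        printf '%s\n' "$motr_src/motr/m0crate/m0crate"
    else
        return 1
    fi
}

# run_io_test [MOTR_SRC]: generate the m0crate config and input file, then
# run the I/O test with whichever m0crate binary was found.
run_io_test() {
    local m0crate
    m0crate=$(find_m0crate "${1:-}") || {
        echo "m0crate not found in PATH or in the Motr source directory" >&2
        return 1
    }
    /opt/seagate/cortx/hare/libexec/m0crate-io-conf >/tmp/m0crate-io.yaml &&
        dd if=/dev/urandom of=/tmp/128M bs=1M count=128 &&
        sudo "$m0crate" -S /tmp/m0crate-io.yaml
}
```

On an RPM-based setup, run_io_test needs no argument; on a source build, pass the Motr source directory as the first argument.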

Reporting problems

To request changes or report a bug, please log an issue and describe the problem you are facing.

When reporting a bug, consider running

hctl reportbug

to collect forensic data. Run this command on every node of the cluster and attach the generated files to the GitHub issue.

Troubleshooting

LNet is not configured

  • To check, run

    sudo lctl list_nids

    This command should show network identifiers.

  • If it doesn't, try to start LNet manually:

    sudo modprobe lnet
    sudo lctl network up

    Run sudo lctl list_nids again.

  • Still no luck? Perhaps /etc/modprobe.d/lnet.conf file is missing or corrupted. Create it with these commands:

    IFACE=eth1  # XXX `data_iface` value from the CDF
    sudo tee /etc/modprobe.d/lnet.conf <<< \
        "options lnet networks=tcp($IFACE) config_on_load=1"

    Try to start LNet one more time.

RC Leader cannot be elected

If hctl bootstrap cannot complete and keeps printing dots..........

2020-01-14 10:57:25: Generating cluster configuration... Ok.
2020-01-14 10:57:26: Starting Consul server agent on this node.......... Ok.
2020-01-14 10:57:34: Importing configuration into the KV Store... Ok.
2020-01-14 10:57:35: Starting Consul agents on remaining cluster nodes... Ok.
2020-01-14 10:57:35: Update Consul agents configs from the KV Store... Ok.
2020-01-14 10:57:36: Install Motr configuration files... Ok.
2020-01-14 10:57:36: Waiting for the RC Leader to get elected..................[goes on forever]

try these commands

hctl shutdown
sudo systemctl reset-failed hare-hax

and bootstrap again.

make install fails because of mypy issues

Symptoms:

  1. The make, make install, or make devinstall command fails.
  2. The command output contains lines like these (not necessarily the last lines of the output):
    19:58:19  running mypy
    19:58:20  hare_mp/store.py:21: error: Cannot find implementation or library stub for module named 'cortx.utils.conf_store'
    19:58:20  Success: no issues found in 1 source file
    19:58:20  make[4]: Leaving directory `/root/rpmbuild/BUILD/cortx-hare/cfgen'
    19:58:21  hare_mp/main.py:34: error: Cannot find implementation or library stub for module named 'cortx.utils.product_features'
    19:58:21  hare_mp/main.py:34: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
    19:58:21  Found 2 errors in 2 files (checked 8 source files)
    19:58:21  make[4]: *** [mypy] Error 1
    

Solution: install cortx-py-utils RPM and retry.
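
To confirm this diagnosis before reinstalling, the missing module names can be pulled out of a saved build log and checked against the local Python installation. This is a minimal sketch, assuming the mypy error wording shown above and that python3 is on PATH; missing_stub_modules and has_py_module are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Hare or mypy.

# missing_stub_modules: read build output on stdin and print the unique module
# names from mypy "Cannot find implementation or library stub" errors.
# Accepts the module name in either single or double quotes.
missing_stub_modules() {
    sed -n "s/.*Cannot find implementation or library stub for module named ['\"]\([^'\"]*\)['\"].*/\1/p" |
        sort -u
}

# has_py_module MODULE: succeed if python3 can locate MODULE.
has_py_module() {
    python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1" 2>/dev/null
}
```

For example, after saving the output with make 2>&1 | tee build.log, run missing_stub_modules < build.log and pass each printed name to has_py_module. If the cortx.utils modules are reported as missing, cortx-py-utils is not installed for this Python.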

See also