/cosmos-validators

Ansible Script for Cosmos-based blockchain validators

Primary LanguageJinja

Cosmos-based Node Ansible Setup Plus Several Supporting Playbooks

Design Philosophy

  1. Extendable to most Tendermint-based chains
  2. Support both mainnet and testnet
  3. Stable playbooks and roles; Customizable variables
  4. Support essential functions (snapshot, state-sync, public RPC/API endpoints and Cosmos Exporter) through separate playbooks

TL/DR

You run one playbook and set up a node.

ansible-playbook main.yml -e "target=juno_main"

Because we try our best to support the latest node version, it is not recommended for you to sync from Block 1. Rather, please state-sync or start from a snapshot.

Node deployment (Validator, Backup and Relayer)

For every network where we run a validator on mainnet, we run 3 nodes (Validator, Backup and Relayer). The details of our 3-node infrastructure are documented here.

Opinionated Configuration

We have 2 strong opinions about the node configuration:

  1. Each network will have its custom port prefix. This is to prevent port collision if you run multiple nodes on the same server (we do so for Backup Node and Relayer Node). For example, Juno's custom port prefix is 126 and that of Osmosis is 125. Since it is rather arbitrary, we are going to force the same convention on you unless you fork the code.
  2. Each type of node will have its setting based on Polkachu's "best practice". For example, the main node (Validator) has null indexer, and 100/0/ pruning, and Relayer node has kv indexer and 40000/2000/ pruning. We will force these setting on you unless you fork the code.

Host Variables

Take a look at the inventory.sample file. You will see an example juno group with 3 different hosts: juno_main, juno_backup, and juno_relayer. Each host will have the following variables:

  1. ansible_host: Required. The IP address of the server.
  2. type: Required. It can be main, backup and relayer (also test if you are adventurous). Each is opinionated in its configuration settings.
  3. prepare: Optional. If unset, it is default to true. If false, it will skip setups of firewall, go, cosmovisor, node exporter, promtail, etc. The reason for the false option is because we run many backup/relayer nodes on the same server with setup done already.

Other Variables

Besides the above host variables, you will also specify the following all variables in the inventory file:

  1. ansible_user: The sample file assumes ubuntu, but feel free to use other user name. This user need sudo privilege.
  2. ansible_port: The sample file assumes 22. But if you are like me, you will have a different ssh port other than 22 to avoid port sniffing.
  3. ansible_ssh_private_key_file: The sample file assumes ~/.ssh/id_rsa, but you might have a different key location.
  4. var_file: It tells the program where to look for the variable file. This is useless for the mainnet, because the var file will automatically be inferred by the network name. However, it is essentially for testnets.
  5. user_dir: The user's home directory. In the sample inventory file this is a computed variable based on the ansible_user. It assumes that it is not a root user and its home directory is /home/{{ansible_user}}.
  6. path: This is to make sure that the ansible_user can access the go executable.
  7. node_exporter: Default is true. Change it to false if you do not want to install node_exporter
  8. promtail: Default is true. Change it to false if you do not want to install promtail
  9. log_monitor: Enter your monitor server IP if you install promtail.
  10. node_name: This is your node name for the config.toml file.
  11. log_name: This is the server name for the promtail service.
  12. pagerduty_key: This is the PagerDuty key if you use TenderDuty

Ready? Go!

One you understand the setup, please first copy it to your own inventory file so you can customize it to suit your needs:

cp inventory.sample inventory

When you are ready install a node, you run:

ansible-playbook main.yml -e "target=HOST_NAME"

Playbooks

Playbook Description
main.yml The main playbook to set up a node
prepare.yml Prepare the server with node exporter, promtail, go, cosmovisor, and firewall rules
support_backup_node.yml Install snapshot, state_sync, resync, genesis and prune script on backup node
support_snapshot.yml Install snapshot script and a cron job
support_state_sync.yml Install state-sync script
support_resync.yml Install weekly scheduled state-sync and recovery script
support_genesis.yml Install a script to upload genesis
support_prune.yml Install a script to prune using cosmprund
support_public_endpoints.yml Set up Nginx reverse proxy for public PRC/ API
support_seed.yml Install seed node with Tenderseed. You need a node_key.json.j2 file so the node_id is consistent
support_tenderduty.yml Install Tenderduty
system_update.yml Update a server and restart if needed

Selected playbook Usage Example

support_seed
ansible-playbook support_seed.yml -e "target=umee_seed seed=190c4496f3b46d339306182fe6a507d5487eacb5@65.108.131.174:36656"
support_tenderduty
ansible-playbook support_tenderduty.yml -e "target=juno_tenderduty key=junovalcons1qyw2x2sjp40cqasdfyuiahsdfknasdkneafs"

Supported Networks

See the group_vars folder for the list of mainnet and testnet support

Port Prefix

All port prefixes come from this repo

V1 to V2 migration [OPTIONAL]

In V1, the custom port prefix is 2 digits. However, this hobby project has evolved into a more ambitious one and we have run out of the prefixes. Therefore, V2 introduces a breaking change of the 3-digit custom port prefixes.

If you have a node running based on V1 port prefix system, you do not need to do anything. However, if you are as OCD as Polkachu, you might want to migrate all the previous nodes to comply with the new system. Here is a playbook to manage the migration. You still need to close the old ports that are not longer in use, but this playbook should take care of the rest.

ansible-playbook support_config_update.yml -e "target=juno_main"

Known Issue

Because this repo tries to accommodate as many Tendermint-based chains as possible, it cannot adapt to all edge cases. Here are some known issues and how to resolve them.

Chain Issue Solution
Axelar Some extra lines at the end of app.toml Delete extra lines and adjust some settings these extra lines are supposed to change
Canto genesis file needs to be unwrapped from .result.genesis Unwrap genesis with jq command
Injective Some extra lines at the end of app.toml Delete extra lines and adjust some settings these extra lines are supposed to change
Kichain Some extra lines at the end of app.toml Delete extra lines and adjust some settings these extra lines are supposed to change
Celestia testnet inconsistent config.toml file variable naming convention Manually adjust config.toml file