ansible-collections/ansible-consul

Connection Refused error after consul installation

sushil-clearsense opened this issue · 8 comments

Hi All,

I am new to Consul and used the ansible-consul role to set up Consul on three new servers. The servers are running CentOS 7.

After the start consul task, my script shows this error:

TASK [consul : Check Consul HTTP API (via TCP socket)] *******************************************************************************************************
fatal: [DJX-VML-CONSUL3]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 127.0.0.1:8500"}
fatal: [DJX-VML-CONSUL1]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 127.0.0.1:8500"}
fatal: [DJX-VML-CONSUL2]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 127.0.0.1:8500"}
fatal: [DJX-VML-NOMAD01]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 127.0.0.1:8500"}

When I try to run consul info on the bootstrap server, I see a connection refused error:

`consul info`

`Error querying agent: Get http://127.0.0.1:8500/v1/agent/self: dial tcp 127.0.0.1:8500: getsockopt: connection refused`

I don't get much useful information from the systemctl status output:

● consul.service - Consul agent
   Loaded: loaded (/usr/lib/systemd/system/consul.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2021-12-27 20:25:55 UTC; 20s ago
  Process: 224174 ExecStart=/usr/local/bin/consul agent -config-file=/etc/consul/config.json -config-dir=/etc/consul.d -pid-file=/run/consul/consul.pid (code=exited, status=1/FAILURE)
  Process: 224171 ExecStartPre=/bin/chown -R consul:consul /run/consul (code=exited, status=0/SUCCESS)
  Process: 224168 ExecStartPre=/bin/mkdir -m 0750 -p /run/consul (code=exited, status=0/SUCCESS)
 Main PID: 224174 (code=exited, status=1/FAILURE)

I tried enabling debug logs, but no logs are being generated. Here is the inventory file I used:

[consul_instances]
DJX-VML-CONSUL1 ansible_host=##### consul_node_role=server consul_bootstrap_expect=true
DJX-VML-CONSUL2 ansible_host=#### consul_node_role=server consul_bootstrap_expect=true
DJX-VML-CONSUL3 ansible_host=##### consul_node_role=server consul_bootstrap_expect=true
DJX-VML-NOMAD01 ansible_host=##### consul_node_role=client

consul config.json

[root@djx-vml-consul1 consul]# cat config.json
{
    "addresses": {
        "dns": "127.0.0.1",
        "http": "127.0.0.1",
        "https": "127.0.0.1"
    },
    "advertise_addr": #####,
    "advertise_addr_wan": #####,
    "bind_addr": ######,
    "bootstrap": false,
    "bootstrap_expect": 0,
    "client_addr": "127.0.0.1",
    "data_dir": "/opt/consul",
    "datacenter": "dc1",
    "disable_update_check": false,
    "domain": "consul",
    "enable_local_script_checks": false,
    "enable_script_checks": false,
    "encrypt": #######,
    "encrypt_verify_incoming": true,
    "encrypt_verify_outgoing": true,
    "log_file": "/var/log/consul/consul.log",
    "log_level": "DEBUG",
    "log_rotate_bytes": 0,
    "log_rotate_duration": "24h",
    "performance": {
        "leave_drain_time": "5s",
        "raft_multiplier": 1,
        "rpc_hold_timeout": "7s"
    },
    "ports": {
        "dns": 8600,
        "http": 8500,
        "https": -1,
        "serf_lan": 8301,
        "serf_wan": 8302,
        "server": 8300
    },
    "raft_protocol": 3,
    "retry_interval": "30s",
    "retry_interval_wan": "30s",
    "retry_join": [],
    "retry_max": 0,
    "retry_max_wan": 0,
    "server": true,
    "translate_wan_addrs": false,
    "ui": true

Can anyone please suggest what could be the issue?

Run /usr/local/bin/consul agent -config-file=/etc/consul/config.json -config-dir=/etc/consul.d -pid-file=/run/consul/consul.pid

and /usr/local/bin/consul agent -config-file=/etc/consul/config.json -config-dir=/etc/consul.d by hand and see what errors you get

Hi @lanefu, thanks for the help.

After running the command, I could see that the issue was due to invalid keys in the config. I checked the Consul site and found that some of these keys were added in a later version than the one I was installing.

I removed those specific keys from the config and it is working now.

> I was able to see after the command, that the issue was due to invalid keys in the config. Checked on the consul site and found that some of these keys were added in the later version than the one i was installing.

@sushil-clearsense Which keys are you referring to? Could you outline the changes you made? Thanks.

> and see what errors you see

Consul doesn't start because the /var/log/consul directory doesn't exist. Output below:

root@consul2:/home/vagrant# /usr/local/bin/consul agent -config-file=/etc/consul/config.json -config-dir=/etc/consul/consul.d
==> Failed to setup logging: open /var/log/consul/consul-1640771892712318601.log: no such file or directory

From journalctl:

root@consul2:/home/vagrant# journalctl -u consul
-- Logs begin at Wed 2021-12-29 09:50:28 GMT, end at Wed 2021-12-29 09:56:24 GMT. --
Dec 29 09:51:57 consul2 systemd[1]: Starting Consul agent...
Dec 29 09:51:57 consul2 systemd[1]: Started Consul agent.
Dec 29 09:51:57 consul2 consul[1670]: ==> Failed to setup logging: open /var/log/consul/consul-1640771517595577466.log: no such file or directory
Dec 29 09:51:57 consul2 systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Dec 29 09:51:57 consul2 systemd[1]: consul.service: Unit entered failed state.
Dec 29 09:51:57 consul2 systemd[1]: consul.service: Failed with result 'exit-code'.
Dec 29 09:52:40 consul2 systemd[1]: consul.service: Service hold-off time over, scheduling restart.
Dec 29 09:52:40 consul2 systemd[1]: Stopped Consul agent.

I'm not sure whether the playbook creates /var/log/consul/ and grants the consul user write permission on it, which is apparently required.

The step to create log directory (defined in brianshumate.consul/tasks/dirs.yml) was skipped:

TASK [brianshumate.consul : Create log directory] ******************************
skipping: [consul1.consul] => (item=/var/log/consul) 
skipping: [consul2.consul] => (item=/var/log/consul) 
skipping: [consul3.consul] => (item=/var/log/consul) 

It looks like the condition for creating the log directory differs between brianshumate.consul (`consul_syslog_enable | bool`) and the version on the master branch of this repo (`not consul_syslog_enable | bool`). Should README_VAGRANT.md be changed so it no longer uses the old role?
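
The failure above comes down to the directory of the configured `log_file` being missing or not writable. A minimal Python sketch of a pre-start check follows, assuming the config has already been loaded into a dict. The function name `log_dir_problem` is hypothetical and is not part of the role or of Consul. Run it as the consul user, since root passes the write check on almost any directory.

```python
import os


def log_dir_problem(config):
    """Return a description of what is wrong with the log_file directory, or None.

    `config` is a dict loaded from a Consul JSON config file.
    """
    log_file = config.get("log_file")
    if not log_file:
        # No file logging configured, so there is nothing to check.
        return None
    log_dir = os.path.dirname(log_file) or "."
    if not os.path.isdir(log_dir):
        return f"log directory {log_dir} does not exist"
    if not os.access(log_dir, os.W_OK):
        return f"log directory {log_dir} is not writable by the current user"
    return None


# Example with the log_file value from the config posted in this thread.
# The result depends on the machine the check runs on.
print(log_dir_problem({"log_file": "/var/log/consul/consul.log"}))
```

In practice the dict would come from `json.load` on /etc/consul/config.json. A non-None result means the directory needs to be created, or its ownership fixed, before the agent is started.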

Hi @egmanoj, not sure if you are still facing the issue. The keys I meant are Consul configuration options. The version I was using did not support the key I had set: https://www.consul.io/docs/agent/options#_log_file
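
For anyone hitting the same error, the fix described here (dropping config keys that the installed Consul version does not recognize) could be sketched in Python roughly as below. This is only an illustration: `drop_unsupported_keys` and `UNSUPPORTED_KEYS` are hypothetical names, and `log_file` is the only key named in this thread, so any other entries would have to come from the errors your own agent prints at startup.

```python
import json

# Assumption: only "log_file" is named in this thread. Extend the set with
# whatever keys your Consul version rejects at startup.
UNSUPPORTED_KEYS = {"log_file"}


def drop_unsupported_keys(config, unsupported=UNSUPPORTED_KEYS):
    """Return a copy of the config without the given top-level keys."""
    return {key: value for key, value in config.items() if key not in unsupported}


# Trimmed-down stand-in for /etc/consul/config.json.
sample_config = {
    "data_dir": "/opt/consul",
    "datacenter": "dc1",
    "log_file": "/var/log/consul/consul.log",
    "log_level": "DEBUG",
    "server": True,
}

cleaned = drop_unsupported_keys(sample_config)
print(json.dumps(cleaned, indent=4, sort_keys=True))
```

Since the role generates config.json, a manual edit like this is likely to be overwritten on the next playbook run, so the longer-term fix is probably to adjust the role variables or the Consul version being installed.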