[manala.telegraf] service unable to start during initial provisioning
lisuml opened this issue · 5 comments
manala.roles version: 3.2.0
During the initial provisioning of a node with the manala.telegraf
role attached, the service fails to start:
TASK [manala.roles.telegraf : Configs > Templates present] ****************************************************************************************************************************************************************************************************
changed: [d-test.euc1.XXX.lan] => (item={'state': 'present', 'template': 'configs/_default.j2', 'file': '/etc/telegraf/telegraf.d/os.conf', 'config': '[[inputs.cpu]]\n totalcpu = true\n[[inputs.disk]]\n ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]\n[[inputs.diskio]]\n[[inputs.kernel]]\n[[inputs.mem]]\n[[inputs.net]]\n[[inputs.netstat]]\n[[inputs.processes]]\n[[inputs.system]]\n'})
changed: [d-test.euc1.XXX.lan] => (item={'state': 'present', 'template': 'configs/_default.j2', 'file': '/etc/telegraf/telegraf.d/output.conf', 'config': '[[outputs.influxdb]]\n urls = [ "udp://metrix.euc1.XXX.lan:8089" ]\n udp_payload = "1024B"\n'})
TASK [manala.roles.telegraf : Configs > Files absent] *********************************************************************************************************************************************************************************************************
TASK [manala.roles.telegraf : Services > Services] ************************************************************************************************************************************************************************************************************
failed: [d-test.euc1.XXX.lan] (item=telegraf) => {"ansible_loop_var": "item", "changed": false, "item": "telegraf", "msg": "Unable to start service telegraf: Job for telegraf.service failed because the control process exited with error code.\nSee \"systemctl status telegraf.service\" and \"journalctl -xe\" for details.\n"}
As you can see, the configs are defined properly, but they do not seem to be in place when the service starts.
The error reported by systemd:
Jan 17 13:40:15 d-test.euc1.XXX.lan telegraf[8968]: 2023-01-17T13:40:15Z E! [telegraf] Error running agent: no outputs found, did you provide a valid config file?
Jan 17 13:40:15 d-test.euc1.XXX.lan systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
During the 2nd provisioning attempt, the error is gone and the service starts normally.
After further investigation, the issue appears to occur only with telegraf 1.25.0 (the most recent release at the moment).
The cause is that the official Debian packages provided by InfluxData automatically try to start the telegraf
systemd service at installation time. At that point telegraf expects a working outputs configuration in its config file, but the outputs configuration is not there yet.
This looks like a bug in telegraf itself and/or in the official telegraf Debian packages. I'm going to file a GitHub issue on the official telegraf repository.
For me, the workaround was simply to pin a lower telegraf version in the Ansible playbook:
manala_telegraf_install_packages_default:
  - telegraf=1.24.4-1
@nervo: thanks for the followup!
Could you provide all the values you pass to the role?
These are my Ansible variables:
manala_apt_preferences:
  - influxdb@influxdata
manala_telegraf_install_packages:
  - telegraf=1.24.4-1
manala_telegraf_config_template: config/telegraf/base/telegraf.conf.j2
manala_telegraf_config:
  global_tags:
    environment: "{{ env }}"
manala_telegraf_configs:
  - file: os.conf
    config: |
      [[inputs.cpu]]
        totalcpu = true
      [[inputs.disk]]
        ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
      [[inputs.diskio]]
      [[inputs.kernel]]
      [[inputs.mem]]
      [[inputs.net]]
      [[inputs.netstat]]
      [[inputs.processes]]
      [[inputs.system]]
  - file: output.conf
    config: |
      [[outputs.influxdb]]
        urls = [ "udp://metrix.euc1.XXX.lan:8089" ]
        udp_payload = "1024B"
Use manala_telegraf_install_packages instead of manala_telegraf_install_packages_default.
Roger that.
FYI: I created an issue in the telegraf GitHub repo: influxdata/telegraf#12514
Ok, so let's wait for the next telegraf version :)
(btw, you should also use an explicit telegraf apt preference)
manala_apt_preferences:
  - telegraf@influxdata
My bad. Thanks for pointing this out!
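For anyone landing here later, the suggestions in this thread can be combined into one set of variables. This is only a sketch built from the names already quoted above. It assumes the 1.24.4-1 package is still available from the InfluxData repository, and the pin is meant to be dropped once a fixed telegraf release is out. Whether other apt preferences (such as influxdb@influxdata) should stay alongside it depends on your setup.

```yaml
# Sketch only: combines the workaround and the maintainer's suggestions from this thread.
manala_apt_preferences:
  # Explicit telegraf preference, as suggested above
  - telegraf@influxdata

# Use the non-default variable, and pin below 1.25.0 until the upstream issue is resolved
manala_telegraf_install_packages:
  - telegraf=1.24.4-1

manala_telegraf_configs:
  - file: output.conf
    config: |
      [[outputs.influxdb]]
        urls = [ "udp://metrix.euc1.XXX.lan:8089" ]
        udp_payload = "1024B"
```

The remaining variables (manala_telegraf_config, the os.conf inputs) are unchanged from the ones posted earlier.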