elastic/ansible-beats

Add support or example for `processors` config section

a03nikki opened this issue · 7 comments

Describe the feature:

Add support or an example for the processors configuration section within this role.

There is a closed Discuss question about this same problem, but it never reached a resolution.

Beats product: Metricbeat

Beats version: 7.6.0

Role version: 7.6.0

OS version (uname -a if on a Unix-like system):

Beat host: Ubuntu container

root@redacted:/# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"
root@redacted:/# uname -a
Linux 4768ab1b9aa9 4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Controlling host: MacOS Catalina - 10.15.3 (19D76)

~ % uname -a
Darwin redacted.local 19.3.0 Darwin Kernel Version 19.3.0: Thu Jan  9 20:58:23 PST 2020; root:xnu-6153.81.5~1/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:

The playbook does produce the default configuration of Metricbeat v7 using ansible-playbook. On Ubuntu, the default is the following, with comments and blank lines removed:

root@redacted:/# grep -v "^\s*#" /etc/metricbeat/metricbeat.yml | grep -v "^\s*$"
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
setup.kibana:
output.elasticsearch:
  hosts: ["localhost:9200"]
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

With a playbook such as the one listed below, one can get close, but the processors are incorrect:

root@redacted:/# cat /etc/metricbeat/metricbeat.yml
# Ansible managed

################### metricbeat Configuration #########################

############################# metricbeat ######################################
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
processors:
- add_host_metadata: null
- add_cloud_metadata: null
- add_docker_metadata: null
- add_kubernetes_metadata: null
setup.template.settings:
  index.codec: best_compression
  index.number_of_shards: 1


###############################################################################
############################# Libbeat Config ##################################
# Base config file used by all other beats for using libbeat features

############################# Output ##########################################

output:
  elasticsearch:
    hosts:
    - localhost:8200


############################# Logging #########################################

logging:
  files:
    rotateeverybytes: 10485760

This configuration will run Metricbeat, but the metadata about the servers will be missing from the records, so the Metrics UI in Kibana does not work properly.

Single quoting (') the ~ produces a config file that includes:

processors:
- add_host_metadata: '~'
- add_cloud_metadata: '~'
- add_docker_metadata: '~'
- add_kubernetes_metadata: '~'

which prevents Metricbeat from starting: it exits with an error saying that a string cannot be converted into an object.

RUNNING HANDLER [elastic.beats : restart the service] **************************
fatal: [7a2b85391a72]: FAILED! => {"changed": false, "msg": "Failed to restart service: metricbeat", "rc": 1, "stderr": "2020-03-02T23:05:43.532Z\tINFO\tinstance/beat.go:622\tHome path: [/usr/share/metricbeat/bin] Config path: [/usr/share/metricbeat/bin] Data path: [/usr/share/metricbeat/bin/data] Logs path: [/usr/share/metricbeat/bin/logs]\n2020-03-02T23:05:43.532Z\tINFO\tinstance/beat.go:630\tBeat ID: 05094e0e-7db0-4357-a1d0-f6a40f08eda8\n2020-03-02T23:05:43.532Z\tERROR\tinstance/beat.go:933\tExiting: error initializing processors: can not convert 'string' into 'object' accessing 'processors.0.add_host_metadata' (source:'/etc/metricbeat/metricbeat.yml')\nExiting: error initializing processors: can not convert 'string' into 'object' accessing 'processors.0.add_host_metadata' (source:'/etc/metricbeat/metricbeat.yml')\n",
 "stderr_lines": ["2020-03-02T23:05:43.532Z\tINFO\tinstance/beat.go:622\tHome path: [/usr/share/metricbeat/bin] Config path: [/usr/share/metricbeat/bin] Data path: [/usr/share/metricbeat/bin/data] Logs path: [/usr/share/metricbeat/bin/logs]",
 "2020-03-02T23:05:43.532Z\tINFO\tinstance/beat.go:630\tBeat ID: 05094e0e-7db0-4357-a1d0-f6a40f08eda8",
 "2020-03-02T23:05:43.532Z\tERROR\tinstance/beat.go:933\tExiting: error initializing processors: can not convert 'string' into 'object' accessing 'processors.0.add_host_metadata' (source:'/etc/metricbeat/metricbeat.yml')",
 "Exiting: error initializing processors: can not convert 'string' into 'object' accessing 'processors.0.add_host_metadata' (source:'/etc/metricbeat/metricbeat.yml')"],
 "stdout": "   ...fail!\n", "stdout_lines": ["   ...fail!"]}

Double quoting (") the ~ produces a config that includes:

processors:
- add_host_metadata: '~'
- add_cloud_metadata: '~'
- add_docker_metadata: '~'
- add_kubernetes_metadata: '~'

which fails with the same error:

TASK [elastic.beats : Start metricbeat service] ********************************
fatal: [7a2b85391a72]: FAILED! => {"changed": false, "msg": "Failed to start service: metricbeat", "rc": 1, "stderr": "2020-03-02T23:09:51.996Z\tINFO\tinstance/beat.go:622\tHome path: [/usr/share/metricbeat/bin] Config path: [/usr/share/metricbeat/bin] Data path: [/usr/share/metricbeat/bin/data] Logs path: [/usr/share/metricbeat/bin/logs]\n2020-03-02T23:09:51.996Z\tINFO\tinstance/beat.go:630\tBeat ID: 05094e0e-7db0-4357-a1d0-f6a40f08eda8\n2020-03-02T23:09:51.996Z\tERROR\tinstance/beat.go:933\tExiting: error initializing processors: can not convert 'string' into 'object' accessing 'processors.0.add_host_metadata' (source:'/etc/metricbeat/metricbeat.yml')\nExiting: error initializing processors: can not convert 'string' into 'object' accessing 'processors.0.add_host_metadata' (source:'/etc/metricbeat/metricbeat.yml')\n",
 "stderr_lines": ["2020-03-02T23:09:51.996Z\tINFO\tinstance/beat.go:622\tHome path: [/usr/share/metricbeat/bin] Config path: [/usr/share/metricbeat/bin] Data path: [/usr/share/metricbeat/bin/data] Logs path: [/usr/share/metricbeat/bin/logs]",
 "2020-03-02T23:09:51.996Z\tINFO\tinstance/beat.go:630\tBeat ID: 05094e0e-7db0-4357-a1d0-f6a40f08eda8",
 "2020-03-02T23:09:51.996Z\tERROR\tinstance/beat.go:933\tExiting: error initializing processors: can not convert 'string' into 'object' accessing 'processors.0.add_host_metadata' (source:'/etc/metricbeat/metricbeat.yml')",
 "Exiting: error initializing processors: can not convert 'string' into 'object' accessing 'processors.0.add_host_metadata' (source:'/etc/metricbeat/metricbeat.yml')"],
 "stdout": "   ...fail!\n", "stdout_lines": ["   ...fail!"]}

Playbook:

- name: Install and configure Beats
  hosts: all
  tasks:
    - name: 'Install Metricbeat'
      include_role:
        name: elastic.beats
      vars:
        beat: metricbeat
        beat_conf:
          metricbeat.config.modules:
            path: '${path.config}/modules.d/*.yml'
            reload.enabled: false
          setup.template.settings:
            index.number_of_shards: 1
            index.codec: best_compression
          processors:
            - add_host_metadata: ~
            - add_cloud_metadata: ~
            - add_docker_metadata: ~
            - add_kubernetes_metadata: ~
        output_conf:
          elasticsearch:
            hosts: ['localhost:8200']

Provide logs from Ansible:

Beats logs if relevant:

Here is what I've established so far.

First, I looked up in the YAML 1.2 spec that the ~ character indicates null.

From 10.3.2. Tag Resolution:

Regular expression          Resolved to tag
null | Null | NULL | ~      tag:yaml.org,2002:null

Next, I reviewed the documentation for the add_host_metadata processor. I observed that all of the parameters on the processor are optional.

Therefore, when ~ is used in the configuration file, the processor is enabled using the default settings.

So I retried the ~ in the Ansible file, and it does appear that the metadata is populated on the records. The generated config contains:

processors:
- add_host_metadata: null
- add_cloud_metadata: null
- add_docker_metadata: null
- add_kubernetes_metadata: null

So I'm confused. 😕

This also appears to work (passing an empty map with {}):

processors:
- add_host_metadata: {}
- add_cloud_metadata: {}
- add_docker_metadata: {}
- add_kubernetes_metadata: {}
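
To put the variants tried in this thread side by side, here is a small sketch of how each spelling is typed by a YAML parser. The comments only restate what was observed above with Metricbeat 7.6.0; nothing here is taken from the role's documentation.

```yaml
processors:
  # Bare tilde: YAML null. The role writes it out as "null", and
  # Metricbeat started and applied the processor defaults.
  - add_host_metadata: ~
  # The literal null is the same value as the bare tilde.
  - add_cloud_metadata: null
  # Empty map: also accepted, processor runs with its defaults.
  - add_docker_metadata: {}
  # Quoted tilde (single or double quotes): a string, not null.
  # Metricbeat exits with "can not convert 'string' into 'object'".
  # - add_kubernetes_metadata: '~'
```

The quoted form is left commented out so the fragment stays loadable.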

😕 I am also not certain where the undesirable behavior is coming from any more (Ansible vs. Metricbeat).

Metricbeat uses gopkg.in/yaml.v2 to parse the metricbeat.yml file, if I am reading the code correctly.

There are a number of issues (both open and closed) related to the underlying Go library used by Metricbeat to handle parsing the YAML configuration. So that could be an avenue worth investigating.

jmlrt commented

Hi @a03nikki,
Indeed, ~ is interpreted as a null value by YAML, which both Ansible and Beats use, so /etc/metricbeat/metricbeat.yml ends up with add_XXX_metadata: null.
However, this configuration seems to work, as I'm able to retrieve cloud metadata on a GCP instance:

  • playbook:
- hosts: localhost
  roles:
    - role: elastic.elasticsearch
    - role: elastic.beats
      beat: metricbeat
      beat_conf:
        metricbeat.config.modules:
          path: '${path.config}/modules.d/*.yml'
          reload.enabled: false
        setup.template.settings:
          index.number_of_shards: 1
          index.codec: best_compression
        processors:
          - add_host_metadata: ~
          - add_cloud_metadata: ~
          - add_docker_metadata: ~
          - add_kubernetes_metadata: ~
  • generated /etc/metricbeat/metricbeat.yml:
# Ansible managed

################### metricbeat Configuration #########################

############################# metricbeat ######################################
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
processors:
- add_host_metadata: null
- add_cloud_metadata: null
- add_docker_metadata: null
- add_kubernetes_metadata: null
setup.template.settings:
  index.codec: best_compression
  index.number_of_shards: 1


###############################################################################
############################# Libbeat Config ##################################
# Base config file used by all other beats for using libbeat features

############################# Output ##########################################

output:
  elasticsearch:
    hosts:
    - localhost:9200


############################# Logging #########################################

logging:
  files:
    rotateeverybytes: 10485760
  • Metricbeat logs:
Mar 04 14:53:20 jmlrt-test systemd[1]: Started Metricbeat is a lightweight shipper for metrics..
...
Mar 04 14:53:21 jmlrt-test metricbeat[29405]: 2020-03-04T14:53:21.034Z        INFO        add_cloud_metadata/add_cloud_metadata.go:93        add_cloud_metadata: hosting provider type detected as gcp, metadata={"availability_zone":"europe-west1-b","instance":{"id":"1000121967836269311","name":"jmlrt-test"},"
...
Mar 04 14:53:21 jmlrt-test metricbeat[29405]: 2020-03-04T14:53:21.050Z        INFO        instance/beat.go:439        metricbeat start running.
  • Example of record in Elasticsearch:
{
  "_index" : "metricbeat-7.6.0-2020.03.04-000001",
  "_type" : "_doc",
  "_id" : "39kVpnAB1zu53QkUCx9J",
  "_version" : 1,
  "_seq_no" : 9,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "@timestamp" : "2020-03-04T15:05:58.970Z",
    "system" : {
      "filesystem" : {
        "used" : {
          "pct" : 0.0343,
          "bytes" : 3756032
        },
        "device_name" : "/dev/sda15",
        "mount_point" : "/boot/efi",
        "type" : "vfat",
        "total" : 109422592,
        "free" : 105666560,
        "available" : 105666560,
        "files" : 0,
        "free_files" : 0
      }
    },
    "ecs" : {
      "version" : "1.4.0"
    },
    "host" : {
      "hostname" : "jmlrt-test",
      "architecture" : "x86_64",
      "os" : {
        "name" : "Ubuntu",
        "kernel" : "5.0.0-1031-gcp",
        "codename" : "bionic",
        "platform" : "ubuntu",
        "version" : "18.04.4 LTS (Bionic Beaver)",
        "family" : "debian"
      },
      "name" : "jmlrt-test",
      "id" : "6c148824a8b9c66a44c3da2ce1402ec1",
      "containerized" : false
    },
    "agent" : {
      "version" : "7.6.0",
      "type" : "metricbeat",
      "ephemeral_id" : "d96c180a-c174-47de-bb3e-336ff82e6dad",
      "hostname" : "jmlrt-test",
      "id" : "07b0324a-f0fc-43a6-8289-6e5f5c27dd45"
    },
    "cloud" : {
      "project" : {
        "id" : "elastic-infra"
      },
      "provider" : "gcp",
      "instance" : {
        "name" : "jmlrt-test",
        "id" : "1000121967836269311"
      },
      "machine" : {
        "type" : "n2-standard-2"
      },
      "availability_zone" : "europe-west1-b"
    },
    "event" : {
      "dataset" : "system.filesystem",
      "module" : "system",
      "duration" : 1589087
    },
    "metricset" : {
      "name" : "filesystem",
      "period" : 60000
    },
    "service" : {
      "type" : "system"
    }
  }
}

@jmlrt : I am glad it is working for you too. I think it is sufficient to say that there is not a bug in Ansible or Metricbeat.

So maybe the best option is to document a solution in the README.md so other people don't also have to struggle with this?
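
A README example along these lines might look like the sketch below. It reuses the playbook from this issue with the empty-map form; the task names and the output host are placeholders, and this is not wording from the role's actual README.

```yaml
- name: Install and configure Beats
  hosts: all
  tasks:
    - name: Install Metricbeat with the default metadata processors
      include_role:
        name: elastic.beats
      vars:
        beat: metricbeat
        beat_conf:
          processors:
            # An empty map ({}) or a bare ~ (null) enables a processor
            # with its default settings. Do not quote the tilde: '~' is
            # a string and Metricbeat refuses to start with it.
            - add_host_metadata: {}
            - add_cloud_metadata: {}
            - add_docker_metadata: {}
            - add_kubernetes_metadata: {}
        output_conf:
          elasticsearch:
            hosts: ['localhost:9200']  # placeholder host
```

Using {} rather than ~ keeps the playbook and the generated metricbeat.yml looking the same, which may avoid the confusion described earlier about ~ being rendered as null.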

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed because it has not had recent activity since being marked as stale.