linux-system-roles/metrics

grafana install fail

vap0rtranz opened this issue · 19 comments

Galaxy playbook is failing to install Grafana. Error is:

fatal: [localhost]: FAILED! => {"changed": false, "msg": "No package matching 'grafana' found available, installed or updated", "rc": 126, "results": ["No package matching 'grafana' found available, installed or updated"]}

Grafana is available to be installed via YUM:

[justin@netmon2 pcp-install]$ sudo yum search grafana
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.datto.com
 * epel: mirror.grid.uchicago.edu
 * extras: mirrors.xtom.com
 * updates: centos.vwtonline.net
================================================================================= N/S matched: grafana =================================================================================
pcp-webapp-grafana.noarch : Grafana web application for Performance Co-Pilot (PCP)

OS running the playbook is CentOS7.6

richm commented

pcp-webapp-grafana is not the grafana package. Did you set metrics_graph_service: true in your inventory?

Yeah, there is no supported grafana package on RHEL7 (though you must have opted in to it, as @richm mentioned). Perhaps we need some better conditional logic and/or diagnostics here for el7.
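
One way such a diagnostic could look is an early assertion that stops with a clear message instead of a package-manager error. This is only a sketch: the task name and message are made up for illustration, and it assumes facts have been gathered and that metrics_graph_service is the option that pulls in Grafana.

```yaml
# Hypothetical pre-flight check; not part of the role as discussed in this issue.
- name: Check that the graph service is supported on this platform
  assert:
    that:
      # Fail only when Grafana was requested on an EL7-family host.
      - >-
        not (metrics_graph_service | default(false) | bool
             and ansible_os_family == 'RedHat'
             and ansible_distribution_major_version == '7')
    fail_msg: >-
      metrics_graph_service is enabled, but no grafana package is available
      from the default repositories on EL7.
```

Fedora also reports the RedHat OS family, but its major version is not 7, so the check would pass there.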

Ah, you all may find an email from me on this :)

So Fedora >= 31 is required? Did I misunderstand the Galaxy OS Platforms? ... it lists EL7 and EL8 for OS Platforms.

And yes, metrics_graph_service: true

Same failure on CentOS 7.8

I tried manually adding grafana OSS repo as described here : https://grafana.com/docs/grafana/latest/installation/rpm/

The playbook then fails to find the package 'grafana-pcp'

Tried commenting out 'grafana-pcp' and adding 'pcp-webapp-grafana' instead, but no Data Source or Dashboard exists in the created Grafana instance

These packages don't exist on RHEL-7 - @richm @pcahyna is there a way the role can handle this a little more gracefully I wonder?

To summarize:

  • I removed pcp-webapp-grafana

  • I installed grafana official OSS rpm repo, that got me grafana-7.0.3

  • I manually installed grafana-pcp-2.0.2 from the tar.gz Github release as described here : https://grafana-pcp.readthedocs.io/en/latest/installation.html

  • pcp available in CentOS is v4.3.2-7 and this role only supports pcp v5+, so I also added the official pcp bintray repo: wget https://bintray.com/pcp/el7/rpm -O bintray-pcp-el7.repo

  • upgraded pcp* to 5.1.1-1

  • Commented out grafana-pcp in linux-system-roles.metrics/tasks/grafana.yml

  • Re-ran the playbook

  • pcp plugin was disabled in grafana, potentially because of being unsigned, so I manually enabled it: /plugins/performancecopilot-pcp-app/

Still no pcp data source available in grafana

This still on RHEL-7? You'll also need PCP v5+ if you want to run Grafana and grafana-pcp on that one host - latest upstream PCP builds can be found at https://bintray.com/pcp/el7 - and then enable pmproxy as the role would have done.

If you're OK with using upstream packages off the net, this will work. Another option would have you use your RHEL7 machine as a PCP collector system only (which should work fine with the metrics role, just don't opt-in to the query/graph options in the role). Use a separate analysis machine for that (Fedora laptop?) - PCP will let you connect to and analyse the remote RHEL-7 server(s) then, with the default packages in all RHEL-7 releases.
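
For the collector-only option, a playbook might look roughly like the following. It is a sketch under assumptions: the host group name is hypothetical, and the two variables are the opt-in switches mentioned in this thread, set explicitly to false here only for clarity.

```yaml
# Collector-only sketch for an EL7 host; the group name is hypothetical.
- hosts: el7_collectors
  vars:
    metrics_graph_service: false   # do not opt in to Grafana
    metrics_query_service: false   # do not opt in to Redis
  roles:
    - linux-system-roles.metrics
```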

Yes, CentOS 7.8.

  • I updated my post to say that I upgraded to pcp 5.1.1 with the bintray repo
  • Also manually enabled the pcp plugin in grafana (disabled by default, as unsigned)

Now I can manually add 'PCP Redis' & 'PCP Vector' data sources, seen as 'working' on http://localhost:44322

However, I do not know what to do after that. I tried to create a dashboard from these sources but I only see empty data. Not sure if I have to input a specific query?

Try the sample dashboards that ship with grafana-pcp - the 'PCP Vector Host Overview' is the easiest start point. For the Redis and bpftrace datasources you'll need pmlogger + redis running, and pmdabpftrace respectively.

I recommend you jump on the PCP slack or IRC channels on freenode and ask the developers directly if more assistance is needed. https://pcp.io/community.html has pointers, and there's a #grafana sub-channel on slack.
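
On a host that was set up by hand, as in the steps above, the services behind the PCP Redis data source could be started with a task along these lines. This is a sketch: the unit names are assumed to match the Fedora/EL packages, and pmproxy is included because it serves the http://localhost:44322 endpoint used by the data sources.

```yaml
# Sketch only: service names are assumed to match the Fedora/EL unit names.
- name: Ensure the services behind the PCP Redis data source are running
  service:
    name: "{{ item }}"
    state: started
    enabled: true
  loop:
    - pmlogger
    - redis
    - pmproxy
```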

richm commented

> These packages don't exist on RHEL-7 - @richm @pcahyna is there a way the role can handle this a little more gracefully I wonder?

Yes. https://github.com/oasis-roles/meta_standards/blob/9aedd8d91e63163ddd93c4bb02b1bd634e65c83c/README.md#supporting-multiple-distributions-and-versions

In vars/main.yml you would have a variable listing the base packages available on all platforms e.g.

__metrics_packages:
  - pcp
  # - others?
__metrics_packages_extra: [] # define per-platform/version
__metrics_packages_graph: [] # not sure if there are any - if so, list them as above
__metrics_packages_query: [] # not sure if there are any - if so, list them as above

Then, for the platforms where the extra packages are available, create a vars file for that platform/version e.g. vars/Fedora.yml:

__metrics_packages_extra:
  - pcp-extra1
  - pcp-extra2
__metrics_packages_graph:
  - grafana
  - pcp-grafana
  # - others....
__metrics_packages_query:
  - redis
  - pcp-redis
  # - others....

Then, in your tasks/main.yml, use something like this:

- name: Set platform/version specific variables
  include_vars: "{{ item }}"
  loop:
    - "{{ role_path }}/vars/{{ ansible_os_family }}.yml"
    - "{{ role_path }}/vars/{{ ansible_distribution }}.yml"
    - "{{ role_path }}/vars/{{ ansible_distribution }}_{{ ansible_distribution_major_version }}.yml"
    - "{{ role_path }}/vars/{{ ansible_distribution }}_{{ ansible_distribution_version }}.yml"
  when: item is file

- name: install packages
  package:
    name: "{{ __metrics_packages + __metrics_packages_extra }}"
    state: present

- name: install graph packages
  package:
    name: "{{ __metrics_packages_graph }}"
    state: present
  when:
    - metrics_graph_service | bool
    - __metrics_packages_graph | d([]) | length > 0

- name: install query packages
  package:
    name: "{{ __metrics_packages_query }}"
    state: present
  when:
    - metrics_query_service | bool
    - __metrics_packages_query | d([]) | length > 0

richm commented

For CentOS, you could also include the extra repos needed. Create tasks/setup_CentOS_7.yml like this:

- name: add grafana yum repo
  yum_repository:
    name: grafana
    description: Grafana YUM repo
    baseurl: https://host:port/some/uri
    # other params
  when: metrics_graph_service | bool

- name: add redis yum repo
  yum_repository:
    name: redis
    description: Redis YUM repo
    baseurl: https://host:port/some/uri
    # other params
  when: metrics_query_service | bool

see https://docs.ansible.com/ansible/latest/modules/yum_repository_module.html#yum-repository-module

Then in your tasks/main.yml just after you set the per-platform/version variables, add something like this:

- name: Perform platform/version specific tasks
  include_tasks: "{{ item }}"
  loop: "{{ q('first_found', __metrics_setup_files, errors='ignore') }}"
  vars:
    __metrics_setup_files:
      files:
        - "setup_{{ ansible_distribution }}_{{ ansible_distribution_version }}.yml"
        - "setup_{{ ansible_distribution }}_{{ ansible_distribution_major_version }}.yml"
        - "setup_{{ ansible_distribution }}.yml"
        - "setup_{{ ansible_os_family }}.yml"
      paths:
        - tasks

then add a vars/CentOS_7.yml to define __metrics_packages_graph and __metrics_packages_query
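
As a rough sketch, that vars file could look like this. The package names are assumptions about what the extra repos would provide; grafana-pcp is left out because, as reported above, the playbook could not find it on CentOS 7.

```yaml
# vars/CentOS_7.yml (sketch; package names are assumptions)
__metrics_packages_graph:
  - grafana   # expected to come from the extra Grafana repo
__metrics_packages_query:
  - redis     # expected to come from the extra Redis repo
```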

richm commented

> Try the sample dashboards that ship with grafana-pcp - the 'PCP Vector Host Overview' is the easiest start point. For the Redis and bpftrace datasources you'll need pmlogger + redis running, and pmdabpftrace respectively.
>
> I recommend you jump on the PCP slack or IRC channels on freenode and ask the developers directly if more assistance is needed. https://pcp.io/community.html has pointers, and there's a #grafana sub-channel on slack.

Is this something that the metrics role or the pcp role should be doing? That is, installing and configuring pmlogger, redis, and pmdabpftrace?

@richm thanks, I'll look into making those improvements this week.

Regarding 'Is this something that the metrics role or the pcp role should be doing?' - yep, definitely.

The roles already configure pmlogger appropriately, and the metrics role can also optionally configure Redis via the 'metrics_query_service' variable, but I've not yet completed the bpftrace aspects (requires authentication support to be in place first, to do it properly).

Back to my OP: the playbook fails when it tries to restart Grafana on Fedora32:

TASK [linux-system-roles.metrics : Install graphing packages] ****************************************************************************************************************************
ERROR! The requested handler 'restart grafana-server' was not found in either the main handlers list nor in the listening handlers list        
[justin@netmon1 ~]$ cat /etc/redhat-release 
Fedora release 32 (Thirty Two)

Did I misunderstand the OS requirements? Sorry to ask again but there was a lot of chatter about dashboards:
has the playbook been known to work on Fedora32?
on RHEL8.2?

Well, typical Ansible. A playbook re-run got rid of the previous error:

TASK [linux-system-roles.metrics : Install graphing packages] ****************************************************************************************************************************
ok: [localhost]

PLAY RECAP *******************************************************************************************************************************************************************************
localhost                  : ok=28   changed=0    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0  

But the Grafana service is disabled after this re-run:

[justin@netmon1 ~]$ sudo systemctl list-unit-files | grep grafana
grafana-server.service                         disabled        disabled 

So I manually did what I'd think the playbook would have been trying to do:

[justin@netmon1 ~]$ sudo systemctl enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /usr/lib/systemd/system/grafana-server.service.
[justin@netmon1 ~]$ sudo systemctl restart grafana-server

Now Grafana dashboard is up on my PCP monitoring box:

[justin@netmon1 ~]$ sudo netstat --listen --numeric | grep 3000
tcp6       0      0 :::3000                 :::*                    LISTEN   

But I'm back at the same issue as on CentOS7: there is no PCP App listed in the Grafana dashboard.

I did verify that Fedora has grafana-pcp installed:

sudo dnf install grafana-pcp
Last metadata expiration check: 1:04:37 ago on Tue 23 Jun 2020 09:52:21 AM CDT.
Package grafana-pcp-2.0.2-1.fc32.noarch is already installed.
Dependencies resolved.
Nothing to do.
Complete!

@vap0rtranz sounds like you're up to the 'Install Data Sources' section of: https://grafana-pcp.readthedocs.io/en/latest/quickstart.html where you need to select which data sources you'd like to use?

I'm looking into why the Grafana service wasn't running - it should have been if you'd set "metrics_graph_service: yes" in your playbook.
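
For reference, the manual systemctl steps shown earlier correspond roughly to a task like the one below. It is a minimal sketch, not the role's actual task, and the task name is made up.

```yaml
# Sketch of the enable-and-start step; not taken from the role itself.
- name: Ensure grafana-server is enabled and running
  service:
    name: grafana-server
    state: started
    enabled: true
  when: metrics_graph_service | bool
```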

From a little reading today, it looks like we can automate the data sources setup too:
https://grafana.com/docs/grafana/latest/administration/provisioning/

There's also a REST API which allows creating and updating datasources:
https://grafana.com/docs/grafana/latest/http_api/data_source/

but I'm leaning towards using regular file based provisioning for our needs.
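
A file-based provisioning entry for the PCP data sources might look something like this. Treat it as a sketch: the file name is hypothetical, the data source type identifiers are assumptions that should be checked against the installed grafana-pcp version, and the URL is the pmproxy endpoint mentioned earlier in this issue.

```yaml
# /etc/grafana/provisioning/datasources/pcp.yaml (hypothetical file name)
apiVersion: 1
datasources:
  - name: PCP Redis
    type: pcp-redis-datasource    # assumed plugin id; verify against grafana-pcp
    access: proxy
    url: http://localhost:44322
  - name: PCP Vector
    type: pcp-vector-datasource   # assumed plugin id; verify against grafana-pcp
    access: proxy
    url: http://localhost:44322
```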

Hi Nathan,

Basically you're right. I poked around Grafana until I got the App and Data Source up, so this may be more of newbie-to-Grafana user error :). Here's what I did in Grafana:

  1. Configuration -> Plugins -> search "Performance Co-Pilot" -> select Enable
  2. Configuration -> Data Sources -> search "PCP Redis" -> Select

After the fact, I found this process in the docs (https://grafana-pcp.readthedocs.io/en/latest/quickstart.html#installation-fedora)

Even though the last step in the playbook needed a re-run, and a manual step or two for the grafana and redis services on my Fedora box, this all worked out. So I think this issue can be Closed :)

Ty!

Justin

@vap0rtranz while working with a colleague earlier this week, he showed me how to make Grafana provisioning work for the PCP data sources. In addition to installing the provisioning file, there's a REST API call that needs to be made - I'm looking into exactly how to do this (looks like the Ansible "uri" module is the way to go), and will commit those changes today.
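
A call of that kind with the "uri" module could be sketched as follows. This assumes the call in question is the one that enables the PCP app plugin (the step done by hand above), that Grafana listens on port 3000, and that the default admin credentials are still in place; none of this is confirmed by the eventual commit.

```yaml
# Sketch only: endpoint choice and credentials are assumptions.
- name: Enable the PCP app plugin through the Grafana HTTP API
  uri:
    url: http://localhost:3000/api/plugins/performancecopilot-pcp-app/settings
    method: POST
    user: admin                # assumed default credentials
    password: admin
    force_basic_auth: true
    body_format: json
    body:
      enabled: true
    status_code: 200
```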

I've also got some improvements to the grafana-server start process that I think will help with the odd behaviour you saw there initially too. And I've got the recommendations from @richm earlier in this issue to sort through for platforms without Grafana ... once I get all of these ducks lined up, I think we can close this issue out.