replicatedhq/troubleshoot

analyzer: goldpinger pingmap

adamancini opened this issue · 2 comments

Describe the rationale for the suggested feature.

The goal is to report on the state of cluster networking immediately. This would be a huge help when joining additional nodes to a kURL cluster, or when identifying networking issues that affect the CNI but not host networking. Today we collect the goldpinger results if goldpinger is installed, but we do not analyze them.

Describe the feature
Analyze the results of the goldpinger add-on's pingmap. The result is a nested JSON object containing the outcome of the HTTP query made to every other goldpinger pod's API.

API documentation at https://github.com/bloomberg/goldpinger/blob/master/swagger.yml

The /cluster_health and /check_all endpoints could probably be used for the analysis.
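As a rough illustration of what analyzing /cluster_health might involve, here is a minimal Python sketch. It assumes the response shape shown in the sample output in this issue (`OK`, `nodesHealthy`, `nodesTotal`, `nodesUnhealthy`). The function name `analyze_cluster_health` and the pass/fail dictionary it returns are hypothetical and are not part of troubleshoot's existing analyzer API.

```python
import json


def analyze_cluster_health(raw: str) -> dict:
    """Hypothetical helper: map a /cluster_health body to a pass/fail outcome."""
    data = json.loads(raw)
    # nodesUnhealthy is null (None) when every node is healthy, so normalise to a list.
    unhealthy = data.get("nodesUnhealthy") or []
    healthy = data.get("nodesHealthy") or []
    if data.get("OK") and not unhealthy:
        return {"outcome": "pass", "message": f"{len(healthy)} node(s) healthy"}
    names = ", ".join(unhealthy) or "unknown"
    return {"outcome": "fail", "message": f"unhealthy nodes: {names}"}


# Body trimmed from the /cluster_health sample in this issue.
sample = (
    '{"OK": true, "nodesHealthy": ["10.142.0.5", "10.142.15.251"], '
    '"nodesTotal": 2, "nodesUnhealthy": null}'
)
print(analyze_cluster_health(sample))
```

Treating a null `nodesUnhealthy` as an empty list keeps the healthy case from raising on a missing value.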

ada@ada-kurl:~$ curl http://10.96.1.41/cluster_health | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   159  100   159    0     0  15713      0 --:--:-- --:--:-- --:--:-- 17666
{
  "OK": true,
  "duration-ns": 6896527,
  "generated-at": "2023-10-26T19:50:43.452Z",
  "nodesHealthy": [
    "10.142.0.5",
    "10.142.15.251"
  ],
  "nodesTotal": 2,
  "nodesUnhealthy": null
}
ada@ada-kurl:~$ curl http://10.96.1.41/check_all | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1160  100  1160    0     0  96353      0 --:--:-- --:--:-- --:--:--  102k
{
  "hosts": [
    {
      "hostIP": "10.142.15.251",
      "podIP": "10.32.1.4",
      "podName": "goldpinger-8j4vq"
    },
    {
      "hostIP": "10.142.0.5",
      "podIP": "10.32.0.228",
      "podName": "goldpinger-c4wsl"
    }
  ],
  "responses": {
    "goldpinger-8j4vq": {
      "HostIP": "10.142.15.251",
      "OK": true,
      "PodIP": "10.32.1.4",
      "response": {
        "podResults": {
          "goldpinger-8j4vq": {
            "HostIP": "10.142.15.251",
            "OK": true,
            "PingTime": "2023-10-26T19:50:32.896Z",
            "PodIP": "10.32.1.4",
            "response": {
              "boot_time": "2023-10-26T19:47:11.980Z"
            },
            "status-code": 200
          },
          "goldpinger-c4wsl": {
            "HostIP": "10.142.0.5",
            "OK": true,
            "PingTime": "2023-10-26T19:50:46.771Z",
            "PodIP": "10.32.0.228",
            "response": {
              "boot_time": "2023-10-26T17:23:06.477Z"
            },
            "response-time-ms": 1,
            "status-code": 200
          }
        }
      }
    },
    "goldpinger-c4wsl": {
      "HostIP": "10.142.0.5",
      "OK": true,
      "PodIP": "10.32.0.228",
      "response": {
        "podResults": {
          "goldpinger-8j4vq": {
            "HostIP": "10.142.15.251",
            "OK": true,
            "PingTime": "2023-10-26T19:50:43.861Z",
            "PodIP": "10.32.1.4",
            "response": {
              "boot_time": "2023-10-26T19:47:11.980Z"
            },
            "response-time-ms": 1,
            "status-code": 200
          },
          "goldpinger-c4wsl": {
            "HostIP": "10.142.0.5",
            "OK": true,
            "PingTime": "2023-10-26T19:50:39.635Z",
            "PodIP": "10.32.0.228",
            "response": {
              "boot_time": "2023-10-26T17:23:06.477Z"
            },
            "status-code": 200
          }
        }
      }
    }
  }
}
ada@ada-kurl:~$
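To make the pingmap analysis concrete, the sketch below walks the /check_all output above and lists every source/target pair whose ping did not succeed. It is only an illustration under some assumptions: the function name `failed_pings` is hypothetical, and the `error` field is assumed to be present on failed results (it does not appear in the healthy sample above), so the code falls back to the status code when it is missing.

```python
def failed_pings(check_all: dict) -> list:
    """Hypothetical helper: return (source_pod, target_pod, reason) per failed ping."""
    failures = []
    for source, outer in (check_all.get("responses") or {}).items():
        if not outer.get("OK"):
            # The pod serving /check_all could not query this pod's API at all.
            failures.append((source, None, outer.get("error", "pod unreachable")))
            continue
        results = (outer.get("response") or {}).get("podResults") or {}
        for target, ping in results.items():
            if not ping.get("OK") or ping.get("status-code") != 200:
                # "error" is an assumed field; fall back to the status code.
                reason = ping.get("error", f"status {ping.get('status-code')}")
                failures.append((source, target, reason))
    return failures


# Trimmed from the /check_all sample in this issue (all pings healthy).
healthy = {
    "responses": {
        "goldpinger-8j4vq": {
            "OK": True,
            "response": {
                "podResults": {
                    "goldpinger-8j4vq": {"OK": True, "status-code": 200},
                    "goldpinger-c4wsl": {"OK": True, "status-code": 200},
                }
            },
        }
    }
}
print(failed_pings(healthy))  # prints []
```

An analyzer could pass when this list is empty, and otherwise fail with one message per unreachable pair, which would point directly at the nodes whose pod networking is broken.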