pambrose/prometheus-proxy

First node works, second does not...


I have two nodes configured, using wmi_exporter for both. The first node is up and I can see all the metrics; everything looks great. But when I added the second node, for some reason it just will not come up, and I can't make much sense of it.

I am getting these errors in the proxy logs:

15:46:30.350 ERROR [ScrapeRequestManager.kt:46] - Missing ScrapeRequestWrapper for scrape_id: 91 [grpc-default-executor-1]
15:46:40.123 ERROR [ScrapeRequestManager.kt:46] - Missing ScrapeRequestWrapper for scrape_id: 93 [grpc-default-executor-1]
15:46:49.936 ERROR [ScrapeRequestManager.kt:46] - Missing ScrapeRequestWrapper for scrape_id: 95 [grpc-default-executor-1]
15:46:59.862 ERROR [ScrapeRequestManager.kt:46] - Missing ScrapeRequestWrapper for scrape_id: 97 [grpc-default-executor-1]
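
Reading that error, my guess (and it is only a guess at the mechanism, not the actual prometheus-proxy source) is that the proxy keeps a wrapper per outstanding scrape_id and logs this when a reply comes back for an id that has already been dropped, e.g. after a timeout. Something along these lines:

import java.util.concurrent.ConcurrentHashMap

// Only a sketch of the pattern I imagine, NOT the actual prometheus-proxy code:
// the proxy files an entry per scrape_id and expects the agent to reply
// before the entry is evicted.
class ScrapeBook(private val timeoutMillis: Long = 5_000) {
    private val outstanding = ConcurrentHashMap<Long, Long>()  // scrape_id -> created-at millis

    fun register(scrapeId: Long) {
        outstanding[scrapeId] = System.currentTimeMillis()
    }

    // Called when the agent's reply (possibly late) arrives.
    fun complete(scrapeId: Long) {
        if (outstanding.remove(scrapeId) == null) {
            // The shape of the log line in question: the id was already evicted
            // (timed out) or was never registered at all.
            System.err.println("Missing ScrapeRequestWrapper for scrape_id: $scrapeId")
        }
    }

    // Periodic sweep that drops entries older than the timeout.
    fun evictStale() {
        val now = System.currentTimeMillis()
        outstanding.entries.removeIf { now - it.value > timeoutMillis }
    }
}

fun main() {
    val book = ScrapeBook(timeoutMillis = 10)
    book.register(91)
    Thread.sleep(50)   // agent takes too long
    book.evictStale()  // proxy gives up on scrape_id 91
    book.complete(91)  // the late reply now triggers the "missing" error
}

If that reading is right, the error is a symptom of the agent never delivering the scrape result in time, rather than a problem in the proxy itself.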

Agent logs look good:

15:44:12.266 INFO  [GenericServiceListener.kt:30] - Running AdminService{port=8093, paths=[/ping, /version, /healthcheck, /threaddump]} [AdminService STARTING]
15:44:12.266 INFO  [GenericService.kt:136] - All Agent services healthy [AdminService STARTING]
15:44:12.573 INFO  [AgentGrpcService.kt:144] - Connected to proxy at [IP Removed]:50051 using plaintext [Agent Unnamed-prometheus-agent]
15:44:12.697 INFO  [AgentPathManager.kt:65] - Registered http://10.100.61.63:9182/metrics as /bgr-rds02_metrics [Agent Unnamed-prometheus-agent]
15:44:12.723 INFO  [AgentPathManager.kt:65] - Registered http://10.100.61.61:9182/metrics as /bgr-rds01_metrics [Agent Unnamed-prometheus-agent]
15:44:12.767 INFO  [Agent.kt:194] - Heartbeat scheduled to fire after 5.00s of inactivity [DefaultDispatcher-worker-1]

prometheus.yml

  - job_name: 'bgr-rds02'
    metrics_path: '/bgr-rds02_metrics'
    static_configs:
      - targets: ['prometheus-proxy:8080']

  - job_name: 'bgr-rds01'
    metrics_path: '/bgr-rds01_metrics'
    static_configs:
      - targets: ['prometheus-proxy:8080']
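
(A quick way to see what Prometheus sees, assuming the proxy is reachable under the same name and port as above, is to curl the two proxy paths directly:

curl http://prometheus-proxy:8080/bgr-rds01_metrics
curl http://prometheus-proxy:8080/bgr-rds02_metrics

If only one of them fails or times out, the problem is between the proxy and the agent for that path rather than in this scrape config.)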

prom-agent.conf

proxy {
  admin.enabled: true
  metrics.enabled: true
}

agent {
  proxy.hostname = ${HOSTNAME}
  admin.enabled: true
  metrics.enabled: true

  pathConfigs: [
    {
      name: "bgr-rds02"
      path: bgr-rds02_metrics
      url: "http://10.100.61.63:9182/metrics"
    }
    {
      name: "bgr-rds01"
      path: bgr-rds01_metrics
      url: "http://10.100.61.61:9182/metrics"
    }
  ]
}
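
(The agent's own health can also be checked via its admin port from the startup log above, e.g. curl http://localhost:8093/healthcheck run on the agent box itself; whether that tells us anything here is a guess on my part, since both paths are registered by the same agent and one of them works.)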

Hmmm. I am not sure what is going on with that.

Can you try a couple of experiments:

  • Alter the order of adding the nodes and see what happens.
  • Add a 3rd node (duplicating one of the 2) and see what happens with that.
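
For the duplicate-node experiment, the extra entry would just be another pathConfigs block pointing at the same exporter under a new path (the name and path below are made up), plus a matching job in prometheus.yml with metrics_path: '/bgr-rds02_dup_metrics':

    {
      name: "bgr-rds02-dup"
      path: bgr-rds02_dup_metrics
      url: "http://10.100.61.63:9182/metrics"
    }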

I just tried changing the order, adding another node entry for the same box, and changing the name. I can curl the metrics without any problem from the agent box.
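
(The curl check was just the exporter endpoints straight from the agent box, using the same URLs as in pathConfigs, e.g. curl http://10.100.61.61:9182/metrics.)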

I then added an entirely different node, and that one works. It is very odd that this one particular node seems to be having issues. I am going to try adding a few more nodes tomorrow and will let you know how it goes. It could just be an odd anomaly with that particular node.

Interesting. If you can hit it with curl, the agent should be able to hit it as well. Even if it is something on your end, I should be producing a better error message than that. Please let me know what you see tomorrow.