An error has occurred during metrics collection
I tried running this exporter, but I am getting the following error:
An error has occurred during metrics collection:
4 error(s) occurred:
* collected metric nomad_allocation_cpu label:<name:"alloc" value:"infra/statsd-exporter.statsd-exporter[0]" > label:<name:"group" value:"statsd-exporter" > label:<name:"job" value:"infra/statsd-exporter" > gauge:<value:4.02303193877551 > was collected before with the same name and label values
* collected metric nomad_allocation_cpu_throttle label:<name:"alloc" value:"infra/statsd-exporter.statsd-exporter[0]" > label:<name:"group" value:"statsd-exporter" > label:<name:"job" value:"infra/statsd-exporter" > gauge:<value:0 > was collected before with the same name and label values
* collected metric nomad_allocation_memory label:<name:"alloc" value:"infra/statsd-exporter.statsd-exporter[0]" > label:<name:"group" value:"statsd-exporter" > label:<name:"job" value:"infra/statsd-exporter" > gauge:<value:2.2781952e+07 > was collected before with the same name and label values
* collected metric nomad_allocation_memory_limit label:<name:"alloc" value:"infra/statsd-exporter.statsd-exporter[0]" > label:<name:"group" value:"statsd-exporter" > label:<name:"job" value:"infra/statsd-exporter" > gauge:<value:256 > was collected before with the same name and label values
I believe this happens when there are several older allocations still around: a stopped allocation keeps the same job/group/alloc name labels as its replacement, so the metric gets emitted more than once with identical name and label values.
nomad status infra/statsd-exporter
ID = infra/statsd-exporter
Name = infra/statsd-exporter
Type = service
Priority = 50
Datacenters = ovh
Status = running
Periodic = false
Summary
Task Group       Queued  Starting  Running  Failed  Complete  Lost
statsd-exporter  0       0         1        0       0         0
Allocations
ID        Eval ID   Node ID   Task Group       Desired  Status    Created At
57ec626a  60bc583d  375d5aaf  statsd-exporter  run      running   09/08/16 10:13:30 UTC
47ce6dd6  2e863db7  375d5aaf  statsd-exporter  stop     complete  09/08/16 09:28:16 UTC
16dc534e  5913852f  22defaf9  statsd-exporter  stop     complete  09/05/16 11:55:17 UTC
After manually triggering garbage collection, the old allocations were gone and the exporter worked:
curl -X PUT http://localhost:4646/v1/system/gc
nomad status infra/statsd-exporter
ID = infra/statsd-exporter
Name = infra/statsd-exporter
Type = service
Priority = 50
Datacenters = ovh
Status = running
Periodic = false
Summary
Task Group Queued Starting Running Failed Complete Lost
statsd-exporter 0 0 1 0 0 0
Allocations
ID        Eval ID   Node ID   Task Group       Desired  Status   Created At
57ec626a  60bc583d  375d5aaf  statsd-exporter  run      running  09/08/16 10:13:30 UTC
Might need to add the allocation ID as a label to the allocation metrics, or alternatively only collect from running allocations, to ensure the uniqueness of the name + labels (job_name, group_name, alloc_name[alloc_index]). I will take a closer look later today.
I think only running allocations are of interest, as they are the only ones with interesting metrics.
I don't know if the alloc index is necessary; isn't the alloc name already unique?
The alloc index is included in the alloc name: if a group has count = 10, the allocs are named task_name[alloc_index] with indexes 0..9.
Sorry my mistake, I meant allocation ID, not index.
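For illustration, here is a minimal sketch combining both ideas above (skipping non-running allocations and adding the allocation ID as a label). This assumes the exporter uses the official Nomad API client and the Prometheus Go client; the collector shape, the metric's help text, and the cpuUsage helper are hypothetical:

```go
package collector

import (
	"github.com/hashicorp/nomad/api"
	"github.com/prometheus/client_golang/prometheus"
)

// allocCPUDesc is a hypothetical stand-in for the exporter's
// nomad_allocation_cpu metric, extended with an alloc_id label.
var allocCPUDesc = prometheus.NewDesc(
	"nomad_allocation_cpu",
	"Allocation CPU usage",
	[]string{"job", "group", "alloc", "alloc_id"},
	nil,
)

type collector struct {
	client *api.Client
}

func (c *collector) Describe(ch chan<- *prometheus.Desc) {
	ch <- allocCPUDesc
}

func (c *collector) Collect(ch chan<- prometheus.Metric) {
	allocs, _, err := c.client.Allocations().List(&api.QueryOptions{})
	if err != nil {
		return // a real collector should report the error instead
	}
	for _, a := range allocs {
		// Only running allocations have live stats; stopped ones
		// reuse the same name labels and cause the duplicate error.
		if a.ClientStatus != "running" {
			continue
		}
		ch <- prometheus.MustNewConstMetric(
			allocCPUDesc, prometheus.GaugeValue, cpuUsage(a),
			a.JobID, a.TaskGroup, a.Name, a.ID,
		)
	}
}

// cpuUsage is a placeholder for however the exporter actually
// fetches an allocation's CPU statistics.
func cpuUsage(a *api.AllocationListStub) float64 { return 0 }
```

Skipping non-running allocations alone should already avoid the duplicate error in the case above; the alloc_id label additionally keeps series from different allocation generations apart.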
Would it be of interest to have allocations broken down by status? Right now nomad_allocations shows all allocations. I don't know if it would be interesting to have something like
nomad_allocations{status="running|completed"}
Might be useful; it would allow monitoring queued counts etc. The same could perhaps be extended to nodes, and we could add evaluations by status as well in the future. We should go through the information the built-in stats providers (statsite, statsd, datadog, etc.) expose and try to emulate those to some extent.
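A rough sketch of what counting by status could look like, building on the collector type from the previous sketch (the metric and method names are again hypothetical):

```go
var allocCountDesc = prometheus.NewDesc(
	"nomad_allocations",
	"Number of allocations by client status",
	[]string{"status"},
	nil,
)

func (c *collector) collectAllocCounts(ch chan<- prometheus.Metric) {
	allocs, _, err := c.client.Allocations().List(&api.QueryOptions{})
	if err != nil {
		return
	}
	// Tally allocations per client status (running, complete, failed, ...).
	counts := map[string]float64{}
	for _, a := range allocs {
		counts[a.ClientStatus]++
	}
	for status, n := range counts {
		ch <- prometheus.MustNewConstMetric(
			allocCountDesc, prometheus.GaugeValue, n, status,
		)
	}
}
```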
Hi, the same error occurs with nomad_serf_lan_member_status:
An error has occurred during metrics collection:
collected metric nomad_serf_lan_member_status label:<name:"class" value:"" > label:<name:"datacenter" value:"staging" > label:<name:"drain" value:"false" > label:<name:"node" value:"<cluster_member_hostname_here>" > gauge:<value:0 > was collected before with the same name and label values
As you can see, there are two servers with the same name. I guess one of the Nomad agents was lost and a new one was started in its place.
~ $ nomad node-status
ID        DC       Name              Class   Drain  Status
13c89393  staging  app1.test.local   <none>  false  ready
bcb94e93  staging  app1.test.local   <none>  false  down
98f7b583  staging  app3.test.local   <none>  false  ready
869ba8a7  staging  app7.test.local   <none>  false  ready
a5bac338  staging  app9.test.local   <none>  false  ready
f5cc2390  staging  app5.test.local   <none>  false  ready
28ed1f83  staging  app13.test.local  <none>  false  ready
0e71ac4e  staging  app11.test.local  <none>  false  ready
Have you thought about adding a node_id label?
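For what it's worth, a sketch of what that could look like. I'm assuming here that the metric is built from the node list, since the class/datacenter/drain/node labels match the node list fields; the exporter's actual data source for nomad_serf_lan_member_status may differ:

```go
import "strconv"

var serfMemberDesc = prometheus.NewDesc(
	"nomad_serf_lan_member_status",
	"Status of Nomad cluster members",
	[]string{"node_id", "node", "datacenter", "class", "drain"},
	nil,
)

func (c *collector) collectMembers(ch chan<- prometheus.Metric) {
	nodes, _, err := c.client.Nodes().List(&api.QueryOptions{})
	if err != nil {
		return
	}
	for _, n := range nodes {
		// Report 1 for ready nodes, 0 otherwise.
		status := 0.0
		if n.Status == "ready" {
			status = 1.0
		}
		// With node_id in the label set, two nodes sharing a hostname
		// (e.g. a lost agent and its replacement) no longer collide.
		ch <- prometheus.MustNewConstMetric(
			serfMemberDesc, prometheus.GaugeValue, status,
			n.ID, n.Name, n.Datacenter, n.NodeClass,
			strconv.FormatBool(n.Drain),
		)
	}
}
```

With node_id in the label set, the lost agent above (bcb94e93, down) and its replacement (13c89393, ready) would become distinct series instead of colliding.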