nshttpd/mikrotik-exporter

collector crash prevents other collectors from running

Opened this issue · 5 comments

Hi,
When one of the collectors crashes, the code returns directly from the collector for loop. Wouldn't it be better to log the error and continue with the remaining collectors?
I can provide a PR for this.

Nacef

I have also experienced this and have a 'quick and dirty' hack to fix it, but I think it's something worth discussing before anything is committed.

The main collector function will return the error as soon as a collector has failed:

if err != nil {
    return err
}

Moving the return outside of the loop, changing it to a continue, and returning the errors as a slice of errors would work; a rough sketch of that shape follows below.
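
To make the idea concrete, here is a small self-contained sketch of that shape. The collector struct, names, and error wiring are illustrative assumptions, not the exporter's actual code:

package main

import (
    "errors"
    "fmt"
    "log"
)

// collector is a stand-in for the exporter's per-feature collectors;
// the struct and names here are illustrative, not the repo's real types.
type collector struct {
    name    string
    collect func() error
}

// collectAll logs each failure and continues with the remaining
// collectors instead of returning on the first error, then reports
// the accumulated errors at the end.
func collectAll(collectors []collector) error {
    var errs []error
    for _, c := range collectors {
        if err := c.collect(); err != nil {
            log.Printf("ERROR: %s collector failed: %v", c.name, err)
            errs = append(errs, err)
            continue
        }
        log.Printf("OK: %s collector succeeded", c.name)
    }
    if len(errs) > 0 {
        return fmt.Errorf("%d collector(s) failed: %w", len(errs), errors.Join(errs...))
    }
    return nil
}

func main() {
    cols := []collector{
        {name: "interface", collect: func() error { return nil }},
        {name: "ipv6", collect: func() error { return errors.New("connection refused") }},
        {name: "dhcp", collect: func() error { return nil }},
    }
    if err := collectAll(cols); err != nil {
        log.Printf("scrape finished with errors: %v", err)
    }
}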

However, this is the next hurdle we run into: do we mark the overall scrape as successful or failed?

if err != nil {
    log.Errorf("ERROR: %s collector failed after %fs: %s", d.Name, duration.Seconds(), err)
    success = 0
} else {
    log.Debugf("OK: %s collector succeeded after %fs.", d.Name, duration.Seconds())
    success = 1
}
ch <- prometheus.MustNewConstMetric(scrapeDurationDesc, prometheus.GaugeValue, duration.Seconds(), d.Name)
ch <- prometheus.MustNewConstMetric(scrapeSuccessDesc, prometheus.GaugeValue, success, d.Name)

My proposal here would be either:

  • Split success out into a per-collector metric. This allows each collector's scrape to be identified as successful or not, and the user can then raise an alert (or do whatever else they like) for that collector; see the sketch below.
  • Split the collector metrics into /metrics/collectorName and return an HTTP 500 when a collector is failing.

Both approaches are discussed in the Prometheus docs.
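
For the first option, here is a hedged sketch of what a per-collector success metric could look like with client_golang. The metric name, label set, sub-collector wiring, and port are assumptions for illustration, not the exporter's actual code:

package main

import (
    "errors"
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// collectorSuccessDesc is a hypothetical per-collector success gauge.
var collectorSuccessDesc = prometheus.NewDesc(
    "mikrotik_scrape_collector_success",
    "Whether an individual collector succeeded (1) or failed (0).",
    []string{"collector"},
    nil,
)

type deviceCollector struct {
    // subCollectors maps a collector name to its scrape function
    // (illustrative stand-in for the exporter's real collectors).
    subCollectors map[string]func(ch chan<- prometheus.Metric) error
}

func (d *deviceCollector) Describe(ch chan<- *prometheus.Desc) {
    ch <- collectorSuccessDesc
}

func (d *deviceCollector) Collect(ch chan<- prometheus.Metric) {
    for name, collect := range d.subCollectors {
        success := 1.0
        if err := collect(ch); err != nil {
            log.Printf("ERROR: %s collector failed: %v", name, err)
            success = 0
        }
        // One success gauge per sub-collector, so alerts can target
        // exactly the collector that is failing.
        ch <- prometheus.MustNewConstMetric(collectorSuccessDesc, prometheus.GaugeValue, success, name)
    }
}

func main() {
    dc := &deviceCollector{subCollectors: map[string]func(chan<- prometheus.Metric) error{
        "interface": func(chan<- prometheus.Metric) error { return nil },
        "ipv6":      func(chan<- prometheus.Metric) error { return errors.New("not supported") },
    }}
    prometheus.MustRegister(dc)
    http.Handle("/metrics", promhttp.Handler())
    // Port chosen only as an example.
    log.Fatal(http.ListenAndServe(":9436", nil))
}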

My personal preference would be the first, but before anyone goes and writes a solution I thought it would be useful to have the discussion and for @nshttpd to provide their opinion.

Hello guys!
I have the same issue, and fixing it with

if err != nil {
    return err
}

if err != nil {
    log.Errorf("ERROR: %s collector failed after %fs: %s", d.Name, duration.Seconds(), err)
    success = 0
} else {
    log.Debugf("OK: %s collector succeeded after %fs.", d.Name, duration.Seconds())
    success = 1
}
ch <- prometheus.MustNewConstMetric(scrapeDurationDesc, prometheus.GaugeValue, duration.Seconds(), d.Name)
ch <- prometheus.MustNewConstMetric(scrapeSuccessDesc, prometheus.GaugeValue, success, d.Name)

did not help me.

These are not fixes; this is a discussion on how the issue could be resolved. I've not touched the code for a while, so I'd have to look back at how to do the fix, but based on my comments it would mean that success = 0 would need to be set to 1 for the IPv6 collector, or the IPv6 code would need to be commented out / made controllable via a flag.
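
As a generic illustration of the flag idea, here is a minimal sketch using the standard flag package. The flag name and collector wiring below are assumptions, not options the exporter actually exposes:

package main

import (
    "flag"
    "log"
)

// withIPv6 is a hypothetical flag; the exporter's real configuration
// options may differ.
var withIPv6 = flag.Bool("with-ipv6", false, "enable the IPv6 collector")

func main() {
    flag.Parse()

    collectors := []string{"interface", "resource"}
    if *withIPv6 {
        // Only enable the IPv6 collector when explicitly requested,
        // so devices without IPv6 support don't fail the scrape.
        collectors = append(collectors, "ipv6")
    }
    log.Printf("enabled collectors: %v", collectors)
}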

I changed success = 0 to success = 1 and this did not help. I still have the same errors.

if err != nil {
    log.Errorf("ERROR: %s collector failed after %fs: %s", d.Name, duration.Seconds(), err)
    success = 0
} else {
    log.Debugf("OK: %s collector succeeded after %fs.", d.Name, duration.Seconds())
    success = 1
}
ch <- prometheus.MustNewConstMetric(scrapeDurationDesc, prometheus.GaugeValue, duration.Seconds(), d.Name)
ch <- prometheus.MustNewConstMetric(scrapeSuccessDesc, prometheus.GaugeValue, success, d.Name)

This still affects almost all of my devices, since I have the same collectors enabled for everything.