elastic/Machinebeat

segfault with 7.6.2

Closed this issue · 12 comments

I got a segfault after ~40 minutes with the new version

2020-04-14T08:06:51.082Z        INFO    nodevalue/nodevalue.go:185      [OPCUA] Publishing 16 new events
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x304b4af]

goroutine 9584 [running]:
github.com/elastic/machinebeat/module/opcua/nodevalue.collectData(0x3872978, 0x9, 0x38c0a46, 0x20, 0x0)
        /root/go_projects/src/github.com/elastic/machinebeat/module/opcua/nodevalue/client.go:116 +0x31f
github.com/elastic/machinebeat/module/opcua/nodevalue.collect(0xc0001cc700, 0x7f54a64c8ce0, 0xc0004a7c50, 0x23c7, 0xc0002679d0)
        /root/go_projects/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:160 +0x65
github.com/elastic/machinebeat/module/opcua/nodevalue.(*MetricSet).Fetch.func1(0xc0001cc700, 0x7f54a64c8ce0, 0xc0004a7c50)
        /root/go_projects/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:235 +0x3f
created by github.com/elastic/machinebeat/module/opcua/nodevalue.(*MetricSet).Fetch
        /root/go_projects/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:234 +0x231

During the last update I missed to handle one error. This was leading to that error and should be fixed now.

I still got several segfaults with 7.6.3. I'll collect some debug logs next time it happens.

2020-04-17T08:25:04.802Z        DEBUG   [Collect]       nodevalue/client.go:105 Building the request
2020-04-17T08:25:04.802Z        DEBUG   [Collect]       nodevalue/client.go:107 Collecting data from Node ns=1;i=173648
2020-04-17T08:25:07.802Z        DEBUG   [Collector]     nodevalue/nodevalue.go:158      Event collector instance started
2020-04-17T08:25:07.802Z        DEBUG   [Collect]       nodevalue/client.go:105 Building the request
2020-04-17T08:25:07.802Z        DEBUG   [Collect]       nodevalue/client.go:107 Collecting data from Node ns=1;i=173648
panic: runtime error: index out of range

goroutine 47435 [running]:
github.com/elastic/machinebeat/module/opcua/nodevalue.collectData(0x25f2acb, 0x9, 0x26408ea, 0x20, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/client.go:124 +0xc08
github.com/elastic/machinebeat/module/opcua/nodevalue.collect(0xc00014da40, 0x7fa430d639b0, 0xc000536000, 0x4601, 0xc0001f0460)
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:160 +0x65
github.com/elastic/machinebeat/module/opcua/nodevalue.(*MetricSet).Fetch.func1(0xc00014da40, 0x7fa430d639b0, 0xc000536000)
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:235 +0x3f
created by github.com/elastic/machinebeat/module/opcua/nodevalue.(*MetricSet).Fetch
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:234 +0x231

You can give it another try :)

Running stable for 15 hours at the moment

Got a segfault after 36 hours

2020-04-18T21:00:03.597Z        INFO    nodevalue/nodevalue.go:185      [OPCUA] Publishing 15 new events
2020-04-18T21:00:19.218Z        INFO    [monitoring]    log/log.go:145  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":162970,"time":{"ms":32}},"total":{"ticks":706610,"time":{"ms":94},"value":706610},"user":{"ticks":543640,"time":{"ms":62}}},"handles":{"limit":{"hard":1048576,"soft":1024},"open":8},"info":{"ephemeral_id":"c2b43f19-299d-4c57-aa4d-ab5b12522343","uptime":{"ms":111540040}},"memstats":{"gc_next":6626544,"memory_alloc":8186632,"memory_total":142671300672},"runtime":{"goroutines":31}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":90,"batches":6,"total":90},"read":{"bytes":210},"write":{"bytes":11086}},"pipeline":{"clients":1,"events":{"active":0,"published":75,"total":75},"queue":{"acked":90}}},"metricbeat":{"opcua":{"nodevalue":{"events":75,"success":75}}},"system":{"load":{"1":0.08,"15":0.01,"5":0.04,"norm":{"1":0.04,"15":0.005,"5":0.02}}}}}}
2020-04-18T21:00:49.215Z        INFO    [monitoring]    log/log.go:145  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":162980,"time":{"ms":13}},"total":{"ticks":706620,"time":{"ms":17},"value":706620},"user":{"ticks":543640,"time":{"ms":4}}},"handles":{"limit":{"hard":1048576,"soft":1024},"open":8},"info":{"ephemeral_id":"c2b43f19-299d-4c57-aa4d-ab5b12522343","uptime":{"ms":111570040}},"memstats":{"gc_next":6626544,"memory_alloc":3548032,"memory_total":142671537704},"runtime":{"goroutines":41}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":0.05,"15":0,"5":0.03,"norm":{"1":0.025,"15":0,"5":0.015}}}}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1e5e349]

goroutine 1189561 [running]:
github.com/gopcua/opcua/uasc.(*SecureChannel).sendAsyncWithTimeout(0x0, 0x2a5b400, 0xc0003805d0, 0xc000365500, 0x1, 0x2540be400, 0x0, 0x0, 0x0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/uasc/secure_channel.go:194 +0x59
github.com/gopcua/opcua/uasc.(*SecureChannel).SendRequestWithTimeout(0x0, 0x2a5b400, 0xc0003805d0, 0xc000365500, 0x2540be400, 0xc00038cf00, 0x0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/uasc/secure_channel.go:161 +0x92
github.com/gopcua/opcua.(*Client).sendWithTimeout(0xc000401380, 0x2a5b400, 0xc0003805d0, 0x2540be400, 0xc00038cf00, 0x0, 0xc0008a8000)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/client.go:466 +0xb1
github.com/gopcua/opcua.(*Client).Send(...)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/client.go:455
github.com/gopcua/opcua.(*Client).Read(0xc000401380, 0xc000380570, 0xc000077c00, 0x7fbc979c16d0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/client.go:512 +0x27f
github.com/gopcua/opcua.(*Node).Attribute(0xc00038cef0, 0x4, 0x231f940, 0xc00038ce01, 0xc00038cef0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/node.go:114 +0xdb
github.com/gopcua/opcua.(*Node).DisplayName(0xc00038cef0, 0xc00038cef0, 0x1, 0x1)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/node.go:58 +0x37
github.com/elastic/machinebeat/module/opcua/nodevalue.collectData(0x25f3ad2, 0x9, 0x26418f1, 0x20, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/client.go:116 +0x31a
github.com/elastic/machinebeat/module/opcua/nodevalue.collect(0xc00014b500, 0x7fbc955bb720, 0xc000381830, 0x0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:160 +0x65
github.com/elastic/machinebeat/module/opcua/nodevalue.(*MetricSet).Fetch.func1(0xc00014b500, 0x7fbc955bb720, 0xc000381830)
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:237 +0x3f
created by github.com/elastic/machinebeat/module/opcua/nodevalue.(*MetricSet).Fetch
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:236 +0x231

This is an error in the underlaying lib. But I will try to prevent the beat from crashing when this happens.

I got this working. So the beat can recover from any panic now. If that happens the beat will reconnect.

Got this with 7.6.5:

2020-04-20T15:36:49.090Z        INFO    nodevalue/nodevalue.go:162      Recovered from panic. The beat will reconnect now
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1e5de69]

goroutine 101767 [running]:
github.com/gopcua/opcua/uasc.(*SecureChannel).setState(0x0, 0x2)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/uasc/secure_channel.go:145 +0x9
panic(0x21959e0, 0x4697940)
        c:/go/src/runtime/panic.go:522 +0x1b5
github.com/gopcua/opcua/uasc.(*SecureChannel).hasState(...)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/uasc/secure_channel.go:149
github.com/gopcua/opcua/uasc.(*SecureChannel).closeSecureChannel(0x0, 0x0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/uasc/secure_channel.go:744 +0x7b
github.com/gopcua/opcua/uasc.(*SecureChannel).Close(0x0, 0x0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/uasc/secure_channel.go:665 +0x32
github.com/gopcua/opcua.(*Client).Close(0xc0001b6f00, 0x31, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/client.go:234 +0x46
github.com/elastic/machinebeat/module/opcua/nodevalue.closeConnection()
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/client.go:421 +0x2d
github.com/elastic/machinebeat/module/opcua/nodevalue.collect.func1()
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:163 +0x6c
panic(0x21959e0, 0x4697940)
        c:/go/src/runtime/panic.go:522 +0x1b5
github.com/gopcua/opcua/uasc.(*SecureChannel).sendAsyncWithTimeout(0x0, 0x2a5b440, 0xc000360a80, 0x0, 0x1, 0x2540be400, 0x0, 0x0, 0x0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/uasc/secure_channel.go:194 +0x59
github.com/gopcua/opcua/uasc.(*SecureChannel).SendRequestWithTimeout(0x0, 0x2a5b440, 0xc000360a80, 0x0, 0x2540be400, 0xc00044ae20, 0x0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/uasc/secure_channel.go:161 +0x92
github.com/gopcua/opcua.(*Client).sendWithTimeout(0xc0001b6f00, 0x2a5b440, 0xc000360a80, 0x2540be400, 0xc00044ae20, 0x0, 0xc0002be000)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/client.go:466 +0xb1
github.com/gopcua/opcua.(*Client).Send(...)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/client.go:455
github.com/gopcua/opcua.(*Client).Read(0xc0001b6f00, 0xc000360a20, 0xc000187180, 0x7f881dd1a6d0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/client.go:512 +0x27f
github.com/gopcua/opcua.(*Node).Attribute(0xc00044ae10, 0x4, 0x231f940, 0xc000299401, 0xc00044ae10)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/node.go:114 +0xdb
github.com/gopcua/opcua.(*Node).DisplayName(0xc00044ae10, 0xc00044ae10, 0x2, 0x2)
        C:/Users/felix/projekte/machinebeat/src/github.com/gopcua/opcua/node.go:58 +0x37
github.com/elastic/machinebeat/module/opcua/nodevalue.collectData(0x0, 0x2725d88, 0x26418f1, 0x20, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/client.go:116 +0x31a
github.com/elastic/machinebeat/module/opcua/nodevalue.collect(0xc00014ba40, 0x7f881bac12d8, 0xc0004be360, 0x0, 0x0)
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:167 +0x8d
github.com/elastic/machinebeat/module/opcua/nodevalue.(*MetricSet).Fetch.func1(0xc00014ba40, 0x7f881bac12d8, 0xc0004be360)
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:243 +0x3f
created by github.com/elastic/machinebeat/module/opcua/nodevalue.(*MetricSet).Fetch
        C:/Users/felix/projekte/machinebeat/src/github.com/elastic/machinebeat/module/opcua/nodevalue/nodevalue.go:242 +0x231

This happens when the connection is already closed due to the panic. I now added a check for that. So you should not see this again.

Can we close this one?

Yes, it runs fine now.