o3ma/o3

Crash

Closed this issue · 7 comments

Sec42 commented

Leaving a bot running for some time, I got the following crash:

panic: not enough bytes were transmitted

goroutine 22 [running]:
panic(0x66b820, 0xc420172d30)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/o3ma/o3.writeHelper(0x7ece40, 0xc420084048, 0xc4201775e0)
        /home/sec/Projects/gopath/src/github.com/o3ma/o3/packetdispatcher.go:30 +0x11e
github.com/o3ma/o3.(*SessionContext).dispatchEchoMsg(0x80b5c0, 0x7ece40, 0xc420084048, 0x80, 0x118)
        /home/sec/Projects/gopath/src/github.com/o3ma/o3/packetdispatcher.go:119 +0x309
github.com/o3ma/o3.(*SessionContext).sendLoop(0x80b5c0)
        /home/sec/Projects/gopath/src/github.com/o3ma/o3/communicationhandler.go:131 +0x231
created by github.com/o3ma/o3.(*SessionContext).receiveLoop
        /home/sec/Projects/gopath/src/github.com/o3ma/o3/communicationhandler.go:103 +0x54e
exit status 2
V-inz commented

hi Sec, did you use the latest version? communicationhandler.go:131 seems changed/old for a call of dispatchEchoMsg().

Leaving a bot running for some time.

What time are you talking about? Mine is up several days without any problem.
It's the regular EchoMsg, so your RX/TX-load should not matter.
Was there a network-loss during send, hard to say ...

Meanwhile you could start your bot something like this. Makes it staying up - and perhaps gives some further info.

#!/bin/bash
while :
do
	echo "Start: "$(date -u +"%Y-%m-%dT%H:%M:%SZ") | tee -a log.txt
	./yourfriendlyrobotscript 2>&1 | tee -a log.txt
	echo "End: "$(date -u +"%Y-%m-%dT%H:%M:%SZ") | tee -a log.txt
	sleep 180 # time to breathe
done

@willnix I think, sending errors for regular messages should be handled by the user.
But errors in EchoMsg should be silently discard (or counted in background, or should change the "I am currently online"-state).

This may have been a short local network interruption or a longer drop of the uplink (your public ip changed,...). So EchoMsg has to handle this (more silently :).

V-inz commented

ok, can confirm this after plunging the local network and waiting, line numbers of the current testing:

> Error Receiving Message: read tcp 192.168.1.11:42820->5.148.175.198:5222: read: no route to host
panic: not enough bytes were transmitted

goroutine 34 [running]:
panic(0x280170, 0x1070c218)
        /usr/local/go/src/runtime/panic.go:500 +0x33c
github.com/o3ma/o3.writeHelper(0x399bc0, 0x1070c5a8, 0x10782360)
        /home/v/go/src/github.com/o3ma/o3/packetdispatcher.go:30 +0x100
github.com/o3ma/o3.(*SessionContext).dispatchEchoMsg(0x10784000, 0x399bc0, 0x1070c5a8, 0x80, 0x0, 0x0)
        /home/v/go/src/github.com/o3ma/o3/packetdispatcher.go:119 +0x754
github.com/o3ma/o3.(*SessionContext).sendLoop(0x10784000)
        /home/v/go/src/github.com/o3ma/o3/communicationhandler.go:162 +0x1f0
created by github.com/o3ma/o3.(*SessionContext).receiveLoop
        /home/v/go/src/github.com/o3ma/o3/communicationhandler.go:134 +0x498

Another message, same stack:

> panic: write tcp 192.168.1.11:42824->5.148.175.198:5222: use of closed network connection

Well this looks like the expected result of a network failure.
I agree however, that a short network failure should be handled gracefully.
I don't think this would have happened if the connection was alive, because TCP should handle minor packet losses.
But if the connection was closed we have to reconnect and do the handshake again before we can attempt to send the packet again.
I have a redesign of the connection handling planned anyway. That should allow for easy connection renewal...

V-inz commented

well, I'm not a go programmer, but I think the panic() calls are hard to handle from a API-users perspective. Especially the unexpected ones from automatic EchoMsg().

The convention in the Go libraries is that even when a package uses panic internally, its external API still presents explicit error return values.
https://blog.golang.org/defer-panic-and-recover

Neither am I. ;)
But this is definitely what we want to achieve so any unhanded panics are either oversights or todos on our list...

V-inz commented

:) don't believe that!

( So, on-topic, to achieve this, you should use my link above :) and listen this friendly guy at http://www.se-radio.net/2014/03/episode-202-andrew-gerrand/ )

I know how the panic, defer and recover stunt works. The JSON parser mentioned in your link actually inspired the design of our packet parser. :)
See https://github.com/o3ma/o3/blob/master/communicationhandler.go#L265 for an example.
sendLoop() needs something like this aswell.
The problem is that until I have redesigned the whole connection handling, there is just no way to handle the panic properly.
So I will address this issue after the redesign; which will probably be some time next week.