UniversalDevicesInc/Polyglot

Rebooting ISY disconnects all of polyglot

Closed this issue · 5 comments

I've noticed through various testing that if you reboot the ISY, polyglot doesn't update the nodes correctly once it comes back until you restart polyglot. Restarting the node_servers themselves doesn't seem to help. I will have to look into why this is happening.

Thanks James.


I'm not sure where to even start on this one... Thoughts?

Was this observed before or after the implementation of the request queue for comms to the ISY?

Either way, a full polyglot log at DEBUG level with the current version would be helpful, plus a lot of patience. My guess, if this still happens, is that it's related to issue #56 -- things can get pretty piled up in the queues with a default of (I think) 5 minutes per request.

As part of the project to add the node probes (the first part of which is implemented), I've also been considering adding a mechanism to determine ISY connection health so that we can short-circuit the whole queue problem in the first place. In a nutshell: the lowest-level code that sends node server API REST calls to the ISY would keep a global record of the last several attempts to contact the ISY. After repeated timeouts, that code would fall back to a simple "ping"-type request to the ISY with a short timeout and return each dropped request to the calling node server with a failure indication. This "ISY offline" state would also be signaled to the node servers so they could avoid calling in the first place.
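To make that concrete, here's a minimal sketch of the health-tracking wrapper I have in mind. None of this is Polyglot's actual code; the class name, endpoint, and thresholds are illustrative assumptions, `requests` is assumed as the HTTP library, and ISY authentication is omitted for brevity.

```python
import collections
import requests

class ISYHealth:
    """Tracks recent ISY request outcomes and fails fast when the ISY
    appears to be offline (e.g. during a reboot)."""

    def __init__(self, base_url, window=5, fail_threshold=3, ping_timeout=2):
        self.base_url = base_url
        self.recent = collections.deque(maxlen=window)  # True = success
        self.fail_threshold = fail_threshold
        self.ping_timeout = ping_timeout

    def record(self, ok):
        self.recent.append(ok)

    def looks_offline(self):
        failures = sum(1 for ok in self.recent if not ok)
        return failures >= self.fail_threshold

    def ping(self):
        """Cheap liveness check with a short timeout.
        Endpoint is an assumption, not Polyglot's actual probe."""
        try:
            r = requests.get(self.base_url + "/rest/time",
                             timeout=self.ping_timeout)
            return r.status_code == 200
        except requests.RequestException:
            return False

def send_to_isy(health, path, timeout=15):
    """Send a node-server REST call, short-circuiting if the ISY is down."""
    if health.looks_offline() and not health.ping():
        # Fail fast: the caller gets an immediate failure instead of
        # waiting out a long timeout while requests pile up in the queue.
        return None
    try:
        r = requests.get(health.base_url + path, timeout=timeout)
        health.record(r.ok)
        return r
    except requests.RequestException:
        health.record(False)
        return None
```

The point of the fall-back ping is that a node server gets an immediate failure rather than stacking another multi-minute timeout behind every other queued request while the ISY reboots.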

Kind of hacky. Actually, very hacky. Hence issue #56 -- if the timeouts were set to some low value to begin with (15 seconds, for example), then requests piling up in the queues and the associated long recovery times become much less of an issue.

I'm travelling again, so not in a position to do a lot of experimentation until next week.

This is probably the right answer... I'll try to get some time this week and mess with it.

Moved to #62