colloqi/pisignage-server

Server crashing/restarting when PI player powers off unexpectedly

BigsaveCory opened this issue · 4 comments

NodeJS version: 10.21.0
PM2 version: 3.5
piSignage server version: 2.5.4
piPlayer version: 2.8.2
piPlayer hardware: RPI3 and RPI4

We have noticed that the piSignage open-source server restarts almost every day.

We have figured out that it happens when a Raspberry Pi player is powered off unexpectedly (for example, by pulling out the power cable or during a power cut).

The server logs a heartbeat timeout for that player. We also log the player object so we know which player it is:

[Sun Jun 14 2020 21:33:19] [LOG]    disconnect: undefined-undefined;reason: heartbeat timeout
[Sun Jun 14 2020 21:33:19] [LOG]    {"_id":"5d2d1e9bb13f185d8b6fedd9","isConnected":false,"lastReported":"2020-06-14T09:28:40.474Z","ip":"::ffff:10.6.83.26","socket":"sIS3kPNoYKB8d2KfFLw5","version":"2.8.2","platform_version":"stretch_9.9_admin_2019-05-07","myIpAddress":"10.6.83.26 ","playlistOn":true,"tvStatus":true,"__v":1,"TZ":"Pacific/Auckland","syncInProgress":false,"cecTvStatus":true,"licensed":true,"createdAt":"2019-07-16T00:47:23.868Z","serverServiceDisabled":false,"registered":true,"newSocketIo":false}}

Then, 15 minutes later, this error appears in the logs (it's always roughly 15 minutes after the heartbeat timeout):

[Sun Jun 14 2020 21:48:34] [ERROR]  { Error: read ETIMEDOUT at TCP.onStreamRead (internal/stream_base_commons.js:111:27) errno: 'ETIMEDOUT', code: 'ETIMEDOUT', syscall: 'read' }
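One way a single socket error like this can take down a whole Node.js process is an `error` event with no listener attached: Node's `EventEmitter` throws the error in that case, and if nothing catches it the process exits. Whether that is what happens inside the piSignage server is an assumption, not something the logs confirm. The sketch below only illustrates the general Node.js behaviour, using a plain `EventEmitter` as a hypothetical stand-in for a player socket; `makeFakeSocket`, `makeTimeoutError`, `emitWithoutHandler` and `emitWithHandler` are illustrative names, not piSignage code.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a player's TCP socket (not piSignage code).
function makeFakeSocket() {
  return new EventEmitter();
}

// Build an error shaped like the one in the log above.
function makeTimeoutError() {
  const err = new Error('read ETIMEDOUT');
  err.errno = 'ETIMEDOUT';
  err.code = 'ETIMEDOUT';
  err.syscall = 'read';
  return err;
}

// Case 1: no 'error' listener. emit() throws the error itself. In a real
// server nothing would catch it, so it would become an uncaught exception.
function emitWithoutHandler() {
  const socket = makeFakeSocket();
  try {
    socket.emit('error', makeTimeoutError());
    return 'no throw';
  } catch (err) {
    return 'threw ' + err.code;
  }
}

// Case 2: an 'error' listener is attached. The error is delivered to the
// listener and emit() returns normally, so the process keeps running.
function emitWithHandler() {
  const socket = makeFakeSocket();
  const seen = [];
  socket.on('error', (err) => seen.push(err.code));
  socket.emit('error', makeTimeoutError());
  return seen;
}

console.log(emitWithoutHandler());
console.log(emitWithHandler());
```

If the crash does come from an unhandled `error` event, attaching a listener to the affected socket or stream would be the place to start looking.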

Immediately after that, you can see the server startup logs:

[Sun Jun 14 2020 21:48:35] [LOG]    ********************************************************************
[Sun Jun 14 2020 21:48:35] [LOG]    *    After update if you do not see your groups, please change     *
[Sun Jun 14 2020 21:48:35] [LOG]    *    change the uri variable to "mongodb://localhost/pisignage-dev"*
[Sun Jun 14 2020 21:48:35] [LOG]    *    in config/env/development.js and restart the server           *
[Sun Jun 14 2020 21:48:35] [LOG]    ******************************************************************

[Sun Jun 14 2020 21:48:35] [LOG]    info: socket.io started
[Sun Jun 14 2020 21:48:35] [LOG]    Express server listening on port 3000 in development mode
[Sun Jun 14 2020 21:48:35] [LOG]    Reset isConnected for 32 players

We can reproduce the above 100% of the time by pulling the power cable out of a Pi player. Can this be looked into and fixed? Alternatively, point us in the right direction and we'll open a PR. The server shouldn't crash just because a single player loses power, and this will happen more frequently as we add more players.

As of now, the best solution could be to add a startup service under the OS that starts the server again when the crash happens. One example is available at https://github.com/colloqi/pisignage-server/blob/master/Init%20service%20example%20for%20systemd.md

That will not fix the problem. I think you may have misunderstood what I said. The server operating system itself is not crashing; it is the piSignage server software that is crashing. The PM2 process manager already restarts the software after it crashes, but that is not a fix for the problem. The software should not crash in the first place. The details on how to reproduce the crash are in my original post.

It is a known issue and we are yet to fix it.

Solution
It looks like the issue is specific to players connecting to the piSignage server with the old version of socket.io (0.9.19). Once we forced all the players to connect with the later version, 2.1.1, the issue stopped happening. We followed this article to force the players to connect with the later version: https://help.pisignage.com/hc/en-us/articles/360020538732-New-version-of-socket-io-is-not-working-with-open-source-server-address