opendata-stuttgart/sensors-software

Device hangs every 2 to 3 days

UweKeim opened this issue · 17 comments

I have 3 separate HW sensors in different locations that have been running successfully for approx. 4 months.

For approx. 1 week, one of the sensors has been stalling every 2 to 3 days.

I.e.:

  • Not sending data anymore to the connected services.
  • Not being able to be accessed via browser.

The device spec is:

ID: 3588999 (4c752536c387)
Firmware version: NRZ-2020-133/EN (Nov 29 2020)

When I briefly disconnect the device from power and reconnect it, everything works perfectly again for a few days.

Being a software developer and a moderately talented administrator myself, I found no way of narrowing this down further. The "Debug level" setting seems to affect only the current system state, and I found no way of getting log files from the device.

It seems that when it hangs, it has absurdly high values for PM10 (e.g. 130+) and PM2.5 (e.g. 50+), so I could imagine that it crashes when the values coming from the sensors are too high. But this is just a wild guess.

Question

Any hints on how to further investigate and finally solve the issue?

It looks like a serial transmission issue with byte shifting. That is why some values are huge. Check the connections. It also happens to me with a custom sensor using a Next PM.
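To illustrate how a one-byte shift can turn ordinary readings into huge ones, here is a minimal sketch. It assumes an SDS011-style 10-byte frame (header, command, PM2.5 low/high, PM10 low/high, two ID bytes, checksum, tail); the thread does not say which PM sensor is fitted, and the function names are made up for this example rather than taken from the firmware.

```python
HEADER, COMMAND, TAIL = 0xAA, 0xC0, 0xAB  # SDS011-style markers (assumed sensor type)


def build_frame(pm25, pm10, sensor_id=(0x12, 0x34)):
    """Encode PM values (in µg/m³) as a 10-byte SDS011-style frame."""
    p25, p10 = round(pm25 * 10), round(pm10 * 10)
    payload = bytes([p25 & 0xFF, p25 >> 8, p10 & 0xFF, p10 >> 8, *sensor_id])
    checksum = sum(payload) & 0xFF  # low byte of the sum of the six data bytes
    return bytes([HEADER, COMMAND]) + payload + bytes([checksum, TAIL])


def decode_unchecked(frame):
    """Decode PM2.5 and PM10 without validating header, checksum or tail."""
    pm25 = (frame[2] | frame[3] << 8) / 10
    pm10 = (frame[4] | frame[5] << 8) / 10
    return pm25, pm10


def is_valid(frame):
    """Check frame length, header, command, tail and checksum."""
    return (len(frame) == 10 and frame[0] == HEADER and frame[1] == COMMAND
            and frame[9] == TAIL and sum(frame[2:8]) & 0xFF == frame[8])


good = build_frame(pm25=4.2, pm10=7.5)
# Simulate a reader whose 10-byte window has slipped by one byte.
shifted = good[1:] + good[:1]

print(decode_unchecked(good), is_valid(good))
print(decode_unchecked(shifted), is_valid(shifted))
```

In the shifted window, a low byte ends up in the high-byte position, so the decoded value is multiplied by roughly 256. A header and checksum test rejects such a frame, which is why a parser that validates frames would normally discard it instead of reporting it.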

Thank you, Pierre-Jean.

What does "Check the connections" mean in detail? Should I unplug and plug in again all HW connectors between the sensors and boards? Or should I rather dump the whole board and buy a new one?

Check whether the Dupont cables are tight enough. Use contact fluid as well. Also check for traces of oxidation.

Small status update:

I've unplugged and re-plugged all on-board connectors.

As of now, the re-plugging seems to be sufficient as there were no more outages since then.

On the other hand, the weather is now much warmer than it was in the last few weeks, so maybe this is just a coincidence.

I've also ordered "Kontakt 60" (not arrived yet). Will apply this, too.

Update one week later (2023-01-10)

Still no outages since last week's re-plugging. No Kontakt 60 applied yet.

Update 2023-01-12

Had an outage today. Unplugged all connectors, applied Kontakt 60 and reconnected.

Update 2023-01-15

Again an outage. I'm buying a new kit and will replace the whole system.

Update 2023-03-05

I finally replaced the whole sensor kit system with a new one. Hopefully this one runs more stably now.

Well, I have the same issue with my sensor as the OP. I already checked the cables and wiring; everything looks OK, with no signs of corrosion.

Sensor Id: 2475238

I have ordered a new sensor but have not yet replaced the old one, so I set up another workaround:

I ordered a switchable wall socket (this one), connected the power USB adapter to this wall socket, and now use my iPhone and the wall socket app from TP-Link to quickly turn the power off and on whenever the sensor goes dead.

Yesterday I had to use it approx. 10 times, today not a single time. The outages appear randomly, without any obvious pattern. I also had several weeks with no outage at all.
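The manual power cycling described above could in principle be automated. The following is only a sketch under assumptions: `is_reachable` polls the sensor's web page over HTTP, and `power_cycle` is a hypothetical callback standing in for whatever switches the wall socket (no TP-Link API is used or implied here).

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if the sensor's web page answers an HTTP request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False


def watchdog_step(reachable, failures, power_cycle, threshold=3):
    """Process one poll result and return the updated consecutive-failure count.

    power_cycle is a hypothetical callback that switches the socket off and on.
    """
    if reachable:
        return 0
    failures += 1
    if failures >= threshold:
        power_cycle()
        return 0
    return failures


# Dry run with simulated poll results instead of real HTTP requests.
cycles = []
failures = 0
for ok in [True, False, False, False, True]:
    failures = watchdog_step(ok, failures, lambda: cycles.append("cycle"))
print(len(cycles), failures)
```

Requiring several consecutive failures before cutting power avoids restarting the sensor because of a single dropped request. In real use the loop would call `is_reachable` with the sensor's address at a fixed interval.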

To me that very sensor seems to be a faulty one and I simply will replace it in the near future.

(The other two similar sensors I'm using do not show any of this erroneous behavior).

@rubencosta13: same problem with my sensor

Yeah I think that they should implement a reboot time option (Cron job) like every day at x time... It would actually fix 90% of the issues

@UweKeim that's really odd; 10 times is not good

The values I'm getting also seem to be odd, sometimes very low, sometimes extremely high.

I used another hand meter to compare the values and they did not match by a large factor. So I think my sensor/board is totally nuts and has to be replaced.

I'm just too lazy right now 🙂.

And the WiFi? Is the range good enough? It is easy to modify the firmware to make it restart every 2 days. It is at the very end of the code.

The WiFi signal is awesome. The sensor worked for quite a while, so it is definitely not a connection issue.

I am using multiple high-quality Unifi WLAN access points throughout my house, each connected to LAN. This is a rock-solid setup that has been running successfully for years.

Hi, my access point is 50 cm away from the sensor device.

UPDATE:
Now, every day, the sensor refuses to connect... I have no clue what is going on. I really think that implementing a restart mechanism would definitely help.

[Screenshot, 2023-03-07 22:28:53]

For now in defines.h:
[Screenshot, 2023-03-07 22:30:37]

The sensor has to restart every 28 days because millis() would otherwise reach its maximum (an unsigned long of 4 bytes, if I remember correctly).
You may change the variable as you wish.
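As a rough check of the numbers involved, the arithmetic for a 4-byte unsigned millisecond counter can be worked through as below. The sketch also shows a wraparound-safe elapsed-time test of the kind a forced-restart check could use. All names are hypothetical and are not taken from defines.h.

```python
MILLIS_MODULUS = 2 ** 32             # a 4-byte unsigned counter wraps at this value
ONE_DAY_IN_MS = 24 * 60 * 60 * 1000

# Time until a 32-bit millisecond counter wraps back to zero.
wrap_days = MILLIS_MODULUS / ONE_DAY_IN_MS
print(round(wrap_days, 1))  # about 49.7 days


def ms_since(start_ms, now_ms):
    """Elapsed milliseconds, still correct if the counter wrapped once in between."""
    return (now_ms - start_ms) % MILLIS_MODULUS


def restart_due(start_ms, now_ms, restart_after_ms):
    """True once the configured restart interval has elapsed."""
    return ms_since(start_ms, now_ms) >= restart_after_ms


# Example: a restart interval of 2 days instead of 28.
print(restart_due(0, 2 * ONE_DAY_IN_MS, 2 * ONE_DAY_IN_MS))
```

Under these assumptions the counter itself wraps after roughly 49.7 days, so a 28-day interval stays below the wrap point, and any shorter interval such as 2 days does as well.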

And deactivate WPS on your router! And watch out for interference between your normal network and the guest network, and between 2.4 GHz and 5 GHz. Believe it or not, I have seen a lot of things in the past 7 years...

The measurement values of my new replacement sensor are much more in line with the measurement values of an external handheld device I'm using from time to time.

The old sensor that I replaced had absurdly high values from time to time, which my handheld device did not confirm.