DAQC2plate - ADC Nodes lockup if external python script to toggle DOUT pins is executed
Opened this issue · 3 comments
This may not be an issue with node-red-contrib-pi-plates, however...
Attached Example Flow merely reads two ADC values. Import to Node-Red, configure the Pi-Plate, and Deploy.
Validate ADC nodes are getting values.
Log into a shell on the Pi running Node-Red.
Create wigglepin.py from attachment:
import piplates.DAQC2plate as DAQC2
import RPi.GPIO as GPIO
import time

try:
    while True:
        DAQC2.setDOUTbit(0, 5)
        time.sleep(0.9)
        DAQC2.clrDOUTbit(0, 5)
        time.sleep(0.1)
except KeyboardInterrupt:
    DAQC2.clrDOUTbit(0, 5)
    GPIO.cleanup()
Execute it
python3 wigglepin.py
The DOUT 5 pin should start toggling; sometimes, instead, the script fails immediately with the traceback listed below.
Notice the ADC values stop and the Pi-Plate is no longer accessible from Node-Red.
Restart the Flow. Nope. Still dead.
Restart Pi to reconnect to the Pi-Plate.
Wash-Rinse-Repeat
Error Callstack
Traceback (most recent call last):
  File "wigglepin.py", line 1, in <module>
    import piplates.DAQC2plate as DAQC2
  File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 748, in <module>
    quietPoll()
  File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 713, in quietPoll
    getCalVals(i)
  File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 727, in getCalVals
    values[j]=CalGetByte(addr,6*i+j)
  File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 586, in CalGetByte
    return resp[0]
IndexError: list index out of range
There should really only be one process talking to the Pi Plates at a time. node-pi-plates (which is used by node-red-contrib-pi-plates under the hood) spawns a python co-process (plate_io.py) that imports the python pi-plates module and makes calls to the pi-plates api.
In order to allow multiple processes to share the pi plates, we'd need a different architecture in which the single process talking to the pi-plates exposes an API that multiple consumers can connect to and make requests through (as pigpiod does for GPIO).
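The single-owner pattern described above can be sketched in Python: one arbiter thread owns the hardware, and every consumer funnels requests through it via a queue. The plate call here is a stub (fake_plate_call, a name invented for this sketch); a real implementation would invoke the piplates API inside the worker thread only.

```python
import queue
import threading

def fake_plate_call(op, addr, channel):
    # Stand-in for a real pi-plates call; only the arbiter's
    # worker thread would ever touch the hardware layer.
    return f"{op}({addr},{channel})"

class PlateArbiter:
    """Serializes all plate access through a single worker thread."""

    def __init__(self):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            op, addr, channel, reply = self._requests.get()
            if op is None:  # shutdown sentinel
                break
            reply.put(fake_plate_call(op, addr, channel))

    def call(self, op, addr, channel):
        # Any number of client threads/processes could call this;
        # the hardware still sees exactly one request at a time.
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, addr, channel, reply))
        return reply.get()

    def close(self):
        self._requests.put((None, None, None, None))
```

A daemon like pigpiod extends the same idea across process boundaries by replacing the in-process queue with a socket.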
So, what's happening when the plate(s) appear to stop responding from the Node-RED interface is a crash of the underlying python process that talks to the plates on behalf of the Node-RED pi-plates nodes. This is probably triggered by a reset of the pi plates microcontroller which means a period of time where the python API calls fail. The wiggle script can also fail in a similar way, but since it's making calls less frequently, it usually survives longer than the node-pi-plates plate_io.py process. The plates are actually back to being functional within a second or two, but the node-pi-plates python co-process won't be re-spawned until Node-RED itself is restarted (e.g. systemctl restart nodered).
So, first we should be providing more helpful error messages when our python co-process crashes: Harsch-Systems/node-pi-plates#10
Secondly, we should really survive such crashes by re-spawning the co-process automatically (or, better yet, offering a configuration option to respawn upon python process crash). Harsch-Systems/node-pi-plates#11
We should probably keep a counter of how many times we've restarted and mention that in the error message each time, so the user can easily detect this kind of 'dueling processes' failure scenario.
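The respawn-with-counter idea could look roughly like the Python sketch below. (The real supervisor in node-pi-plates is JavaScript, and run_with_respawn is an illustrative name, not an existing function.)

```python
import subprocess
import sys

def run_with_respawn(cmd, max_restarts=3):
    """Run a co-process, re-spawning it when it crashes and counting
    restarts so the failure can be surfaced in error messages."""
    restarts = 0
    while True:
        proc = subprocess.Popen(cmd)
        proc.wait()
        if proc.returncode == 0:
            return restarts  # clean exit, no respawn needed
        restarts += 1
        print(f"co-process crashed (restart #{restarts})", file=sys.stderr)
        if restarts >= max_restarts:
            # Give up and report, rather than looping forever while
            # another process fights over the plates.
            return restarts
```

Surfacing the restart count each time makes a repeating crash loop obvious to the user instead of silently masking it.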
The Pi-Plates microcontroller does not reset unless explicitly told to do so. What can happen (in our later products) is that the processor will "give up" on a data exchange with the RPi if there is no response within 50 msec and reset the I/O process. This may lead to erroneous data being received by the RPi and/or a loss of synchronization. Our older products (DAQC, MOTOR, and RELAY) use a simpler protocol and require lots of undesirable delays to maintain synchronization.
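Given that 50 msec give-up window, one client-side mitigation is a short retry around calls that can fail transiently while the plate's I/O process resynchronizes. This is a sketch under assumptions (the retry count and delay are guesses, not vendor guidance), demonstrated here with a stub rather than real hardware; it catches the IndexError seen in the traceback above.

```python
import time

def retry_plate_call(fn, attempts=3, delay=0.1):
    """Call fn(), retrying a few times if it raises IndexError
    (the failure mode seen when a pi-plates response comes back
    empty during resynchronization)."""
    for attempt in range(attempts):
        try:
            return fn()
        except IndexError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller see the error
            time.sleep(delay)  # give the plate time to resync
```

Usage would wrap an individual plate call, e.g. retry_plate_call(lambda: DAQC2.getADC(0, 1)). A retry does not remove the need for a single arbiter process; it only softens transient glitches.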