Harsch-Systems/node-red-contrib-pi-plates

DAQC2plate - ADC Nodes lockup if external python script to toggle DOUT pins is executed

Opened this issue · 3 comments

KD4Z commented

This may not be an issue with node-red-contrib-pi-plates, however...

The attached example flow merely reads two ADC values. Import it into Node-RED, configure the Pi-Plate, and deploy.
Verify that the ADC nodes are getting values.

Log into a shell on the Pi running Node-RED.
Create wigglepin.py from the attachment:

import piplates.DAQC2plate as DAQC2
import RPi.GPIO as GPIO
import time

try:
    while True:
        DAQC2.setDOUTbit(0, 5)    # drive DOUT bit 5 high on the plate at address 0
        time.sleep(.9)
        DAQC2.clrDOUTbit(0, 5)    # drive it low again
        time.sleep(.1)

except KeyboardInterrupt:
    DAQC2.clrDOUTbit(0, 5)
    GPIO.cleanup()

Execute it:
python3 wigglepin.py

The DOUT 5 pin should start toggling, or sometimes you get the call stack listed below.
Notice that the ADC values stop updating and the Pi-Plate is no longer accessible from Node-RED.
Restart the flow. Nope. Still dead.

Restart the Pi to reconnect to the Pi-Plate.

Wash-Rinse-Repeat

Error call stack

Traceback (most recent call last):
 File "wigglepin.py", line 1, in <module>
   import piplates.DAQC2plate as DAQC2
 File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 748, in <module>
   quietPoll()
 File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 713, in quietPoll
   getCalVals(i)
 File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 727, in getCalVals
   values[j]=CalGetByte(addr,6*i+j)
 File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 586, in CalGetByte
   return resp[0]
IndexError: list index out of range

ADC_Demo_flow.json.txt
stepsToRepro.txt
wigglepin.py.txt

There should really only be one process talking to the Pi-Plates at a time. node-pi-plates (which node-red-contrib-pi-plates uses under the hood) spawns a Python co-process (plate_io.py) that imports the Python pi-plates module and makes calls to the pi-plates API.
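
For context, a co-process of that kind typically looks something like the sketch below: one long-lived Python process imports the piplates module once and serializes every hardware call, while the Node.js side exchanges one request/response per line over stdin/stdout. The command names and JSON framing here are illustrative assumptions, not the actual plate_io.py protocol.

# Minimal sketch of the single-owner co-process pattern (NOT the real
# plate_io.py protocol): JSON commands arrive on stdin, replies go to stdout,
# and only this process ever touches the pi-plates API.
import json
import sys

import piplates.DAQC2plate as DAQC2

for line in sys.stdin:
    if not line.strip():
        continue
    cmd = json.loads(line)                      # e.g. {"op": "getADC", "addr": 0, "ch": 1}
    if cmd["op"] == "getADC":
        value = DAQC2.getADC(cmd["addr"], cmd["ch"])
        print(json.dumps({"value": value}), flush=True)
    elif cmd["op"] == "setDOUTbit":
        DAQC2.setDOUTbit(cmd["addr"], cmd["bit"])
        print(json.dumps({"ok": True}), flush=True)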

In order to allow multiple processes to share the Pi-Plates, we'd need a different architecture in which the process talking to the plates exposes an API that multiple consumers can connect to and make requests of (e.g. pigpiod).
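
As a rough sketch of what that could look like (the socket path and the one-line text protocol are assumptions made up for illustration), a single daemon owns the plates and any number of clients, Node-RED and ad-hoc scripts alike, send it requests over a Unix socket:

# Hypothetical single-owner daemon in the spirit of pigpiod: only this
# process imports piplates; everything else talks to it over a socket
# instead of touching the SPI bus directly.
import os
import socketserver

import piplates.DAQC2plate as DAQC2

SOCKET_PATH = "/tmp/piplatesd.sock"  # assumed path

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        for raw in self.rfile:                   # one whitespace-separated request per line
            parts = raw.decode().split()
            if not parts:
                continue
            op, *args = parts
            if op == "getADC":                   # "getADC <addr> <channel>"
                value = DAQC2.getADC(int(args[0]), int(args[1]))
                self.wfile.write(f"{value}\n".encode())
            elif op == "setDOUTbit":             # "setDOUTbit <addr> <bit>"
                DAQC2.setDOUTbit(int(args[0]), int(args[1]))
                self.wfile.write(b"ok\n")

if __name__ == "__main__":
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)                   # clear a stale socket from a previous run
    with socketserver.UnixStreamServer(SOCKET_PATH, Handler) as server:
        server.serve_forever()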

So, what's happening when the plate(s) appear to stop responding from the Node-RED interface is a crash of the underlying Python process that talks to the plates on behalf of the Node-RED pi-plates nodes. This is probably triggered by a reset of the Pi-Plates microcontroller, which means a period of time during which the Python API calls fail. The wiggle script can also fail in a similar way, but since it's making calls less frequently, it usually survives longer than the node-pi-plates plate_io.py process. The plates are actually back to being functional within a second or two, but the node-pi-plates Python co-process won't be re-spawned until Node-RED itself is restarted (e.g. systemctl restart nodered).
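
A caller that retries briefly can ride out that window. The sketch below illustrates the idea; the exception type is taken from the traceback above, and other failure modes may surface differently:

# Retry a read across the short window where the plate is resyncing
# (a sketch of what a more tolerant caller could do; plate_io.py does
# not currently retry like this).
import time

import piplates.DAQC2plate as DAQC2

def read_adc_with_retry(addr, channel, attempts=5, delay=0.5):
    for attempt in range(attempts):
        try:
            return DAQC2.getADC(addr, channel)
        except IndexError:                 # symptom of a desynced exchange (see traceback)
            time.sleep(delay)
    raise IOError(f"plate {addr} did not respond after {attempts} attempts")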

So, first, we should provide more helpful error messages when our Python co-process crashes: Harsch-Systems/node-pi-plates#10

Secondly, we should really survive such crashes by re-spawning the co-process automatically (or, better yet, offer a configuration option to respawn when the Python process crashes). Harsch-Systems/node-pi-plates#11

We should probably keep a counter of how many times we've restarted and mention it in the error message each time, so the user can easily detect this kind of 'dueling processes' failure scenario.
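
The actual fix would live in the node-pi-plates JavaScript layer, but expressed in Python to match the rest of this thread, the respawn-plus-counter idea is roughly the following (the co-process invocation is an assumption):

# Supervisor sketch: respawn the co-process whenever it dies and report
# a running restart count so repeated crashes are easy to spot.
import subprocess
import sys
import time

CO_PROCESS = [sys.executable, "plate_io.py"]   # assumed invocation
restarts = 0

while True:
    proc = subprocess.Popen(CO_PROCESS)
    proc.wait()                                # blocks until the co-process exits
    restarts += 1
    print(f"pi-plates co-process exited (code {proc.returncode}); "
          f"restart #{restarts} - another process may be using the plates",
          file=sys.stderr)
    time.sleep(2)                              # give the plates a moment to resync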

The Pi-Plates microcontroller does not reset unless explicitly told to do so. What can happen (in our later products) is that the processor will "give up" on a data exchange with the RPi if there is no response within 50 msec and reset the I/O process. This may lead to erroneous data being received by the RPi and/or a loss of synchronization. Our older products (DAQC, MOTOR, and RELAY) use a simpler protocol and require lots of undesirable delays to maintain synchronization.