Harsch-Systems/node-red-contrib-pi-plates

DAQC2plate - ADC Nodes lockup if external python script to toggle DOUT pins is executed

Opened this issue · 3 comments

KD4Z commented

This may not be an issue with node-red-contrib-pi-plates, however...

The attached example flow merely reads two ADC values. Import it into Node-RED, configure the Pi-Plate, and deploy.
Verify that the ADC nodes are getting values.

Log into a shell on the Pi running Node-RED.
Create wigglepin.py from the attachment:

import piplates.DAQC2plate as DAQC2
import RPi.GPIO as GPIO
import time

try:
    while True:
        DAQC2.setDOUTbit(0, 5)    # drive DOUT bit 5 high on the plate at address 0
        time.sleep(.9)
        DAQC2.clrDOUTbit(0, 5)    # drive it low again
        time.sleep(.1)

except KeyboardInterrupt:
    DAQC2.clrDOUTbit(0, 5)
    GPIO.cleanup()

Execute it:
python3 wigglepin.py

The DOUT 5 pin should start toggling, or sometimes you get the call stack listed below.
Notice that the ADC values stop updating and the Pi-Plate is no longer accessible from Node-RED.
Restart the flow. Nope. Still dead.

Restart the Pi to reconnect to the Pi-Plate.

Wash-Rinse-Repeat

Error call stack

Traceback (most recent call last):
 File "wigglepin.py", line 1, in <module>
   import piplates.DAQC2plate as DAQC2
 File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 748, in <module>
   quietPoll()
 File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 713, in quietPoll
   getCalVals(i)
 File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 727, in getCalVals
   values[j]=CalGetByte(addr,6*i+j)
 File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 586, in CalGetByte
   return resp[0]
IndexError: list index out of range

ADC_Demo_flow.json.txt
stepsToRepro.txt
wigglepin.py.txt

There should really only be one process talking to the Pi-Plates at a time. node-pi-plates (which node-red-contrib-pi-plates uses under the hood) spawns a Python co-process (plate_io.py) that imports the Python pi-plates module and makes calls to the pi-plates API.
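
For context, a co-process of that kind typically looks something like the sketch below: one long-lived Python process imports the piplates module once and serializes every hardware call, while the Node.js side exchanges one request/response per line over stdin/stdout. The command names and JSON framing here are illustrative assumptions, not the actual plate_io.py protocol.

# Minimal sketch of the single-owner co-process pattern (NOT the real
# plate_io.py protocol): JSON commands arrive on stdin, replies go to stdout,
# and only this process ever touches the pi-plates API.
import json
import sys

import piplates.DAQC2plate as DAQC2

for line in sys.stdin:
    if not line.strip():
        continue
    cmd = json.loads(line)                      # e.g. {"op": "getADC", "addr": 0, "ch": 1}
    if cmd["op"] == "getADC":
        value = DAQC2.getADC(cmd["addr"], cmd["ch"])
        print(json.dumps({"value": value}), flush=True)
    elif cmd["op"] == "setDOUTbit":
        DAQC2.setDOUTbit(cmd["addr"], cmd["bit"])
        print(json.dumps({"ok": True}), flush=True)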

In order to allow multiple processes to share the Pi-Plates, we'd need a different architecture in which the process talking to the plates exposes an API that multiple consumers can connect to and make requests of (e.g. pigpiod).
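
As a rough sketch of what that could look like (the socket path and the one-line text protocol are assumptions made up for illustration), a single daemon owns the plates and any number of clients, Node-RED and ad-hoc scripts alike, send it requests over a Unix socket:

# Hypothetical single-owner daemon in the spirit of pigpiod: only this
# process imports piplates; everything else talks to it over a socket
# instead of touching the SPI bus directly.
import os
import socketserver

import piplates.DAQC2plate as DAQC2

SOCKET_PATH = "/tmp/piplatesd.sock"  # assumed path

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        for raw in self.rfile:                   # one whitespace-separated request per line
            parts = raw.decode().split()
            if not parts:
                continue
            op, *args = parts
            if op == "getADC":                   # "getADC <addr> <channel>"
                value = DAQC2.getADC(int(args[0]), int(args[1]))
                self.wfile.write(f"{value}\n".encode())
            elif op == "setDOUTbit":             # "setDOUTbit <addr> <bit>"
                DAQC2.setDOUTbit(int(args[0]), int(args[1]))
                self.wfile.write(b"ok\n")

if __name__ == "__main__":
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)                   # clear a stale socket from a previous run
    with socketserver.UnixStreamServer(SOCKET_PATH, Handler) as server:
        server.serve_forever()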

So, what's happening when the plate(s) appear to stop responding from the Node-RED interface is a crash of the underlying Python process that talks to the plates on behalf of the Node-RED pi-plates nodes. This is probably triggered by a reset of the Pi-Plates microcontroller, which means a period of time during which the Python API calls fail. The wiggle script can also fail in a similar way, but since it's making calls less frequently, it usually survives longer than the node-pi-plates plate_io.py process. The plates are actually back to being functional within a second or two, but the node-pi-plates Python co-process won't be re-spawned until Node-RED itself is restarted (e.g. systemctl restart nodered).
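
A caller that retries briefly can ride out that window. The sketch below illustrates the idea; the exception type is taken from the traceback above, and other failure modes may surface differently:

# Retry a read across the short window where the plate is resyncing
# (a sketch of what a more tolerant caller could do; plate_io.py does
# not currently retry like this).
import time

import piplates.DAQC2plate as DAQC2

def read_adc_with_retry(addr, channel, attempts=5, delay=0.5):
    for attempt in range(attempts):
        try:
            return DAQC2.getADC(addr, channel)
        except IndexError:                 # symptom of a desynced exchange (see traceback)
            time.sleep(delay)
    raise IOError(f"plate {addr} did not respond after {attempts} attempts")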

So, first, we should provide more helpful error messages when our Python co-process crashes: Harsch-Systems/node-pi-plates#10

Secondly, we should really survive such crashes by re-spawning the co-process automatically (or, better yet, offer a configuration option to respawn when the Python process crashes). Harsch-Systems/node-pi-plates#11

We should probably keep a counter of how many times we've restarted and mention it in the error message each time, so the user can easily detect this kind of 'dueling processes' failure scenario.
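
The actual fix would live in the node-pi-plates JavaScript layer, but expressed in Python to match the rest of this thread, the respawn-plus-counter idea is roughly the following (the co-process invocation is an assumption):

# Supervisor sketch: respawn the co-process whenever it dies and report
# a running restart count so repeated crashes are easy to spot.
import subprocess
import sys
import time

CO_PROCESS = [sys.executable, "plate_io.py"]   # assumed invocation
restarts = 0

while True:
    proc = subprocess.Popen(CO_PROCESS)
    proc.wait()                                # blocks until the co-process exits
    restarts += 1
    print(f"pi-plates co-process exited (code {proc.returncode}); "
          f"restart #{restarts} - another process may be using the plates",
          file=sys.stderr)
    time.sleep(2)                              # give the plates a moment to resync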

The Pi-Plates microcontroller does not reset unless explicitly told to do so. What can happen (in our later products) is that the processor will "give up" on a data exchange with the RPi if there is no response within 50 msec and reset the I/O process. This may lead to erroneous data being received by the RPi and/or a loss of synchronization. Our older products (DAQC, MOTOR, and RELAY) use a simpler protocol and require lots of undesirable delays to maintain synchronization.