phoddie/node-red-mcu

How much should I be able to get into a Wemos D1 Mini ESP8266

Closed this issue · 9 comments

With an ESP8266 Wemos D1 Mini, I find that if I have an Inject, Trigger and GPIO Out node flashing the onboard LED, and an MQTT In followed by three function nodes each containing just return msg, followed by MQTT Out, then the target crashes after connecting to the wifi. If I remove one of the function nodes then it runs ok.

Am I just hitting the limit of what I can do in this hardware? I am building it using -p nodemcu.

Here is the flow.

[{"id":"f92bb64b93bfc46e","type":"tab","label":"mcu test","disabled":false,"info":"","env":[],"_mcu":{"mcu":true}},{"id":"0a44cdba4c3f6a65","type":"debug","z":"f92bb64b93bfc46e","name":"debug 85","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","_mcu":{"mcu":true},"x":900,"y":100,"wires":[]},{"id":"90e9ccdae3d60741","type":"mqtt out","z":"f92bb64b93bfc46e","name":"","topic":"test/mcu/pid/result","qos":"","retain":"","respTopic":"","contentType":"","userProps":"","correl":"","expiry":"","broker":"75fb29423f8a0770","_mcu":{"mcu":true},"x":610,"y":140,"wires":[]},{"id":"8c687e519a7bd590","type":"mqtt in","z":"f92bb64b93bfc46e","name":"","topic":"test/mcu/pid/set/#","qos":"1","datatype":"json","broker":"75fb29423f8a0770","nl":false,"rap":true,"rh":0,"inputs":0,"_mcu":{"mcu":true},"x":100,"y":80,"wires":[["4ce47a86c7a14eab"]]},{"id":"4ce47a86c7a14eab","type":"function","z":"f92bb64b93bfc46e","name":"function 1","func":"\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"_mcu":{"mcu":true},"x":260,"y":80,"wires":[["f4e5d2f56c212e2b"]]},{"id":"ed0d63aad317e829","type":"inject","z":"f92bb64b93bfc46e","name":"","props":[{"p":"payload"}],"repeat":"2","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"1","payloadType":"num","_mcu":{"mcu":true},"x":190,"y":320,"wires":[["a83bf3122efb505c"]]},{"id":"a83bf3122efb505c","type":"trigger","z":"f92bb64b93bfc46e","name":"","op1":"0","op2":"1","op1type":"num","op2type":"num","duration":"500","extend":false,"overrideDelay":false,"units":"ms","reset":"","bytopic":"all","topic":"topic","outputs":1,"_mcu":{"mcu":true},"x":380,"y":320,"wires":[["32dcd5abbb576d04"]]},{"id":"32dcd5abbb576d04","type":"rpi-gpio out","z":"f92bb64b93bfc46e","name":"","pin":"2","set":"","level":"0","freq":"","out":"out","bcm":true,"_mcu":{"mcu":true},"x":550,"y":320,"wires":[]},{"id":"f4e5d2f56c212e2b","type":"function","z":"f92bb64b93bfc46e","name":"function 2","func":"\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"_mcu":{"mcu":true},"x":380,"y":160,"wires":[["1b6498b943de30db"]]},{"id":"1b6498b943de30db","type":"function","z":"f92bb64b93bfc46e","name":"function 3","func":"\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"_mcu":{"mcu":true},"x":400,"y":220,"wires":[["90e9ccdae3d60741"]]},{"id":"75fb29423f8a0770","type":"mqtt-broker","name":"Owl2 for mcu","broker":"192.168.49.83","port":"1883","clientid":"","autoConnect":true,"usetls":false,"protocolVersion":"4","keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthPayload":"","birthMsg":{},"closeTopic":"","closeQos":"0","closePayload":"","closeMsg":{},"willTopic":"","willQos":"0","willPayload":"","willMsg":{},"userProps":"","sessionExpiry":"","_mcu":{"mcu":false}}]

It looks like this particular project should work on the ESP8266. I can reproduce your crash, so something is wrong. But running the same flow on an ESP32 with the same memory partition doesn't crash.

Unfortunately, it isn't entirely obvious where / why it is failing. The ESP8266 is notoriously difficult for native debugging (xsbug works great, of course!). I'll leave this open to investigate further when I have more time.

To address the question in the title, your mileage may vary. Different nodes consume very different amounts of memory. There's a small amount of memory used by each node for basic bookkeeping. Beyond that, it depends on the node implementation which is a function of both implementation style and functionality. The nodes implemented specifically for the MCU try to be reasonably memory efficient. The node implementations taken from Node-RED itself are generally fairly memory intensive. The trigger node is one of those -- I'd like to rewrite that. At least the common paths could use far less memory.
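As a rough way to see what a flow is made of before building it, the exported flow JSON can be tallied by node type. This is only a sketch: `summarizeFlow` and `HEAVY_TYPES` are hypothetical names, the "heavy" list contains only the Trigger node because that is the one called out here, and the counts say nothing about actual byte usage on the device.

```javascript
// Tally node types in an exported Node-RED flow (the JSON array from Export).
// HEAVY_TYPES is an assumption for illustration; only "trigger" is named in this thread.
const HEAVY_TYPES = new Set(["trigger"]);

function summarizeFlow(flow) {
  const counts = {};
  for (const node of flow) {
    // Tabs and config nodes (e.g. mqtt-broker) have no "wires" array; skip them
    if (!Array.isArray(node.wires)) continue;
    counts[node.type] = (counts[node.type] || 0) + 1;
  }
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  const heavy = Object.keys(counts).filter((type) => HEAVY_TYPES.has(type));
  return { total, counts, heavy };
}

// Abbreviated stand-in for a flow like the one posted above (ids are placeholders)
const exampleFlow = [
  { id: "tab1", type: "tab" },
  { id: "n1", type: "mqtt in", wires: [["n2"]] },
  { id: "n2", type: "function", wires: [["n3"]] },
  { id: "n3", type: "function", wires: [["n4"]] },
  { id: "n4", type: "function", wires: [["n5"]] },
  { id: "n5", type: "mqtt out", wires: [] },
  { id: "n6", type: "inject", wires: [["n7"]] },
  { id: "n7", type: "trigger", wires: [["n8"]] },
  { id: "n8", type: "rpi-gpio out", wires: [] },
  { id: "b1", type: "mqtt-broker" },
];

const summary = summarizeFlow(exampleFlow);
console.log(summary);
```

To use it on a real flow, replace `exampleFlow` with the parsed export, for example `JSON.parse` of the text copied from the Node-RED export dialog.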

OK, thanks. I don't need to use four function nodes, I was just experimenting and was surprised when it failed with that flow.
The flow in the plugin issue (where it turned out the problem was related to not preloading the flows) is more complex and includes two contrib nodes, and does run ok, though it doesn't have a trigger node. If I have time I will see if I can deduce anything further.

I had a little time to come back to this. The crash is caused by a native stack overflow. The stack is just 4 KB, so there's not much margin. Still, that's usually plenty. My guess is that the garbage collector is overflowing the native stack because of some unusually long object chain. I want to confirm that though.
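To illustrate why a long object chain could matter, here is a toy model of a mark phase. It is not how the XS garbage collector is implemented; `makeChain`, `markRecursive` and `markIterative` are hypothetical names. It only shows the general point that a recursive traversal needs one call frame per link in the chain, while a traversal with an explicit worklist does not.

```javascript
// Toy model only: NOT the XS garbage collector.
// Shows that recursive marking uses stack proportional to the longest chain.

// Build a singly linked chain of `length` objects
function makeChain(length) {
  let head = null;
  for (let i = 0; i < length; i++) head = { next: head, marked: false };
  return head;
}

// Recursive mark: one call frame per link; returns the deepest level reached
function markRecursive(obj, depth = 1) {
  if (obj === null || obj.marked) return depth - 1;
  obj.marked = true;
  return Math.max(depth, markRecursive(obj.next, depth + 1));
}

// Iterative mark: pending objects live in an array instead of on the call stack;
// returns the largest number of pending objects seen at once
function markIterative(root) {
  const pending = root ? [root] : [];
  let maxPending = 0;
  while (pending.length > 0) {
    maxPending = Math.max(maxPending, pending.length);
    const obj = pending.pop();
    if (obj.marked) continue;
    obj.marked = true;
    if (obj.next !== null) pending.push(obj.next);
  }
  return maxPending;
}

console.log("recursive depth:", markRecursive(makeChain(500)));
console.log("worklist high-water mark:", markIterative(makeChain(500)));
```

In this model the recursive depth grows with the chain length, so with a fixed 4 KB stack some chain length will always be too long, whereas the worklist version stays flat for a simple chain.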

Increasing the native stack to 5 KB allows it to run reliably. (This behavior is consistent with this flow working fine on ESP32 with a similar memory partition, since ESP32 has a larger native stack by default.)

This flow uses the Trigger node. The MCU implementation of the Trigger Node is borrowed from Node-RED with a few minor adjustments. Alas, it is a very heavy implementation (with Promises even!). That's surely using more than a little memory and may be triggering(!) this problem. But, that's speculation at this point.

Thanks for looking at this. Do the function nodes consume stack if there are no messages going through them? I didn't think they did, but I need to go back and check. I haven't got the brain power to look at this at the same time as the other issue at the moment, so I will come back to it when the other is sorted out.
Fundamentally, though, the device will only be able to handle flows consisting of a handful of nodes, so one needs to be very careful about the flow design.

@colinl – nothing for you to do here at the moment, I was just sharing an update.

> Do the function nodes consume stack if there are no messages going through them?

No, they do not. But, they mostly just consume JavaScript stack space when running, not native stack space, which is what is overflowing here.

> Fundamentally, though, the device will only be able to handle flows consisting of a handful of nodes, so one needs to be very careful about the flow design.

For sure. Part of that care is using nodes that have relatively light footprints. The Trigger node should really be rewritten to reduce its footprint -- the complexity of the implementation exceeds the functionality.

This issue motivated me to enhance the runtime to detect native stack overflows. XS already has very well tested support for detecting stack overflow as a result of our (extensive) fuzz testing for vulnerabilities. That wasn't being used on ESP8266 or ESP32 though. With today's Moddable SDK update, it is. That doesn't fix anything but it allows the problem to be caught when it happens. Instead of a core dump from the device, xsbug stops and shows the JavaScript stack.

Using that information, I was able to make improvements to the runtime to reduce native stack use. That definitely helps. Those changes are generally good as they also reduce JavaScript stack depth and are likely faster in most cases. I also bumped up the native stack on ESP8266 by 512 bytes, to give a little bit more margin.

With all that in place, your particular case doesn't crash for me. You can always push things further to get to a failure, of course. Diagnosing that should be easier moving forward, at least.

To try it out you need to do three things:

  • Get the latest Node-RED MCU Edition
  • Get the latest Moddable SDK
  • Make sure to rebuild the ESP8266 core. The easiest way to do that is to delete $MODDABLE/build/tmp/esp

FWIW – stack overflow detection is imprecise on these devices. The ESP8266 Arduino code has its approach, FreeRTOS has another, XS now does what it can. But, without an MMU there's always the chance of missing an overflow. That's life. ESP32 has the same challenge, but it has so much more memory that it can have a bigger default stack so native stack overflows there are exceedingly rare.

Excellent, that has made a big difference. I can run a flow with an Inject node, DS18B20, simple Function, MQTT out, Trigger, GPIO Out node and Debug node with no problems. In fact to that I can add five more trivial Function nodes and it still runs. If I add a sixth Function node then I get a Stack Overflow message, but I don't get a stack dump.
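For repeating this kind of "add one more Function node" experiment, a chain of pass-through Function nodes can be generated rather than wired by hand. This is a sketch under assumptions: `makeFunctionChain` is a hypothetical helper, the ids and coordinates are placeholders, and the node fields simply mirror the function nodes in the flow posted at the top of this issue.

```javascript
// Generate `count` pass-through function nodes wired in series.
// tabId is the "z" of the target tab; sinkId is the node the last function feeds.
// All ids and coordinates here are placeholders for illustration.
function makeFunctionChain(count, tabId, sinkId) {
  const ids = Array.from({ length: count }, (_, i) => `fn${i + 1}`);
  return ids.map((id, i) => ({
    id,
    type: "function",
    z: tabId,
    name: `function ${i + 1}`,
    func: "\nreturn msg;",
    outputs: 1,
    noerr: 0,
    initialize: "",
    finalize: "",
    libs: [],
    _mcu: { mcu: true },
    x: 200 + 120 * i,
    y: 100,
    // Each node feeds the next one; the last node feeds the sink
    wires: [[i + 1 < count ? ids[i + 1] : sinkId]],
  }));
}

// Six trivial function nodes feeding the MQTT Out node from the original flow
const chain = makeFunctionChain(6, "f92bb64b93bfc46e", "90e9ccdae3d60741");
console.log(JSON.stringify(chain, null, 2));
```

The printed array still needs an upstream node wired to `fn1` before it does anything useful.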

Thanks for trying that out so quickly. Very glad to hear that the combination of improvements is working as hoped.

FWIW – there's plenty of opportunity to explore further optimization. Function nodes, for example, are heavier than they probably need to be in order to emulate Node-RED behaviors, like sandboxing. One step at a time though. I think we can close this particular issue out, yes?

Yes, thanks.