Memory exhaustion due to erroneous read size
Jonty opened this issue · 6 comments
I've found yet another bug while working on receiving extremely large websocket messages, however I'm not entirely convinced it's size related.
The same test program as from the previous large message issue should reproduce the problem:
#include <ArduinoWebsockets.h>
#include <WiFi.h>
const char* ssid = "SSID";
const char* password = "PASSWORD";
const char* websockets_connection_string = "wss://ws.blockchain.info/inv";
using namespace websockets;
void onMessageCallback(WebsocketsMessage message) {
Serial.print("Got Message: ");
Serial.println(message.data());
}
WebsocketsClient client;
void onEventsCallback(WebsocketsEvent event, String data) {
if(event == WebsocketsEvent::ConnectionOpened) {
Serial.println("ConnectionOpened");
}
}
void setup() {
Serial.begin(115200);
WiFi.begin(ssid, password);
for(int i = 0; i < 10 && WiFi.status() != WL_CONNECTED; i++) {
Serial.print(".");
delay(1000);
}
client.onMessage(onMessageCallback);
client.onEvent(onEventsCallback);
client.setCACert(nullptr);
client.connect(websockets_connection_string);
// Results in total memory exhaustion, regardless of the amount of free memory available
// ESP Panics
client.send("{\"op\":\"ping_block\"}");
}
void loop() {
client.poll();
}
I obtained an ESP32 WROVER module with 8mb of ram, which I know all the messages should easily fit into. At first I thought the allocator was failing to use the external memory as it carried on crashing, but after some digging it's revealed an odd bug.
It seems that the following segment of code in websockets_endpoint.cpp
is the cause of the memory exhaustion:
ArduinoWebsockets/src/websockets_endpoint.cpp
Lines 177 to 181 in a3da335
Inserting a print
statement to show the size of numReceived
reveals normal behaviour right up until the crash, at which point the value of numReceived
is set to the maximum value for uint32_t
(4294967295) and the loop reads garbage data until the device runs out of memory.
Changing uint32_t
to uint64_t
for the return of client.read()
resulted in the max value for uint64_t
being returned instead.
This is a log & backtrace of the above program running, with websockets_endpoint.cpp
patched to print out the size of numReceived
:
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:852
load:0x40078000,len:8424
load:0x40080400,len:5880
entry 0x4008069c
.[E][WiFiClient.cpp:309] setOption(): fail on fd -1, errno: 9, "Bad file number"
ConnectionOpened
size: 512
size: 512
size: 512
size: 512
size: 512
size: 512
size: 512
size: 498
size: 512
size: 512
size: 512
size: 512
size: 512
size: 512
size: 512
size: 498
size: 512
size: 512
size: 512
size: 512
size: 326
size: 512
size: 512
size: 512
size: 172
size: 512
size: 512
size: 373
size: 4294967295
Guru Meditation Error: Core 1 panic'ed (LoadStoreError). Exception was unhandled.
Core 1 register dump:
PC : 0x401421ca PS : 0x00060c30 A0 : 0x8014228c A1 : 0x3ffb1bf0
0x401421ca: websockets::internals::readData[abi:cxx11](websockets::network::TcpClient&, unsigned long long) at /Users/jonty/code/esp/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/bits/shared_ptr_base.h:659
A2 : 0x00000575 A3 : 0x3ffbb9b8 A4 : 0x00000ff2 A5 : 0xffffffff
A6 : 0x3ffb1e44 A7 : 0x00000000 A8 : 0x0001f627 A9 : 0x40000000
A10 : 0x00000000 A11 : 0x000000fe A12 : 0x3ffb1c0c A13 : 0x00000000
A14 : 0x00000ff2 A15 : 0x3ffe636c SAR : 0x00000004 EXCCAUSE: 0x00000003
EXCVADDR: 0x40000000 LBEG : 0x40098705 LEND : 0x40098715 LCOUNT : 0xffffffff
0x40098705: strlen at /home/jeroen/esp8266/esp32/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/machine/xtensa/../../../../.././newlib/libc/machine/xtensa/strlen.S:84
0x40098715: strlen at /home/jeroen/esp8266/esp32/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/machine/xtensa/../../../../.././newlib/libc/machine/xtensa/strlen.S:96
Backtrace: 0x401421ca:0x3ffb1bf0 0x40142289:0x3ffb1e30 0x401433fd:0x3ffb1e80 0x4013f7e3:0x3ffb1ef0 0x4013bdfa:0x3ffb1f90 0x401207e1:0x3ffb1fb0 0x4008b799:0x3ffb1fd0
0x401421ca: websockets::internals::readData[abi:cxx11](websockets::network::TcpClient&, unsigned long long) at /Users/jonty/code/esp/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/bits/shared_ptr_base.h:659
0x40142289: websockets::internals::WebsocketsEndpoint::_recv() at /Users/jonty/code/esp/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/bits/shared_ptr_base.h:659
0x401433fd: websockets::internals::WebsocketsEndpoint::recv() at /Users/jonty/code/chain/components/arduino/libraries/ArduinoWebsockets/src/websockets_endpoint.cpp:312
0x4013f7e3: websockets::WebsocketsClient::poll() at /Users/jonty/code/chain/components/arduino/libraries/ArduinoWebsockets/src/websockets_client.cpp:499
0x4013bdfa: loop() at /Users/jonty/code/chain/main/main.cpp:93
0x401207e1: loopTask(void*) at /Users/jonty/code/chain/components/arduino/cores/esp32/main.cpp:19
0x4008b799: vPortTaskWrapper at /Users/jonty/code/esp/esp-idf/components/freertos/port.c:355 (discriminator 1)
to_read
is always a sane value, so I also tried adding an extra check to the read loop to ensure that i <= to_read
and while this prevented the memory exhaustion it resulted in being disconnected from the server shortly afterward, so clearly the session state is messed up.
This should be easily reproducible on an ESP32 with limited memory as the exhaustion trigger happens before any significant memory usage, although I've not tested this.
Let me know if you need any more information, or want me to test things.
Interesting.
The MAX value you are seeing is actually -1
but is shown as max value because the variable is signed. -1
indicates an error in receiving data from the socket. I will look into it now.
Also, I've added a c++
tag to your code snipped (edited your post) so it's easier to read, I hope you don't mind.
I will look at it and will keep you updated.
Gil.
I thought that might be the case but didn't want to presume! Let me know if I can help with the debugging.
Sure,
Few experiments you could make that will help pin down the problem:
- Run the exact same thing with the highest debug settings. Meaning set esp32 logs to
Verbose
. This should tell us a lot about what's going in inside the esp32 libraries. - Run the code with the latest Arduino-ESP32 libraries. I've found out recently that the Espressif team working on that is doing a lot of changes without pushing them as stable releases. Can you try cloning the master branch and run the same code?
- Last thing, which will help if this is actually a memory issue. Is to add this line before
connect
:
client.setFragmentsPolicy(FragmentsPolicy_Notify);
Websocket servers often use fragmented messages to send large amounts of data. I aggregate those fragments inside the library's memory and then pass it as a complete message. Adding that line will cause each fragment to be passed to the callback, thus (hopefully) saving plenty of memory. If the esp32 is running out of space, this might help.
I had to reset my PC this weekend, so I'm still setting everything up.
Thanks,
Gil.
I've worked on your issue, I believe it is fixed now. Please look at the branch bugfix/unchecked_reads_issue_26
It should really improve error handling and memory management.
Please let me know if it's working for you (with your new board) before I merge it.
Gil.
This works beautifully! Apologies I didn't have time to help you debug it.
Thank you so much for all your help. If you have a wishlist/patreon/similar, please let me know.