User exception (panic/abort/assert) after random number of successful requests
NiallRees opened this issue ยท 17 comments
Basic Infos
- This issue complies with the issue POLICY doc.
- I have read the documentation at readthedocs and the issue is not addressed there.
- I have tested that the issue is present in current master branch (aka latest git).
- I have searched the issue tracker for a similar issue.
- If there is a stack dump, I have decoded it.
- I have filled out all fields below.
Platform
- Hardware: [Feather HUZZAH ESP8266]
- Core Version: [2.6.1]
- Development Env: [Arduino IDE]
- Operating System: [MacOS]
Settings in IDE
- Module: [Feather HUZZAH ESP8266]
- Flash Mode: [qio|dio|other]
- Flash Size: [4MB]
- lwip Variant: [v2 Lower Memory]
- Reset Method: [ck|nodemcu]
- Flash Frequency: [40Mhz]
- CPU Frequency: [160MHz]
- Upload Using: [SERIAL]
- Upload Speed: [115200] (serial upload only)
Problem Description
I'm using the WiFiClientSecure client and sometimes during connection, other times after a request, User exception (panic/abort/assert) is raised and the board restarts. If it successfully makes a request, it crashes on the next attempt.
I've attached a decoded stack trace along with heap memory debug. Would really appreciate some help with this.
MCVE Sketch (WiFi creds left blank)
#include <ESP8266WiFi.h>
#include <WiFiClientSecure.h>
const char* ssid = "";
const char* password = "";
const char* host = "script.google.com";
const int httpsPort = 443;
String SCRIPT_ID = "AKfycbyt1zJXaOvHo2_cz7Mfp6ivhan6XcGtnQO_UQzGzq6ECh3G4Zgj";
void setup() {
Serial.begin(115200);
delay(100);
Serial.println();
Serial.println();
Serial.print("Connecting to ");
Serial.println(ssid);
WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) {
delay(500);
Serial.print(".");
}
Serial.println("");
Serial.println("WiFi connected");
Serial.println("IP address: ");
Serial.println(WiFi.localIP());
}
void loop() {
delay(5000);
Serial.print("connecting to ");
Serial.println(host);
// Use WiFiClient class to create TCP connections
WiFiClientSecure client;
client.setInsecure();
if (!client.connect(host, httpsPort)) {
Serial.println("Connection failed");
return;
} else {
Serial.println("Connection success!");
}
String url = String("/macros/s/" + SCRIPT_ID + "/exec?Datetime=1X=2&Y=3&Z=4");
Serial.print("requesting URL: ");
Serial.println(url);
//get the sensor data in script format using get method
client.print(String("GET ") + url + " HTTP/1.1\r\nHost: script.google.com\r\nUser-Agent: BuildFailureDetectorESP8266\r\nConnection: close\r\n\r\n");
Serial.println("request sent");
}
Debug Messages
WiFi connected
IP address:
192.168.1.190
connecting to script.google.com
User exception (panic/abort/assert)
Abort called
>>>stack>>>
ctx: bearssl
sp: 3fff178c end: 3fff180c offset: 0000
3fff178c: 3ffe992c 00000047 00000004 deadbeef
3fff179c: 000000fe 00000000 00000000 00000000
3fff17ac: 00000000 00000000 00000000 3fff17dc
3fff17bc: 00000004 00000090 00000004 3ffe992c
3fff17cc: 00003a98 00000000 3ffffeb0 40205c8e
3fff17dc: 3ffe883e 00000000 4021d844 40205ca4
3fff17ec: 3fff66c1 3fff2004 deadbeef 4020513d
3fff17fc: deadbeef 00000000 3ffffeb0 40203cbc
ctx: cont
sp: 3ffffdd0 end: 3fffffc0 offset: 0000
3ffffdd0: 00003a98 00000000 00000008 402036d2
3ffffde0: 00000073 00000005 9d2f9dce 00000000
3ffffdf0: 00000001 00002834 3fff1814 4021d21e
3ffffe00: 3fff1d14 3ffe87b0 3fff1814 00000001
3ffffe10: 00000001 00000001 3ffffeb0 40203910
3ffffe20: 3ffe87b0 00000000 3ffffeb0 40203af3
3ffffe30: 40105249 3ffee960 3ffee960 0000000e
3ffffe40: 00000000 00000000 00000000 40205d7f
3ffffe50: 00000000 3ffef984 3ffffeb0 402026e0
3ffffe60: 000001bb 00000d50 3fffff2c 3ffee904
3ffffe70: 000001bb 3ffffeb0 3ffe87b0 3ffee904
3ffffe80: 000001bb 3ffffeb0 3ffe87b0 40203c11
3ffffe90: 40207928 aed33ad8 40207928 aed33ad8
3ffffea0: 3fffdad0 3ffe84c0 3ffee87c 4020118d
3ffffeb0: 402077c8 00000000 00003a98 40204eb9
3ffffec0: 00000000 3ffef984 3fff1814 3ffefb3c
3ffffed0: 00000000 00000000 3fff1814 00000000
3ffffee0: 00000000 3fff6934 3fff6f04 00000000
3ffffef0: 00000000 3fff256c 3ffefd14 3fff66bc
3fffff00: 3fff691c 00000000 00000000 00000000
3fffff10: 00004145 00000255 72000000 00000000
3fffff20: 00000000 00000000 00000000 00000000
3fffff30: 00000000 00000000 3ffe0001 3ffee904
3fffff40: 40204be4 3ffee87c 3ffe87a0 40004bf0
3fffff50: 00000000 3ffee87c 00000000 00000000
3fffff60: 3ffe8600 00000000 00000000 40204f34
3fffff70: 00000000 00ffffff 0101a8c0 40204fa8
3fffff80: 3ffe84c8 3ffee844 3ffee87c 402010d4
3fffff90: 40207928 be01a8c0 feefeffe feefeffe
3fffffa0: 3fffdad0 00000000 3ffee8c4 402058dc
3fffffb0: feefeffe feefeffe 3ffe84e8 40100ea5
<<<stack<<<
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v482516e3
Decoded
0x40205c8e: operator+(StringSumHelper const&, String const&) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/WString.cpp line 408
0x4021d844: br_ssl_engine_recvrec_ack at src/ssl/ssl_engine.c line 290
0x40205ca4: operator+(StringSumHelper const&, String const&) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/WString.cpp line 410
0x4020513d: fs::FS::open(char const*, char const*) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/2.5.0-4-b40a506/xtensa-lx106-elf/include/c++/4.8.2/bits/shared_ptr_base.h line 781
0x40203cbc: BearSSL::WiFiClientSecure::available() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClientSecureBearSSL.cpp line 386
0x402036d2: std::__shared_ptr ::operator=(std::__shared_ptr &&) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/2.5.0-4-b40a506/xtensa-lx106-elf/include/c++/4.8.2/bits/shared_ptr_base.h line 858
0x4021d21e: cc_do_sign at src/ssl/ssl_ccert_single_rsa.c line 131
0x40203910: BearSSL::WiFiClientSecure::~WiFiClientSecure() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClientSecureBearSSL.cpp line 119
0x40203af3: BearSSL::WiFiClientSecure::_run_until(unsigned int, bool) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClientSecureBearSSL.cpp line 533
0x40205d7f: String::equals(String const&) const at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/WString.cpp line 501
0x402026e0: ClientContext::_write_some() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/include/ClientContext.h line 504
0x40203c11: BearSSL::WiFiClientSecure::_write(unsigned char const*, unsigned int, bool) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClientSecureBearSSL.cpp line 288
0x40207928: WiFiClient::peek() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClient.cpp line 284
0x40207928: WiFiClient::peek() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClient.cpp line 284
0x4020118d: connectToWifi() at /Users/niallwoodward/silverton-dev/smart-crunch/smart-crutch-huzzah/smart-crutch-huzzah.ino line 45
0x402077c8: _GLOBAL__sub_D__Z15settimeofday_cbOSt8functionIFvvEE() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/sntp-lwip2.cpp line 494
0x40204eb9: TwoWire::write(unsigned char) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/Wire/Wire.cpp line 191
0x40204be4: BearSSL::CertStore::freeHashedTA(void*, br_x509_trust_anchor const*) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/CertStoreBearSSL.cpp line 219
0x40204f34: _GLOBAL__sub_I__ZN7TwoWire8rxBufferE() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/Wire/Wire.cpp line 333
0x40204fa8: fs::File::peek() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/FS.cpp line 73
0x402010d4: connectToWifi() at /Users/niallwoodward/silverton-dev/smart-crunch/smart-crutch-huzzah/smart-crutch-huzzah.ino line 38
0x40207928: WiFiClient::peek() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClient.cpp line 284
0x402058dc: String::init() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/WString.cpp line 131
0x40100ea5: umm_realloc(void*, size_t) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/umm_malloc/umm_malloc.cpp line 804
I've also just found that using host as script.google.com in https://github.com/esp8266/Arduino/blob/919c7535631d60e085f87d2b4c72d05a88651eed/libraries/ESP8266WiFi/examples/BearSSL_Validation/BearSSL_Validation.ino causes the exact same error.
@NiallRees can you try something simple on your end? I'm traveling for a few days and may not be able to run anything on real HW for a while.
Can you make:
Arduino/cores/esp8266/StackThunk.cpp
Line 39 in 919c753
into
#define _stackSize (6748/4)
and re-try the connections with your app or the Validation example? That would add an additional 1K to the available stack for BearSSL and narrow it down to a stack overflow issue vs. something else.
@earlephilhower thank you so much! It's working absolutely fine now. I've seen you've increased the stack size in previous releases, is the problem that certificates are getting larger and larger?
Could you add the following lines to your sketch to figure out exactly how much bigger the stack usage was?
First, add
#include <StackThunk.h>
at the top of your sketch, then after connect:
Serial.printf("BSSL stack: %d\n", stack_thunk_get_max_usage());
It's not really a certificate size thing (they're processed byte-by-byte as they come in), but it could be a new encryption mode which is using more stack than others we've seen.
Thx!
BSSL stack: 5636
Thanks, @NiallRees . Unfortunately, that's actually saying that the required stack is smaller than the default value (5636 vs. 5748) which gives more questions than answers.
Is that # stable through multiple connections? Do you see some fluctuations between runs/connections?
My thinking is that there might be some ISR called which requires more than (5748-5636 = 112 bytes) of space, but it's not always called so you have the case where the unmodified code works most times, but when that ISR happens the SYS needs >112 bytes of stack...
The alternative is that it's not the stack, but memory corruption somewhere (possible, but we do run valgrind on the host to check for stuff like that). By increasing the stack you don't avoid the corruption, but the corruption ends up in some part of the stack that's not used anyway so it's invisible to the app.
What part is using so much stack?
Is it in the client object?
What if you declare the client object on the heap?
@TD-er it's the BSSL stack, different from the SYS or CONT. There's a stack thunk mechanism involved to dynamically allocate and destroy it.
Ah, OK, didn't know that.
@NiallRees thanks for the detailed info. I was able to reproduce it using the BSSL_Validation.ino like you said. Sometimes it worked fine w/o any changes, but other times it would crash.
The issue is sometimes the max BSSL stack usage is 5636, which is fine, but other times I get that it's used 5828 which is past the stack end.
I'll bump up the stack allocation and tweak the validation exmable to dump the BSSL stack as well.
Hi @earlephilhower , I'm a total newbie and I just started learning Arduino using nodemcu esp8266. Do you have details on how to fix this?
@NiallRees can you try something simple on your end? I'm traveling for a few days and may not be able to run anything on real HW for a while.
Can you make:
Arduino/cores/esp8266/StackThunk.cpp
Line 39 in 919c753
into
#define _stackSize (6748/4)
and re-try the connections with your app or the Validation example? That would add an additional 1K to the available stack for BearSSL and narrow it down to a stack overflow issue vs. something else.
FYI
I have (had) the same problem. I am connected to my local MQTT-broker, but close this connection for a short connection to Pushover (to send messages to my phone). This led to crash after a number of connects (in the range 2 to 20-something).
So far:
01 BSSL stack: 0 -> 4616
02 BSSL stack: 4616 ->5736
03 BSSL stack: 5736 -> 6088
And no crashes yet.
//#define _stackSize (5900/4)
#define _stackSize (6900/4)
@Teddyz, you're saying you saw stack_thunk_get_max_usage
return 6088? Like I said, the # in the code is empirically determined, so it's possible to need more than the 5900.
I think 6100
would be a better number. Could you give it a try in your app and report back? If that's good, we can update the main code to give it another 200bytes and hopefully get this all over with.
I had this BSSL-check in my program running today:
#include <StackThunk.h>
uint16_t max_stack_usage = 0;
...
// In loop
if (now() - prev1s >= 1UL) {
//checkDelay("", 1);
prev1s = now();
relay (4); // update relay
yield();
//checkDelay("do every 1 seconds", 2);
uint16_t stack_usage = stack_thunk_get_max_usage();
static uint16_t usage_increases = 0;
if ( stack_usage > max_stack_usage ) {
Serial.printf("%02d BSSL stack: %d -> %d \n", ++usage_increases, max_stack_usage, stack_usage);
max_stack_usage = stack_usage;
}
return;
}
01 BSSL stack: 0 -> 4484
02 BSSL stack: 4484 -> 5796
03 BSSL stack: 5796 -> 5928
04 BSSL stack: 5928 -> 6056
05 BSSL stack: 6056 -> 6136
Soft WDT Crash after > 100 connections to api.pushover.net/1/messages.json. Plus the same number of connections to my local MQTT-broker.
01 BSSL stack: 0 -> 4516
02 BSSL stack: 4516 -> 5736
03 BSSL stack: 5736 -> 5796
04 BSSL stack: 5796 -> 5828
05 BSSL stack: 5828 -> 6104
06 BSSL stack: 6104 -> 6120 a
So far no crash.
The Soft WDT reset had the explanation below. I do not understand where it went wrong, but I also do not think it is related to this problem.
0x40100984: millis at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_wiring.cpp line 188
0x40100969: millis at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_wiring.cpp line 185
0x4020a4ea: lwmqtt_arduino_timer_get(void*) at D:\Mina Dokument\Arduino\libraries\MQTT\src/MQTTClient.cpp line 25
0x4010088c: ets_post at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_main.cpp line 160
0x4020abe8: now() at D:\Mina Dokument\Arduino\libraries\Time/Time.cpp line 252 (discriminator 1)
0x40205afc: loop at D:\Mina Dokument\Arduino\MQTT\mqtt_pocsag_v001/mqtt_pocsag_v001.ino line 252
0x4010088c: ets_post at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_main.cpp line 160
0x4020e1d8: loop_wrapper() at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_main.cpp line 180
0x40101675: cont_wrapper at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/cont.S line 81
i had problem with resets on boot after "Refreshing JWT" but since the mod everything works fine for 2 days now.
BSSL stack: 6132
Changing BSSL stack to 6648 was not enough for me, BSSL was timing out.
I tried to use netClient.setBufferSizes(1024, 1024) as explained here https://arduino-esp8266.readthedocs.io/en/latest/esp8266wifi/bearssl-client-secure-class.html#setbuffersizes-int-recv-int-xmit
and it works now without timeout.
Changing BSSL stack to 6648 was not enough for me, BSSL was timing out. I tried to use netClient.setBufferSizes(1024, 1024) as explained here https://arduino-esp8266.readthedocs.io/en/latest/esp8266wifi/bearssl-client-secure-class.html#setbuffersizes-int-recv-int-xmit and it works now without timeout.
I have similar issue on this thread causes exception (panic/abort/assert)
but with different cause. Thanks to @stasysdr I also set the setBufferSizes to 1024, the default is 512 at WiFiClientBearSSL.cpp
.
The error was due to the conflict between ESPAsyncWebServer
, which receives web requests from the client and ESP8266 as the web server. During the client.connect
if it receives a request then the exception will occur, maybe the connection needed more buffer than the default due to the "hanging process that got stuck" based on my testing.
I did not change the default stack size defined as #define _stackSize (6200/4)
on StackThunk.cpp
. Since the I was receiving BSSL stack size around 4300 to 4400.
No more crashes have occurred after I adjusted the buffer size. Before, sometimes it would go through, and sometimes it wouldn't when client.connect
was hanging which is not good for ESP to get stuck on a process.