cnlohr/colorchord

WDT Resets in station mode

Opened this issue · 16 comments

After frying the power on my Wemos D1, which was running my modified build just fine, and also losing my build environment after a hard drive upgrade, I rebuilt the whole "base" build environment and flashed a new Wemos. It works fine in AP mode, but when I switch to Station mode it connects to the AP for a few seconds and then performs a WDT reset (serial output below). Any thoughts on what's going on here?

Opmode: 1
Station mode: "GL-MT300N" (bssid_set:0)
Loading Settings: af / 0 / 69 / 69
Settings Loaded: ESP_31E5FC / Default
RST REASON: 0
sleep enable,type: 2
mode : sta(84:f3:eb:31:e5:fc)
add if0
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 

connected with GL-MT300N, channel 1
dhcp client start...
ip:192.168.8.124,mask:255.255.255.0,gw:192.168.8.1
IGMP Joining: 7c08a8c0 fb0000e0
STAT: 5
IP: 192.168.8.124
NM: 255.255.255.0
GW: 192.168.8.1
WCFG: /GL-MT300N/
IGMP Joining: 7c08a8c0 fb0000e0
Fatal exception 0(IllegalInstructionCause):
epc1=0x4023b20c, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
⸮
 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x40100000, len 29384, room 16 
tail 8
chksum 0x74
load 0x3ffe8000, len 1304, room 0 
tail 8
chksum 0xcb
load 0x3ffe8520, len 3524, room 0 
tail 4
chksum 0x84
csum 0x84
<gobledygoop>
Opmode: 1
Station mode: "GL-MT300N" (bssid_set:0)
...rinse..repeat...

@AEFeinstein were you running into an issue like this when you were porting the ESP82xx stuff?

No, but I never attempted using station mode.

@acoulson2000 did you compile this yourself? If so can you examine the program.lst and see where 0x4023b20c falls?

I am also having the same issue. The exception occurs at almost the same address, 0x4023b208.
I just built the .lst file and here is what I have:

4023b205: 000000 ill

4023b208 <read_sar_dout>:
4023b208: aa6791 l32r a9, 40225ba4 <system_restart_hook+0x10>
4023b20b: a2ee81 l32r a8, 40223dc4 <strdup+0x44>
4023b20e: 0b0c movi.n a11, 0
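
To make the "where does the address fall" step concrete, here is a minimal host-side sketch that maps a crash address to the nearest preceding symbol. The three table entries are taken or derived from the excerpt above (strdup+0x44 = 0x40223dc4 and system_restart_hook+0x10 = 0x40225ba4 give the start addresses). `symbol_t` and `find_symbol` are hypothetical names, and a real lookup would need every symbol from the listing of the exact build that crashed, so treat this as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per function start address, as read from the listing. */
typedef struct {
    uint32_t addr;
    const char *name;
} symbol_t;

/* Start addresses from the excerpt above. A complete table would list
 * every symbol; with only three entries, any address above the last
 * one is attributed to read_sar_dout. */
static const symbol_t symbols[] = {
    { 0x40223d80u, "strdup" },
    { 0x40225b94u, "system_restart_hook" },
    { 0x4023b208u, "read_sar_dout" },
};

/* Return the symbol with the highest start address that is <= pc, or
 * NULL if pc lies below every symbol. On success, *offset receives the
 * distance from the symbol start to pc. */
static const symbol_t *find_symbol(uint32_t pc, uint32_t *offset)
{
    assert(offset != NULL);
    const symbol_t *best = NULL;
    for (size_t i = 0; i < sizeof symbols / sizeof symbols[0]; i++) {
        if (symbols[i].addr <= pc && (best == NULL || symbols[i].addr > best->addr))
            best = &symbols[i];
    }
    if (best != NULL)
        *offset = pc - best->addr;
    return best;
}
```

With this table, `find_symbol(0x4023b20c, &off)` returns the `read_sar_dout` entry with an offset of 4, assuming the listing and the crashing binary come from the same build.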

How do I build a .lst file? I can try it in the morning. I will also look at whether there are recent commits. My working version was forked back around October, I think.

@acoulson2000
make debug
I get a file called image.lst

Hmm.. This seems like the issue where EnterCritical and ExitCritical are incorrectly being called. Can you verify that if the device is connecting, it calls the EnterCritical function? Just put a printf( "EnterCritical\n" ); and printf( "ExitCritical\n" ); in there.
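
A plain printf shows the order of the calls; adding a nesting counter also flags an ExitCritical that has no matching EnterCritical. The sketch below is one way to do that. `trace_enter_critical` and `trace_exit_critical` are hypothetical helpers (not part of the project) that would be called from inside the real functions, and it is written for a host build with stdio rather than for the ESP8266 itself.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tracing state, for illustration only. */
static int critical_depth = 0;    /* current nesting level */
static int unbalanced_exits = 0;  /* exits seen while depth was already 0 */

/* Call at the top of EnterCritical. */
static void trace_enter_critical(void)
{
    critical_depth++;
    printf("EnterCritical (depth %d)\n", critical_depth);
}

/* Call at the top of ExitCritical. */
static void trace_exit_critical(void)
{
    if (critical_depth == 0) {
        /* No matching EnterCritical: count it and leave depth at 0. */
        unbalanced_exits++;
        printf("ExitCritical without EnterCritical (%d so far)\n",
               unbalanced_exits);
        return;
    }
    critical_depth--;
    assert(critical_depth >= 0);  /* depth never goes negative */
    printf("ExitCritical (depth %d)\n", critical_depth);
}
```

One enter followed by three exits would leave `critical_depth` at 0 and `unbalanced_exits` at 2, which makes a mismatch easy to spot in the serial output.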

Sorry I'm not in a position where I can test this myself.

@cnlohr Looks like ExitCritical is being called, but should there be an EnterCritical prior to it?

Opmode: 1
Station mode: "race2" (bssid_set:0)
EnterCritical.
Loading Settings: af / 0 / 69 / 69
Settings Loaded: ESP_867E81 / Default
ExitCritical.
RST REASON: 0
sleep enable,type: 2
mode : sta(84:0d:8e:86:7e:81)
add if0
ExitCritical.
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt

connected with race2, channel 1
dhcp client start...
ip:192.168.0.15,mask:255.255.255.0,gw:192.168.0.1
IGMP Joining: 0f00a8c0 fb0000e0
STAT: 5
IP: 192.168.0.15
NM: 255.255.255.0
GW: 192.168.0.1
WCFG: /race2/
ExitCritical.
IGMP Joining: 0f00a8c0 fb0000e0
Fatal exception 0(IllegalInstructionCause):
epc1=0x4023b214, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

ets Jan 8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x40100000, len 29292, room 16
tail 12
chksum 0xd7
ho 0 tail 12 room 4
load 0x3ffe8000, len 1304, room 12
tail 12
chksum 0x1a
ho 0 tail 12 room 4
load 0x3ffe8520, len 3556, room 12
tail 8
chksum 0x56
csum 0x56

How long is it from the last ExitCritical to the crash? I think this looks right, so it may be a different problem.

Also, there is an EnterCritical first. It's called much earlier on.

It's about 8 seconds until the crash.

I see that EnterCritical is called first, but then ExitCritical is called twice. I don't see anything wrong with that though.

Interesting... It should be in EnterCritical when negotiating for WPA2, so this is probably the wrong behavior, but I'm still not sure why it's crashing. This is very strange.

Charles, I'm curious - what were you referring to in the "stab at the esp8266 port" commit?

I'm referring to the commits surrounding this one: 2a0c78d

I have my build compiled with three_samples and I am experiencing exactly the same behavior.
I am also using the Wemos D1 mini version of the ESP8266.
The first reset was manual, to ensure I have a clean start. The log follows:

[21:53:47:292] <garbled boot output>
[21:53:47:440] Opmode: 2␍␊
[21:53:47:440] Default SoftAP mode: "ESP_00887D":""␍␊
[21:53:47:444] Loading Settings: af / 0 / 67 / 67␍␊
[21:53:47:447] Settings Loaded: ColorC2 / 2nd ColorChord␍␊
[21:53:47:451] RST REASON: 6␊
[21:53:47:451] sleep enable,type: 2␍␊
[21:53:47:454] mode : softAP(2e:3a:e8:00:88:7d)␍␊
[21:53:47:458] add if1␍␊
[21:53:47:458] dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1)␍␊
[21:53:47:462] bcn 100␍␊
[21:53:47:465] IGMP Joining: 0104a8c0 fb0000e0␍␊
[21:53:47:835] add 1␍␊
[21:53:47:835] aid 1␍␊
[21:53:47:835] station: 58:00:e3:e6:9f:f5 join, AID = 1␍␊
[21:54:04:569] Switching to: "wlan_home2"/"Klaucovi2015" (10/12). BSSID_SET: 0 [1]␍␊
[21:54:04:574] station: 58:00:e3:e6:9f:f5 leave, AID = 1␍␊
[21:54:04:579] rm 1␍␊
[21:54:04:579] bcn 0␍␊
[21:54:04:579] del if1␍␊
[21:54:04:585] usl␍␊
[21:54:04:585] mode : sta(2c:3a:e8:00:88:7d)␍␊
[21:54:04:585] add if0␍␊
[21:54:04:731] Switching.␍␊
[21:54:07:523] scandone␍␊
[21:54:08:463] state: 0 -> 2 (b0)␍␊
[21:54:08:463] state: 2 -> 3 (0)␍␊
[21:54:08:471] state: 3 -> 5 (10)␍␊
[21:54:08:471] add 0␍␊
[21:54:08:471] aid 4␍␊
[21:54:08:471] cnt ␍␊
[21:54:08:480] ␍␊
[21:54:08:480] connected with wlan_home2, channel 6␍␊
[21:54:08:536] dhcp client start...␍␊
[21:54:09:287] ip:192.168.11.65,mask:255.255.255.0,gw:192.168.11.254␍␊
[21:54:09:290] IGMP Joining: 410ba8c0 fb0000e0␍␊
[21:54:09:362] STAT: 5␍␊
[21:54:09:362] IP: 192.168.11.65␍␊
[21:54:09:362] NM: 255.255.255.0␍␊
[21:54:09:369] GW: 192.168.11.254␍␊
[21:54:09:369] WCFG: /wlan_home2/␍␊
[21:54:09:369] IGMP Joining: 410ba8c0 fb0000e0␍␊
[21:54:18:468] Fatal exception 0(IllegalInstructionCause):␍␊
[21:54:18:471] epc1=0x4023b238, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000␍<0xff>␍␊
[21:54:18:480]  ets Jan  8 2013,rst cause:2, boot mode:(3,6)␍␊
[21:54:18:484] ␍␊
[21:54:18:502] load 0x40100000, len 29580, room 16 ␍␊
[21:54:18:523] tail 12␍␊
[21:54:18:523] chksum 0xd2␍␊
[21:54:18:523] ho 0 tail 12 room 4␍␊
[21:54:18:530] load 0x3ffe8000, len 1304, room 12 ␍␊
[21:54:18:530] tail 12␍␊
[21:54:18:530] chksum 0xb5␍␊
[21:54:18:538] ho 0 tail 12 room 4␍␊
[21:54:18:538] load 0x3ffe8520, len 3544, room 12 ␍␊
[21:54:18:546] tail 12␍␊
[21:54:18:546] chksum 0x5b␍␊
[21:54:18:546] csum 0x5b␍␊
[21:54:18:553] s<0x1b>'

The read_sar_dout call in embedded8266/user/adc.c is definitely troublesome in station mode and is what trips the WDT reset. It seems to be part of the (closed source?) libphy.a library. My guess is that the problem is upstream of esp-open-sdk, in the ESP8266_NONOS_SDK-2.1.0 release.

I came across source code for the function here:

https://github.com/pvvx/esp8266web/blob/master/info/libs/phy/phy_get_vdd33.c

However, the SAR_BASE mentioned in the comment didn't seem to be the correct value. The value defined here worked for me:

https://github.com/PetteriAimonen/esp-walkie-talkie/blob/master/fast_adc.c

I was able to get it stable in station mode by adding this function and calling it in place of read_sar_dout:

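// Replacement for read_sar_dout: reads 8 SAR result registers and stores
// one converted value per register in buf (buf must hold 8 entries).
void hs_read_sar_dout(uint16 * buf)
{
   // 0x60000D00 is the SAR base that worked here; results start at word index 32.
   volatile uint32 * sar_regs = &((volatile uint32 *)0x60000D00)[32];
   int i, x, z;
   for(i = 0; i < 8; i++) {
      x = ~(*sar_regs++);                  // invert the raw register value
      z = (x & 0xFF) - 21;                 // low byte minus an offset of 21
      x &= 0x700;                          // keep bits 8..10
      if(z > 0) x = ((z * 279) >> 8) + x;  // scale low part by 279/256, add it back
      buf[i] = x;
   }
}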
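
To see what the per-sample arithmetic in hs_read_sar_dout does to a single register value, the conversion can be pulled out into a pure function and run on a host machine. This is only a sketch: `sar_convert` is a hypothetical name, and the input words used to exercise it are made up for illustration, not values captured from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one raw SAR register word using the same steps as the loop
 * body of hs_read_sar_dout. Hypothetical helper for host-side checks. */
static uint16_t sar_convert(uint32_t raw)
{
    uint32_t inverted = ~raw;                             /* invert the raw word */
    int32_t  fine     = (int32_t)(inverted & 0xFFu) - 21; /* low byte minus 21 */
    uint32_t result   = inverted & 0x700u;                /* keep bits 8..10 */

    if (fine > 0)
        result += ((uint32_t)fine * 279u) >> 8;           /* scale by 279/256 */

    /* The largest possible value is 0x700 + 255 = 2047. */
    assert(result <= 2047u);
    return (uint16_t)result;
}
```

For example, a register word whose inverse is 0x380 gives a low byte of 128, so the fine part is ((128 - 21) * 279) >> 8 = 116 and the result is 0x300 + 116 = 884. A low byte of 21 or less contributes nothing, leaving only bits 8..10.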

I have been working with the latest master by @cnlohr (making sure the two submodules, esp82xx and esp_nonos_sdk, are at the same commits as he uses) and gradually bringing in my changes. I use station mode all the time. I do not get resets in station or soft AP mode, but there is still some strange behaviour that I am trying to sort out. It is present in master as well as in my branch with the additions. Trying the new hs_read_sar_dout does not seem to make a difference. The problems I observe are:

    1. If my router is running, then often after a restart it will NOT reconnect to the station (it is waiting to do so, stuck at stats = STATION_CONNECTING). It will only connect AFTER I restart my router.
    2. If the router is not running, it will try to connect to the station, give up, then connect as a soft AP.
    3. While running, if I set GPIO0 to 0 (either in the GUI or by grounding the pin), it goes into a loop trying to connect to soft AP. If I restart, it will then create the soft AP. (I have connected RST and GPIO16 on the NodeMCU so deep sleep can run.)
    4. DHCP does not work. ESP_D0F9CA.local, in soft AP or station mode, connects sometimes but constantly resets. I can only connect directly to the assigned IP.
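
The .local name in the last item is resolved over mDNS, and the "IGMP Joining" lines in the logs earlier in this thread appear to print the local address and the multicast group as 32-bit hex words with the first octet in the lowest byte. Here is a minimal sketch for decoding them, assuming that byte layout. `decode_log_addr` and `print_log_addr` are hypothetical helpers, not functions from the project.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Split a logged 32-bit address word into four octets.
 * Assumption: the lowest byte of the word is the first octet. */
static void decode_log_addr(uint32_t word, uint8_t octets[4])
{
    assert(octets != NULL);
    for (int i = 0; i < 4; i++)
        octets[i] = (uint8_t)(word >> (8 * i));
}

/* Print the decoded address in dotted-decimal form. */
static void print_log_addr(uint32_t word)
{
    uint8_t o[4];
    decode_log_addr(word, o);
    printf("%u.%u.%u.%u\n", (unsigned)o[0], (unsigned)o[1],
           (unsigned)o[2], (unsigned)o[3]);
}
```

Decoded this way, fb0000e0 is 224.0.0.251, the mDNS multicast group, and 7c08a8c0 is 192.168.8.124, which matches the IP reported in the first log. So the group join itself looks as expected in those logs.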

@astateofblank I notice that on https://github.com/pvvx/esp8266web pvvx mentions implementing a UDP Wave server (Integrated SAR ADC): Sending 14-bit samples at 1 Hz .. 48 kHz (max 192 kHz 12 bits). The code is at https://github.com/pvvx/esp8266web/blob/master/app/driver/adc.c, in addition to the code you referred to. Has anyone tried this?