ERROR LED
We got a red ERROR LED on one of our Booster v1.4s. @dnadlinger will post details soon.
@wizath what diagnostics should we be using to determine the source of this error?
First, you can try i2cdetect <ch number>.
If you get something like this:
> i2cdetect 5
[i2c_scan] start
[i2c_scan] end
Most likely the I2C bus on that channel is shorted, with a dead temperature sensor being the most probable cause.
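If you want to script that check over the serial console, a minimal Python sketch is below. It assumes only what the example above shows: a healthy scan prints at least one line between "[i2c_scan] start" and "[i2c_scan] end". The per-device line format isn't given in this thread, so any non-empty line in between is counted as a device, and the function names are hypothetical.

```python
def scan_found_devices(output: str) -> list[str]:
    """Return the non-empty lines printed between the i2c_scan start/end markers.

    Assumption: every such line reports one detected device; the exact
    per-device format is not shown in the thread.
    """
    lines = [line.strip() for line in output.splitlines()]
    try:
        start = lines.index("[i2c_scan] start")
        end = lines.index("[i2c_scan] end", start + 1)
    except ValueError:
        raise ValueError("i2c_scan markers not found in output") from None
    return [line for line in lines[start + 1:end] if line]


def bus_looks_shorted(output: str) -> bool:
    # An empty scan is the symptom described above: most likely a shorted
    # I2C bus on that channel (e.g. a dead temperature sensor).
    return not scan_found_devices(output)


example = """> i2cdetect 5
[i2c_scan] start
[i2c_scan] end"""
print(bus_looks_shorted(example))  # True
```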
The temperature suddenly jumped to 63C when the channel died, so an I2C issue was my top suspicion. Power cycling the device cleared it.
Protection is set to 60 degrees. Please always check the output of the logstash command before power cycling; there should be an error logged there.
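Because the protection threshold is a single number, you can check a status dump against it by hand or in a script. Below is a minimal Python sketch. It assumes, hypothetically, that the trip compares the hotter of the local and remote readings against 60 C with a strict "greater than"; the real firmware rule is not described in this thread and may differ.

```python
PROTECTION_LIMIT_C = 60.0  # threshold quoted in this thread


def channels_over_limit(ltemp, rtemp, limit=PROTECTION_LIMIT_C):
    """Return indices of channels whose hotter reading (local or remote) exceeds limit.

    Assumption: the comparison (strict >, on the max of the two sensors) is
    illustrative only; the firmware's exact trip rule is not given here.
    """
    return [
        ch
        for ch, (lt, rt) in enumerate(zip(ltemp, rtemp))
        if max(lt, rt) > limit
    ]


# LTEMP and RTEMP rows from the status output posted later in this thread
ltemp = [30.25, 32.00, 32.00, 32.00, 32.50, 63.00, 32.50, 32.00]
rtemp = [30.00, 32.25, 31.00, 31.00, 32.50, 63.00, 30.00, 30.00]
print(channels_over_limit(ltemp, rtemp))  # [5]
```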
Ah, sorry, I didn't think about logstash. Here is the status output before power cycling:
> status 5
[status] e=0 s=0 r1=12 r2=350 tx=0.000 rf=0.000 curr=0.000 t=63.00 i=1.03 ip=-nan
PGOOD: 1
FAN SPEED: 100 %
AVG TEMP: 63.00 CURRENT: 63.00
CHANNELS INFO
==============================================================================
                   #0      #1      #2      #3      #4      #5      #6      #7
DETECTED            1       1       1       1       1       1       1       1
HWID            02:22   CD:F9   7B:9F   20:8D   E8:27   4F:79   21:12   C6:3A
INPWR [V]        0.00    0.13    0.00    0.07    0.63    1.03    0.00    0.61
TXPWR [V]        0.01    0.01    0.15    0.12    1.69    0.01    0.01    0.02
RFLPWR [V]       0.04    0.01    0.01    0.01    0.60    0.22    0.12    0.07
INPWR [dBm]      -nan    -nan    -nan    -nan    -nan    -nan    -nan    -nan
TXPWR [dBm]      5.00    5.00    5.00    5.00   26.49    0.00    5.00    5.00
RFLPWR [dBm]    -4.40   -4.53   -4.25   -3.55    9.62    0.00   -0.01   -4.17
I30V [A]        0.044   0.046   0.049   0.046   0.095   0.000   0.048   0.001
I6V0 [A]        0.243   0.252   0.243   0.247   0.247   0.000   0.251   0.254
5V0MP [V]       4.924   4.922   4.930   4.938   4.910   0.000   4.930   4.950
ON                  1       1       1       1       1       0       1       1
SON                 1       1       1       1       1       0       1       0
IINT                0       0       0       0       0       0       0       0
OINT                0       0       0       0       0       0       0       1
SINT                0       0       0       0       0       0       0       0
ADC1               14      13     250     191    2765      12      13      34
ADC2               59      15      15      13     990     356     203     111
INTSET [dBm]    35.00   38.00   37.99   37.99   37.99   37.99   37.00   36.00
DAC1             4095    4095    4095    4095    4095    4095    4095    4095
DAC2             3245    3268    3415    3341    3322    3252    3385    3683
SCALE1             83      85      82      83      87      88      85      87
OFFSET1           470     446     727     619     460     375     565     571
BIASCAL          1865    1539    1879    1935    1527    1939    1761    1929
HWIS            82.08   84.33   83.17   82.83   85.17   83.92   83.00   85.25
HWIO           865.08  823.33 1003.17  939.83  852.17  818.92  978.00  957.25
LTEMP           30.25   32.00   32.00   32.00   32.50   63.00   32.50   32.00
RTEMP           30.00   32.25   31.00   31.00   32.50   63.00   30.00   30.00
==============================================================================
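The argument made in the replies below is that one channel reads 63 C while its neighbours sit near 30 C at low power. You can turn that into a quick plausibility check on a saved status dump. The minimal Python sketch below assumes every data row of the CHANNELS INFO table ends with exactly eight whitespace-separated values. The 15 C threshold and the function names are hypothetical choices, not anything Booster defines.

```python
from statistics import median


def parse_channel_table(text: str, n_channels: int = 8) -> dict[str, list[str]]:
    """Map each row label (e.g. "LTEMP", "TXPWR [dBm]") to its per-channel values.

    Intended for the CHANNELS INFO block; rows with n_channels or fewer
    tokens (the "#0 ... #7" header, "=====" separators) are skipped.
    """
    rows = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) <= n_channels:
            continue
        label = " ".join(tokens[:-n_channels])
        rows[label] = tokens[-n_channels:]
    return rows


def temperature_outliers(temps: list[float], max_delta_c: float = 15.0) -> list[int]:
    """Flag channels whose reading differs from the median of the *other*
    channels by more than max_delta_c (a hypothetical threshold)."""
    flagged = []
    for ch, t in enumerate(temps):
        others = temps[:ch] + temps[ch + 1:]
        if abs(t - median(others)) > max_delta_c:
            flagged.append(ch)
    return flagged


table = """
LTEMP 30.25 32.00 32.00 32.00 32.50 63.00 32.50 32.00
RTEMP 30.00 32.25 31.00 31.00 32.50 63.00 30.00 30.00
"""
rows = parse_channel_table(table)
for name in ("LTEMP", "RTEMP"):
    temps = [float(v) for v in rows[name]]
    print(name, temperature_outliers(temps))  # channel 5 flagged in both rows
```

A median of the other channels is used rather than a mean so that the suspect channel itself cannot drag the reference value towards its own reading.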
@wizath do you think the reading of 63.00 degrees could be an I2C error, or do you think something went wrong that pushed the temperature that high?
@wizath I added a troubleshooting page to the wiki. Can you check that the advice I gave there is correct please?
AFAIK it's hard to heat a module even to 50 degrees, except maybe without cooling and with all channels at maximum power.
Yes, but (@dnadlinger, correct me if I'm wrong) the fans were spinning and all other channels were at 30C, so it seems hard to believe that one channel could actually be drawing that much current, particularly if the 30V foldback limiting was working.
So this has to be some kind of issue with the temperature measurement, doesn't it?
The status output in the previous comment shows the fan speed at 100% with 63 degrees.
Right, so assuming nothing went wrong with the fan controller, this seems to be an issue with the temperature measurements.
@dnadlinger, if we have more problems like this, let's log the Booster statuses to InfluxDB. That would reveal things like sudden (non-physical) changes in temperature.
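The minimal Python sketch below covers both halves of that idea, with hypothetical names throughout. First, it formats one channel's temperatures as an InfluxDB line-protocol record: measurement and comma-separated tags, a space, the fields, a space, and a nanosecond timestamp. Second, it flags samples whose rate of change is larger than a module could plausibly heat or cool. The measurement and tag names and the 1 C/s limit are assumptions, not an existing Booster schema. Tag values containing spaces, commas or "=" would need escaping, which this sketch skips.

```python
def to_line_protocol(booster: str, channel: int, ltemp: float, rtemp: float, ts_ns: int) -> str:
    # measurement "booster_channel", tags booster/channel, float fields, ns timestamp.
    # Assumes tag values need no escaping (no spaces, commas or '=').
    return (
        f"booster_channel,booster={booster},channel={channel} "
        f"ltemp={ltemp},rtemp={rtemp} {ts_ns}"
    )


def non_physical_jumps(samples: list[tuple[float, float]], max_rate_c_per_s: float = 1.0) -> list[int]:
    """Return indices i where the temperature change from sample i-1 to i,
    divided by the elapsed time, exceeds max_rate_c_per_s (hypothetical limit).

    samples: (time_s, temp_c) pairs sorted by time.
    """
    jumps = []
    for i in range(1, len(samples)):
        (t0, y0), (t1, y1) = samples[i - 1], samples[i]
        dt = t1 - t0
        if dt > 0 and abs(y1 - y0) / dt > max_rate_c_per_s:
            jumps.append(i)
    return jumps


print(to_line_protocol("booster-a", 5, 63.0, 63.0, 1_600_000_000_000_000_000))
print(non_physical_jumps([(0, 32.5), (10, 32.5), (11, 63.0), (21, 63.0)]))  # [2]
```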
@wizath, 63.00 C seems like an odd temperature reading for a fault condition, doesn't it? IIRC, the last time I had I2C issues it read 150.25. Any ideas about potential causes?
@wizath, if you can think of anything at all that could cause an erroneous temperature reading of 63C, please let me know.
Also, can you remind me what the difference between LTEMP and RTEMP is?
LTEMP is the temperature sensor's internal temperature and RTEMP is the remote diode temperature.
I see. You mean the detector has its own internal diode but can also use a diode-connected external transistor? Out of curiosity, what's the point of having both? Are they in different places (e.g. the diode nearer the FET) or something?
Is there anything special about 63.00?
@wizath: Re "all channels at maximum power", the output above was the steady state, i.e. low power on channel 4, all others idle. Fan exhaust was cool to the skin too.
That's why I think it was an I2C error. Tomorrow I'll review the i2cerror command so that it provides more information about bus errors.
@wizath could this have been a bit flip error on the temperature bus or something like that?
I can't think of anything other than a bus error that could have caused this problem, since the air was cool and the channel was running at low power.
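Whether a single bit flip could turn a normal reading into 63.00 depends on how the sensor encodes temperature, which this thread doesn't state. As a rough check, here is a Python sketch under a hypothetical encoding: an unsigned 10-bit value with a 0.25 C LSB. Every temperature quoted in this thread is a multiple of 0.25 C, but the real register layout may differ. Under that assumption, 63.00 is six bits away from 32.50 and five bits away from 32.00, so it is not one flip away from the neighbouring readings.

```python
LSB_C = 0.25  # hypothetical resolution; all readings in the thread are multiples of it
N_BITS = 10   # hypothetical register width


def to_raw(temp_c: float, lsb: float = LSB_C, n_bits: int = N_BITS) -> int:
    # Convert a reading back to its (assumed) raw register value.
    raw = round(temp_c / lsb)
    if not 0 <= raw < (1 << n_bits):
        raise ValueError(f"{temp_c} C does not fit the assumed encoding")
    return raw


def single_bit_flips(temp_c: float) -> list[float]:
    """All readings reachable from temp_c by flipping exactly one bit."""
    raw = to_raw(temp_c)
    return sorted((raw ^ (1 << bit)) * LSB_C for bit in range(N_BITS))


def bits_differing(a_c: float, b_c: float) -> int:
    """Hamming distance between two readings under the assumed encoding."""
    return bin(to_raw(a_c) ^ to_raw(b_c)).count("1")


print(63.0 in single_bit_flips(32.5))  # False
print(bits_differing(32.5, 63.0))      # 6
print(bits_differing(32.0, 63.0))      # 5
```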
Next time, can you use the i2cerr command? It lists the bus error count for each channel:
> i2cerr
#0 #1 #2 #3 #4 #5 #6 #7
I2C ERR 0 0 0 3 0 0 0 0
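If i2cerr is polled periodically, a small parser makes nonzero counts easy to spot. The minimal Python sketch below assumes only the output layout shown above: a header of "#N" tokens followed by a row starting with "I2C ERR". The function name is hypothetical.

```python
def i2c_error_counts(output: str) -> dict[int, int]:
    """Parse i2cerr output into {channel: bus_error_count}.

    Assumes the layout shown above: a header of "#N" tokens followed by a
    row that starts with "I2C ERR".
    """
    channels = None
    for line in output.splitlines():
        tokens = line.split()
        if tokens and all(tok.startswith("#") for tok in tokens):
            channels = [int(tok[1:]) for tok in tokens]
        elif tokens[:2] == ["I2C", "ERR"] and channels is not None:
            return dict(zip(channels, (int(tok) for tok in tokens[2:])))
    raise ValueError("no I2C ERR row found after a channel header")


output = """> i2cerr
#0 #1 #2 #3 #4 #5 #6 #7
I2C ERR 0 0 0 3 0 0 0 0"""
counts = i2c_error_counts(output)
print({ch: n for ch, n in counts.items() if n})  # {3: 3}
```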
Nice, thanks!
Out of curiosity, why doesn't #282 produce anything in logstash?
Updated the Troubleshooting page on the wiki, but can you add this to the VCP page as well please?
Only one command that lights up the red LED doesn't produce any error message, and that's the check for whether a channel already has an error. But generating that error status in the first place should log an error message to logstash.
I've added a log message to this sequence; can you test it now?
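The invariant being restored here is that every path which lights the red ERROR LED also leaves a message in logstash. The Python sketch below is a toy model only: the class and method names are hypothetical, and the actual Booster firmware is not shown in this thread. In this model, the fix amounts to logging on the "channel already has an error" branch as well as when the error is first raised.

```python
import logging

log = logging.getLogger("booster")  # stands in for the logstash output


class ChannelErrors:
    """Toy model: per-channel error state driving a single red ERROR LED."""

    def __init__(self, n_channels: int = 8):
        self.errors = [None] * n_channels  # None means "no error on this channel"

    @property
    def led_on(self) -> bool:
        # The LED is lit whenever any channel is in an error state.
        return any(e is not None for e in self.errors)

    def raise_error(self, ch: int, reason: str) -> None:
        if self.errors[ch] is not None:
            # Previously silent branch: the channel already had an error.
            log.error("channel %d: already in error state (%s)", ch, self.errors[ch])
            return
        self.errors[ch] = reason
        log.error("channel %d: %s", ch, reason)


logging.basicConfig(level=logging.ERROR)
errs = ChannelErrors()
errs.raise_error(5, "over temperature")
errs.raise_error(5, "over temperature")  # now also logged
print(errs.led_on)  # True
```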
Okay, will reflash and try to reproduce.
@dnadlinger, I'm assuming this is a duplicate of #282.