sinara-hw/Booster

Transient channel fault on Creotech v1.3 Booster (devid 003200223436510B32323838)

Closed this issue · 9 comments

One of the channels on a Creotech Booster v1.3 (devid 003200223436510B32323838) developed a channel fault, which went away with power-cycling the whole device.

The Booster in question had been on for a week or two, and was mostly sitting there, with RF being applied to one channel, and the SCPI interface only occasionally being used to check the interlock status. Channel 3 had spontaneously changed from enabled (though without RF applied) to the ERROR LED being on.

Diagnostics from before power-cycling it:

> status 3
[status] e=0 s=0 r1=17 r2=11 tx=0.000 rf=0.000 curr=0.000 t=0.00 i=0.00 ip=0.00
> logstash
[INFO] SYSCLK frequency: 168000000
[INFO] PCLK1 frequency: 42000000
[INFO] PCLK2 frequency: 84000000
[INFO] ADC: OK | raw: 1996 | VrefInt 1.22 V
[INFO] PGOOD: OK
[INFO] network client disconnected
[INFO] network client disconnected
[INFO] network client disconnected
> start

PGOOD: 1
FAN SPEED: 30 %
AVG TEMP: 28.25 CURRENT: 28.25
CHANNELS INFO
==============================================================================
                #0      #1      #2      #3      #4      #5      #6      #7
DETECTED        1       1       1       0       1       1       1       1
HWID            7B:13   21:67   38:F1   00:00   E5:15   90:48   58:27   A7:46
INPWR [V]       0.55    0.81    0.43    0.00    0.01    0.01    0.26    0.01
TXPWR [V]       1.45    0.06    0.01    0.01    0.02    0.01    0.01    0.04
RFLPWR [V]      0.64    0.13    0.05    0.01    0.04    0.08    0.01    0.01
INPWR [dBm]     -nan    -nan    -nan    0.00    -nan    -nan    -nan    -nan
TXPWR [dBm]     22.83   5.00    5.00    0.00    5.00    5.00    5.00    5.00
RFLPWR [dBm]    7.90    -3.29   -4.30   0.00    -4.57   -4.61   -3.35   -4.36
I30V [A]        0.073   0.046   0.047   0.000   0.046   0.048   0.047   0.047
I6V0 [A]        0.240   0.228   0.230   0.000   0.240   0.238   0.239   0.211
5V0MP [V]       5.011   5.003   5.002   0.000   5.017   5.006   5.006   5.034
ON              1       1       1       0       1       1       1       1
SON             1       1       1       0       1       1       1       1
IINT            0       0       0       0       0       0       0       0
OINT            0       0       0       0       0       0       0       0
SINT            0       0       0       0       0       0       0       0
ADC1            2370    105     17      18      35      18      18      57
ADC2            1043    211     76      11      71      123     11      12
INTSET [dBm]    35.99   28.99   29.00   0.00    28.99   28.99   28.99   28.99
DAC1            4095    4095    4095    0       4095    4095    4095    4095
DAC2            3670    3047    3006    0       3024    3055    2997    3068
SCALE1          87      82      84      0       85      90      86      87
OFFSET1         384     552     499     0       496     354     455     450
BIASCAL         1029    1255    1535    0       1383    1473    1467    1491
HWIS            86.75   90.25   85.50   0.00    85.25   86.25   86.92   84.58
HWIO            547.75  430.25  526.50  0.00    552.25  554.25  476.92  615.58
LTEMP           28.50   28.50   27.00   0.00    28.00   28.00   27.25   28.00
RTEMP           27.00   26.00   27.00   0.00    28.00   28.25   27.50   28.00
==============================================================================
> i2cdetect 3
[i2c_scan] start
[i2c_scan] end
> i2cerr
                #0      #1      #2      #3      #4      #5      #6      #7
I2C ERR         0       0       109     131     0       0       0       0

(The channel 2 errors are from running i2cdetect on channel 2 as well due to a typo.)

> version
RFPA v1.4.1 218aca0, built Sep 16 2019 16:23:32, hv rev 3

> devid
[devid] 003200223436510B32323838
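
For anyone collecting these dumps from several units, here is a minimal sketch of pulling the per-channel counters out of pasted `i2cerr` output like the one above. The function name `parse_i2c_errors` is made up for illustration and is not part of the Booster firmware or any tooling; the sketch assumes the two-line layout shown in this issue (a `#N` header row followed by an `I2C ERR` row).

```python
def parse_i2c_errors(text):
    """Return {channel: error_count} from pasted `i2cerr` output.

    Hypothetical helper: assumes a header row of '#N' labels followed
    by a row starting with 'I2C ERR', as in the dump in this issue.
    """
    header = None
    for line in text.splitlines():
        fields = line.split()
        if fields and all(f.startswith("#") for f in fields):
            # Header row: '#0 #1 ... #7' -> [0, 1, ..., 7]
            header = [int(f[1:]) for f in fields]
        elif line.strip().startswith("I2C ERR") and header is not None:
            # The first two tokens are the label 'I2C' and 'ERR'.
            counts = [int(f) for f in fields[2:]]
            return dict(zip(header, counts))
    raise ValueError("no I2C ERR row found")


sample = """\
                #0      #1      #2      #3      #4      #5      #6      #7
I2C ERR         0       0       109     131     0       0       0       0
"""

errors = parse_i2c_errors(sample)
faulty = [ch for ch, n in errors.items() if n > 0]
print(faulty)  # channels with a non-zero I2C error count
```

On the dump above this flags channels 2 and 3; as noted, the channel 2 count comes from the mistyped `i2cdetect`, so only channel 3 is of interest here.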

Afterwards, the channel came back online just fine. I didn't check whether it properly amplifies RF just yet, but it seems like this was most likely just due to an I2C bus lockup of some sort. Status output after rebooting:

PGOOD: 1
FAN SPEED: 30 %
AVG TEMP: 27.50 CURRENT: 27.50
CHANNELS INFO
==============================================================================
                #0      #1      #2      #3      #4      #5      #6      #7
DETECTED        1       1       1       1       1       1       1       1
HWID            7B:13   21:67   38:F1   FE:C5   E5:15   90:48   58:27   A7:46
INPWR [V]       0.52    0.39    0.02    0.35    0.43    0.01    0.26    0.71
TXPWR [V]       1.45    0.07    0.01    0.01    0.02    0.01    0.01    0.04
RFLPWR [V]      0.64    0.13    0.05    0.01    0.04    0.07    0.01    0.01
INPWR [dBm]     -nan    -nan    -nan    -nan    -nan    -nan    -nan    -nan
TXPWR [dBm]     22.84   5.00    5.00    5.00    5.00    5.00    5.00    5.00
RFLPWR [dBm]    7.90    -3.33   -4.30   -3.52   -4.56   -4.65   -3.35   -4.36
I30V [A]        0.073   0.046   0.047   0.047   0.046   0.048   0.047   0.047
I6V0 [A]        0.240   0.228   0.230   0.246   0.241   0.238   0.239   0.211
5V0MP [V]       5.011   5.003   5.002   5.012   5.018   5.008   5.006   5.035
ON              1       1       1       1       1       1       1       1
SON             1       1       1       1       1       1       1       1
IINT            0       0       0       0       0       0       0       0
OINT            0       0       0       0       0       0       0       0
SINT            0       0       0       0       0       0       0       0
ADC1            2371    110     18      19      36      18      18      57
ADC2            1042    208     75      12      73      121     11      12
INTSET [dBm]    35.99   28.99   29.00   28.99   28.99   28.99   28.99   28.99
DAC1            4095    4095    4095    4095    4095    4095    4095    4095
DAC2            3670    3047    3006    3136    3024    3055    2997    3068
SCALE1          87      82      84      88      85      90      86      87
OFFSET1         384     552     499     482     496     354     455     450
BIASCAL         1029    1255    1535    1589    1383    1473    1467    1491
HWIS            86.75   90.25   85.50   86.75   85.25   86.25   86.92   84.58
HWIO            547.75  430.25  526.50  620.75  552.25  554.25  476.92  615.58
LTEMP           28.00   27.25   27.00   27.50   28.25   28.50   27.00   28.50
RTEMP           27.00   27.50   27.50   26.00   26.00   27.00   27.25   26.00
==============================================================================
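
As a rough aid for spotting this kind of fault in a pasted status dump, the sketch below parses the CHANNELS INFO table and lists channels whose DETECTED flag is 0. It is only an illustration under assumptions: `parse_channel_table` is a hypothetical helper, not firmware or vendor tooling, and it assumes eight whitespace-separated value columns per row, as in the tables in this issue.

```python
def parse_channel_table(text, n_channels=8):
    """Parse a pasted CHANNELS INFO table into {row_label: [values]}.

    Hypothetical helper: the last `n_channels` whitespace-separated
    tokens of a row are taken as values and everything before them as
    the label, so labels such as 'INPWR [V]' stay intact.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip separators, the '#0 ... #7' header and summary lines,
        # none of which have a label plus n_channels values.
        if len(fields) <= n_channels:
            continue
        label = " ".join(fields[:-n_channels])
        rows[label] = fields[-n_channels:]
    return rows


# Excerpt of the table captured before power-cycling.
sample = """\
==============================================================================
                #0      #1      #2      #3      #4      #5      #6      #7
DETECTED        1       1       1       0       1       1       1       1
HWID            7B:13   21:67   38:F1   00:00   E5:15   90:48   58:27   A7:46
INPWR [V]       0.55    0.81    0.43    0.00    0.01    0.01    0.26    0.01
==============================================================================
"""

rows = parse_channel_table(sample)
undetected = [ch for ch, flag in enumerate(rows["DETECTED"]) if flag == "0"]
print(undetected)  # channels the controller no longer sees
```

Values are kept as strings on purpose, since rows such as HWID (`7B:13`) and the `-nan` readings are not plain numbers.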

Does the above version still have any known I2C bugs? If so, could you please post a new release with binaries? If not, what would be the next steps to debug this?

Closing as your firmware is relatively old and I'm pretty sure that there have been I2C fixes since then (I haven't reproduced an I2C error in a while, only watchdog timeouts).

@wizath please can you post an RC firmware release on the firmware repo?

@wizath: Could I second that request, please? I'm trying to "just use" Booster, and don't even know which of the issues to dig through to find the most recent build…

Still an issue I (and any other users) aren't able to fix.

Did you update the firmware?

How could I have done that? Still no binaries at https://github.com/sinara-hw/booster-firmware/releases (nor in /AMS/Software/Booster/Firmware binaries), and I haven't a clue which random link in what issue would have the most recent build.

(There is now also a second problem on another Booster, which might or might not just be fixed in the new firmware.)

A search of the issue tracker for the most recent occurrence gives you this: #319 (comment). I believe that is the version to use, and that it has all the I2C fixes. Obviously, I agree that tracking these kinds of issues would be a lot easier if we had an RC firmware binary on the firmware GitHub project... ahem @gkasprow @wizath

(I'd leave this open until the fix has been tested, i.e. the problem hasn't occurred for a few days after the firmware update, so we have a discoverable place to track any further developments. If you'd rather close it here, then let's do that, but we'd at least need some sort of group-internal tracker to fill the same role.)

Thanks, I'll try that build.

Yep, said problem hasn't come back. (But now we have the watchdog faults discussed elsewhere.)