BotoX/xiaomi-m365-compatible-bms

Recover from OCD

pgruber75 opened this issue · 33 comments

Hi.
I had some aggressive settings flashed on my M365, which triggered the BMS overcurrent protection. This ended in Error 21.
debug_print() showed an increasing OCD counter.
Resetting the AVR didn't help; I had to cut open the heat-shrink tubing and unplug the balance connector. After replugging it, everything was fine again.
Is there a way to reset the BQ769x0 without power cycling it?
Do I understand correctly that once a protection is triggered, the BMS never recovers on its own?

Thanks,
Patrick

BotoX commented

Is the issue reproducible with the latest code in this repository?

Can you post the full output of debug_print here?

The OCD error should clear automatically after 10 seconds and enable discharging again.

I downloaded, compiled and uploaded last week. So yes, I think latest code.
I let the battery sit over night and next day debug_print() showed hundreds of OCD-errors.

This stopped only after reseating balance-connector.

One question: Is it correct, that BMS<->ESC communication is shut down (Error 21) in case of BMS-errors?

Here is full debug output but after complete power-cycle. Sorry.

>>> debug_print()
b'55aa0322fa0500dbfe'
>>> 0x00 SYS_STAT:  10000000
0x01 CELLBAL1:  00000000
0x04 SYS_CTRL1: 00011000
0x05 SYS_CTRL2: 01000011
0x06 PROTECT1:  00011101
0x07 PROTECT2:  01111011
0x08 PROTECT3   00010000
0x09 OV_TRIP:   10101111
0x0A UV_TRIP:   11001000
0x0B CC_CFG:    00011001
0x32 CC_HI_LO:  0
0x2A BAT_HI_LO: 32121
ADCGAIN:        378
ADCOFFSET:      43
CHG DIS:        0
DISCHG DIS:     0

uptime: 3995
Battery voltage: 49082 (32121)
Battery current: -8 (-1)
SOC: 99.99
Temperature: 19.90 20.60
Balancing status: 0
Cell voltages (12 / 15):
4087 (10701), 4086 (10696), 4091 (10711), 43 (1), 4091 (10711), 4089 (10705), 4091 (10711), 4091 (10710), 43 (0), 4086 (10696), 4089 (10705), 4092 (10714), 4094 (10718), 44 (3), 4087 (10699)
Cell V: Min: 4086 | Avg: 4090 | Max: 4094 | Delta: 8
maxVoltage: 4972
maxDischargeCurrent: 506
maxChargeCurrent: 289
XREADY errors: 0
ALERT errors: 0
UVP errors: 0
OVP errors: 0
SCD errors: 0
OCD errors: 0

DISCHG TEMP errors: 0
CHG TEMP errors: 0
CHG OCD errors: 0
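
The raw ADC values in parentheses can be cross-checked against ADCGAIN and ADCOFFSET. The following is a minimal sketch, assuming the BQ769x0 datasheet convention (gain in µV per LSB, offset in mV, pack voltage scaled by 4 with the offset counted once per connected cell) and integer truncation. The function names are made up for this example and are not taken from the firmware, but the results match the figures in the dump above.

```python
def cell_mv(raw: int, gain_uv: int, offset_mv: int) -> int:
    """Convert a raw cell ADC reading to millivolts (assumed datasheet formula)."""
    return (raw * gain_uv) // 1000 + offset_mv


def pack_mv(raw: int, gain_uv: int, offset_mv: int, n_cells: int) -> int:
    """Convert the raw BAT_HI_LO reading to millivolts (assumed datasheet formula)."""
    return (4 * raw * gain_uv) // 1000 + n_cells * offset_mv


if __name__ == "__main__":
    # Values from the dump above: ADCGAIN 378, ADCOFFSET 43, 12 connected cells.
    print(cell_mv(10701, 378, 43))      # 4087, first cell
    print(cell_mv(1, 378, 43))          # 43, an unused cell input
    print(pack_mv(32121, 378, 43, 12))  # 49082, battery voltage
```

An unused input reads roughly the offset alone, which would explain the 43 to 44 mV entries among the 15 cell slots.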
BotoX commented

I downloaded, compiled and uploaded last week. So yes, I think latest code.
I let the battery sit over night and next day debug_print() showed hundreds of OCD-errors.

That sounds very wrong, it wasn't even plugged into anything?

This stopped only after reseating balance-connector.

Maybe it was actually shorted? o_O

One question: Is it correct, that BMS<->ESC communication is shut down (Error 21) in case of BMS-errors?

The error 21 comes from no communication with the BMS at all.
So either the BMS crashed or something else is prohibiting the communication.

The BMS will set error 22/23 however if the battery is too hot or the cells are out of balance (>100mV)

Here is full debug output but after complete power-cycle. Sorry.

Make sure to capture a debug_print() if/when the issue happens again.

No, Battery wasn't plugged anywhere.
Resetting the AVR via the RESET pin didn't change anything. Well, the OCD error counter was reset to 0, only to rise again without anything plugged in.
I thought BMS was fried and took my other battery - identical.
Plugged it into scooter, drove with full throttle uphill. Error 21.
It also never recovered, same behaviour.
My test was to reseat the balance connector, which solved the problem, and I did the same with the first battery.

As I mentioned - unfortunately both did not recover from OCD.

For Error 21: Even after resetting AVR error stayed!

It is strange.
I tried to reproduce OCD and OCP errors by setting the limits really low.
OCP worked, D-FET switched off and on again by itself. No Error 21.
OCD: no success. I lowered the limit to 500mA and it never switched off.

It looks like when I rode the scooter and the current rose over the limit, not only the AVR crashed but the TI chip as well, and resurrection was only possible by power cycling it.

Is it possible that setting the parameters via g_Settings.x is not always successful? I had to write the settings and put and apply them several times until protection triggered. "Serial protocol is crap"-problem?

How can I verify if settings are set or not?

BotoX commented

OCD definitely works for me and others though.. 🤔
it's actually handled by the TI chip itself

yeah basically "Serial protocol is crap"
to fix it you can just do putSettings() a bunch of times and then check if it transferred correctly with getSettings()
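
The "write several times, then read back" advice can be wrapped in a small retry helper. This is only a sketch: `put_and_verify` and its parameters are hypothetical names, not part of configtool.py. It assumes only what the thread shows, namely that a failed exchange surfaces as `queue.Empty` and that the result of a read ends up in a variable you can inspect afterwards.

```python
import queue


def put_and_verify(put, get, read_back, expected, attempts=10):
    """Call put() then get() until read_back() returns the expected value.

    put, get  -- callables that write and re-read the settings
    read_back -- callable returning the value that was read back
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, attempts + 1):
        try:
            put()
            get()
        except queue.Empty:
            continue  # no reply within the timeout, try again
        if read_back() == expected:
            return attempt
    raise RuntimeError(f"settings not confirmed after {attempts} attempts")
```

In the configtool console this could be called with `putSettings` and `getSettings` as the first two arguments and a lambda that reads the relevant field of `g_Settings` as the third.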

First I want to thank you for your support and this great project!

Unfortunately getSettings() is getting me no settings:

D:\botox>py -i configtool.py
>>> getSettings()
b'55aa0322f1007079fe'
>>> getSettings()
b'55aa0322f1007079fe'
>>> getSettings()
b'55aa0322f1007079fe'
>>> getSettings()
b'55aa0322f1007079fe'
>>> getSettings()
b'55aa0322f1007079fe'
>>> getSettings()
b'55aa0322f1007079fe'
>>> getSettings()
b'55aa0322f1007079fe'
>>> getSettings()
b'55aa0322f1007079fe'
>>>

Should I see here the variables loaded from EEPROM?
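
The byte string printed by each call looks like the outgoing request frame rather than a reply. A quick way to inspect such a frame is sketched below. The field names are assumptions for illustration. The checksum rule (0xFFFF XOR the byte sum of everything between the `55 aa` header and the checksum, stored little-endian) is inferred from the frames printed in this thread, which it matches.

```python
def m365_checksum(body: bytes) -> bytes:
    """Checksum over the bytes between the header and the checksum itself."""
    return ((0xFFFF ^ sum(body)) & 0xFFFF).to_bytes(2, "little")


def parse_frame(hex_frame: str) -> dict:
    """Split a hex-encoded frame into fields (field names are assumptions)."""
    raw = bytes.fromhex(hex_frame)
    if len(raw) < 8 or raw[:2] != b"\x55\xaa":
        raise ValueError("not a 55 aa frame")
    body, checksum = raw[2:-2], raw[-2:]
    return {
        "length": body[0],
        "address": body[1],
        "command": body[2],
        "argument": body[3],
        "payload": body[4:].hex(),
        "checksum_ok": m365_checksum(body) == checksum,
    }


if __name__ == "__main__":
    # The frame echoed by getSettings() in this thread.
    print(parse_frame("55aa0322f1007079fe"))
```

A frame with a valid checksum only shows that the request was built correctly; it says nothing about whether the BMS received or answered it.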

-- EDIT

Gave it a try, downloaded and compiled completely new. Flashed.
Now:

>>> getSettings()
b'55aa0322f1007079fe'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\xiaomi-m365-compatible-bms-master\configtool.py", line 211, in getSettings
    d = m365_recv()
  File "D:\xiaomi-m365-compatible-bms-master\configtool.py", line 199, in m365_recv
    d = g_Queue.get(True, 1)
  File "C:\Users\Patrick\AppData\Local\Programs\Python\Python39\lib\queue.py", line 179, in get
    raise Empty
_queue.Empty
>>> getSettings()
b'55aa0322f1007079fe'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\xiaomi-m365-compatible-bms-master\configtool.py", line 211, in getSettings
    d = m365_recv()
  File "D:\xiaomi-m365-compatible-bms-master\configtool.py", line 199, in m365_recv
    d = g_Queue.get(True, 1)
  File "C:\Users\Patrick\AppData\Local\Programs\Python\Python39\lib\queue.py", line 179, in get
    raise Empty
_queue.Empty
>>>
BotoX commented

getSettings() writes the result into g_Settings variable

And if there is no response/error then the communication failed.

Ok. So is there a possibility to read out the actually set parameters?
If I try to read the code with my rudimentary knowledge I would say no.

BotoX commented

After doing getSettings just write g_Settings in the python console lol.
That'll print whatever it received from the BMS

Oh man. I have to admit I am a bit ashamed :-)

Will do some testing now.

So it happened again.
As I already told, there are two M365 I equipped with 12s3p and this BMS.
I test-drove mine yesterday with peaks of 32A and 1500W (I have the m365dash on my scooter, so I have real-time readings).
I pushed it unusually hard: steep hills at full throttle with 90kg on it.
No issues, all fine like expected.

Then I got a message from the owner of the other scooter. Drove it. Shut off. Error 21. No chance to recover.
I will get my hands on it on Saturday. As I said, my battery, which worked perfectly for me yesterday, has already crashed in his scooter as well.

My thoughts: Could this be a problem with the controller or motor? Some kind of back-EMF or whatever that crashes the BMS's AVR and TI chip?
I used the communication cable from the original BMS and crimped a plug for the BMS com port onto it.
As it is a little too short to route it the old way, it goes straight over the controller to the socket there. Straight over the FETs...

What do you think?

BotoX commented

I have seen BMS which crashed when the scooter was driven hard.
Flipping over the BMS or routing the cables different has fixed the issue for some.

In principle the watchdog feature of the AVR should reset it.
Maybe it can't recover the TI chip?
In any way, NOT crashing by better EMF protection is a better solution :D
My BMS has the metal lid mounted on top of the electronic side, that metal lid is facing the controller.

Weird thing is:

Both my batteries are built exactly the same, as I thought after the first one it was a good design ;-)
BMS is on top of the cells, not over the ESC.

Both batteries are dying only in one of the two scooters.

I will try swapping controllers and/or motors.

After doing some research on the TI chip I concluded that it is NOT resettable, or at least not easily and reliably.
Crappy design...

But if we reset the AVR while the TI chip is in a faulty condition, is it by design that the AVR refuses to communicate with the ESC?
Otherwise, why does Error 21 keep popping up, even after a manual AVR reset instead of the watchdog (which I assume had already reset it several times before)?

It is always the same thing. Making things for me works nearly every time. Making it for others generates trouble. Perhaps I should stop doing it :-D

BotoX commented

I've heard of the pro motor causing lots of back emf and resetting a BMS which works fine on the normal motor.

Short update:

Routing the motor cables directly and somewhat twisted, re-twisting the data cable to the BMS, and routing it beneath the ESC as far away from the motor cables as possible more or less solved my problems.

We did an 8km ride at full throttle with steep up- and downhill sections without any issue.

I think case is closed.

Thanks for your support!

So it happened again :-(

Next try is a shielded datacable. Shield connected to GND on the serial connector on the BMS.

Got my hands on the battery now.

debug_print() gives this. Why is the battery current not zero?

0x00 SYS_STAT: 10100000
0x01 CELLBAL1: 00000000
0x04 SYS_CTRL1: 00011000
0x05 SYS_CTRL2: 01000000
0x06 PROTECT1: 00011101
0x07 PROTECT2: 01111001
0x08 PROTECT3 00010000
0x09 OV_TRIP: 10101101
0x0A UV_TRIP: 11001000
0x0B CC_CFG: 00011001
0x32 CC_HI_LO: 0
0x2A BAT_HI_LO: 30576
ADCGAIN: 378
ADCOFFSET: 43
CHG DIS: 0
DISCHG DIS: 0

uptime: 1522663
Battery voltage: 46745 (30575)
Battery current: 6061 (711)
SOC: 71.79
Temperature: 18.00 18.30
Balancing status: 0
Cell voltages (12 / 15):
3895 (10193), 3895 (10193), 3896 (10194), 44 (4), 3895 (10192), 3895 (10192), 3896 (10194), 3896 (10194), 46 (8), 3893 (10187), 3886 (10168), 3896 (10194), 3896 (10194), 44 (3), 3896 (10194)
Cell V: Min: 3886 | Avg: 3895 | Max: 3896 | Delta: 10
maxVoltage: 5029
maxDischargeCurrent: 3013
maxChargeCurrent: 1074
XREADY errors: 19
ALERT errors: 0
UVP errors: 0
OVP errors: 90
SCD errors: 0
OCD errors: 0

DISCHG TEMP errors: 0
CHG TEMP errors: 2
CHG OCD errors: 0

After resetting AVR I get

watchdog_test()
b'55aa0322fa0a00d6fe'

BOOTED!
bq769x0 ERROR: XREADY

Any idea?

From here https://e2e.ti.com/support/power-management-group/power-management/f/power-management-forum/923907/bq76940-xready-trigger-reason-identify
I got this post

XREADY will trigger when 5 of the 8 communications between cell groups internal to the part are corrupt in a 1 second interval. The part considers the reported result unreliable in this condition and turns off FET outputs. Some common causes:

  1. Missing power on upper groups such as droop as you indicate causing reset of the group
  2. Broken or missing power connection
  3. Shorted CAPn capacitor
  4. TSn having trapped voltage at boot due to missing pull down
  5. Pull down of TSn below its reference (abs max violation)
  6. Large noise on power pins
  7. Some customers have reported XREADY in high field strength environments, it is not know if the entry is through the power pins or TSn to a remote thermistor or other. Filtering and shielding are the expected solutions.

Items 6 and 7 attracted my attention. Is it the motor? Or some EMF on these thermistor wires, which run straight over the cells?

have same issue, did you solve the problem?

Can't tell 100%. Changed motor with other M365 for testing purposes.
Problem hasn't reappeared until now.

Could you post a picture of your battery to check if we have similarities in layout?

I have a 12s4p battery, and the BMS is on top of the ESC
https://imgur.com/Ru9Bs2g

Ok. My bms is on top of the battery.

I suspect the motor's EMF is hitting the data line or, according to the post above, the temperature sensors.

Which error do you get?

21 on display of m365, and this is debug_print()
0x00 SYS_STAT: 00000010
0x01 CELLBAL1: 00000000
0x04 SYS_CTRL1: 00011000
0x05 SYS_CTRL2: 01000001
0x06 PROTECT1: 00011101
0x07 PROTECT2: 01111001
0x08 PROTECT3 00010000
0x09 OV_TRIP: 10110000
0x0A UV_TRIP: 11001001
0x0B CC_CFG: 00011001
0x32 CC_HI_LO: 0
0x2A BAT_HI_LO: 29360
ADCGAIN: 377
ADCOFFSET: 46
CHG DIS: 0
DISCHG DIS: 16

uptime: 49056
Battery voltage: 44826 (29360)
Battery current: 0 (0)
SOC: 60.18
Temperature: 21.90 22.10
Balancing status: 0
Cell voltages (12 / 15):
3737 (9793), 3737 (9792), 3736 (9788), 50 (11), 3732 (9779), 3734 (9784), 3734 (9784), 3734 (9785), 49 (10), 3733 (9780), 3733 (9782), 3734 (9785), 3734 (9784), 46 (1), 3734 (9783)
Cell V: Min: 3732 | Avg: 3735 | Max: 3737 | Delta: 5
maxVoltage: 4724
maxDischargeCurrent: 3868
maxChargeCurrent: 1547
XREADY errors: 0
ALERT errors: 0
UVP errors: 0
OVP errors: 0
SCD errors: 222
OCD errors: 0

DISCHG TEMP errors: 0
CHG TEMP errors: 0
CHG OCD errors: 0
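
The SYS_STAT line in these dumps can be decoded bit by bit to see which flag the TI chip is reporting. This is a minimal sketch, assuming the bit layout given in the BQ769x0 datasheet (bit 0 = OCD up to bit 7 = CC_READY). `decode_sys_stat` is a helper written for this example and is not part of configtool.py.

```python
# Bit names from least significant (bit 0) to most significant (bit 7),
# per the BQ769x0 datasheet layout assumed here.
SYS_STAT_BITS = [
    "OCD", "SCD", "OV", "UV",
    "OVRD_ALERT", "DEVICE_XREADY", "RSVD", "CC_READY",
]


def decode_sys_stat(bits: str) -> list:
    """Return the names of all flags set in a binary string such as '10100000'."""
    value = int(bits, 2)
    return [name for i, name in enumerate(SYS_STAT_BITS) if value & (1 << i)]


if __name__ == "__main__":
    # The three SYS_STAT values posted in this thread.
    for dump in ("10000000", "10100000", "00000010"):
        print(dump, decode_sys_stat(dump))
```

Read this way, the three dumps show CC_READY only, then DEVICE_XREADY plus CC_READY, then SCD, which lines up with the XREADY and SCD error counters reported alongside them.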

this happened twice in the same place, after dropping off a curb at about 10 km/h. The first time was after about 6-7 km of riding, and today when I went to work it happened again: dropping off a small curb at low speed after about the same distance

You have max discharge current of over 38 amps. Is this on purpose?

The BMS triggered a short-circuit (SCD) error. Have you tried resetting the AVR?

resetting the AVR did nothing; I have limited the max current in the m365 firmware to 18 amps

Then TI chip crashed. Only disconnecting balance plug will help. Same shit like I had.
My guess is emf from motor over data line or thermistor lines.

Is bms' aluminum shield facing controller?

changed the wiring of the cables, now 2 days without any errors

same issue again; changing the wiring doesn't help, nor does covering it in aluminium foil and disconnecting it from the ESC. What kind of motor do you use? And were there any issues after changing the motor? I'm currently using an m365 (not pro) motor and planning to change it to a pro

changed the motor to an original pro (300w), no change

Changed motor too. From another classic. Was good for 3 weeks. Got error 21 again.

I have a similar problem. I measured that no voltage is supplied to the ATmega chip, so a reset will never work :D. Maybe it is possible to reset it by disconnecting only one wire from the balancer? Has anybody tried that already?

Hi Adam,

unfortunately I couldn't get hands on it since then.
I do not have a guess anymore as I have two batteries and both die in only this one scooter.
All components (motor, ESC, BLE and battery) were changed between both scooters already and it only happens in this one M365.

Perhaps the way it is ridden is different, I don't know.

One thing is left that I would try: the routing of the thermistor wires. Maybe some interference and spikes are being introduced into the microcontroller on the BMS. The next thing would be to try a shielded wire as the data cable between ESC and BMS.

Other than that I have no more ideas... And I know, that issue is annoying as f..k

Patrick