RaspberryPiFoundation/python-build-hat

Build HAT disconnects randomly from Raspberry Pi

cgi-gerlando-caldara opened this issue · 14 comments

Hello python-build-hat team,

we are facing an issue where the Build HAT stops interacting with the Raspberry Pi.
We run our Lego machine for long periods, and at a random point, after about half an hour, the code crashes.

On each crash we see the Build HAT disconnect and reconnect, which is strange because we never send a disconnect command.

Main Goal
Adjust motor speed and read all motor information for more than 8-10 hours. If we need to change our code, that's alright; if this is a bug, we are happy to help.

Normal operation: set a new motor speed, then get motor info (speed, rotation, absolute rotation)
INFO:root:> port 0 ; coast
INFO:root:> port 0 ; combi 0 1 0 2 0 3 0 ; select 0 ; pid 0 0 0 s1 1 0 0.003 0.01 0 100; set 10

INFO:root:> port 0 ; selonce 0
INFO:root:< P0C0: +10 +251211 -131
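The `combi 0 1 0 2 0 3 0` command appears to define combi 0 from modes 1, 2 and 3, which the reconnect log lists as SPEED, POS and APOS, and `selonce 0` then requests a single reading of that combi. A minimal sketch for turning a reply line such as `P0C0: +10 +251211 -131` into named values, assuming the three fields always arrive in that mode order. `parse_combi_line` and `MotorReading` are hypothetical helpers, not part of python-build-hat.

```python
import re
from typing import NamedTuple, Optional


class MotorReading(NamedTuple):
    port: int
    speed: int      # mode 1, SPEED (PCT)
    position: int   # mode 2, POS (DEG, cumulative)
    aposition: int  # mode 3, APOS (DEG, absolute)


# Matches e.g. "P0C0: +10 +251211 -131" (port 0, combi 0, three signed values).
# Assumption: the field order follows the combi definition (SPEED, POS, APOS).
_COMBI_RE = re.compile(r"P(\d+)C\d+:\s+([+-]?\d+)\s+([+-]?\d+)\s+([+-]?\d+)")


def parse_combi_line(line: str) -> Optional[MotorReading]:
    """Return the reading in a combi reply line, or None if the line is not one."""
    match = _COMBI_RE.search(line)
    if match is None:
        return None
    port, speed, pos, apos = (int(g) for g in match.groups())
    return MotorReading(port, speed, pos, apos)


print(parse_combi_line("INFO:root:< P0C0: +10 +251211 -131"))
# MotorReading(port=0, speed=10, position=251211, aposition=-131)
print(parse_combi_line("INFO:root:< P0: disconnected"))
# None
```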

Disconnect
INFO:root:< P0: timeout during data phase: disconnecting
INFO:root:< P0: disconnected

The code crashes in between, because motor values cannot be read while the Build HAT is disconnected.
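Since the crash comes from reading motor values during the disconnect, one way to keep a long run alive is to catch the failed read and retry after a pause, giving the firmware time to reconnect as it does in the log. A minimal sketch, assuming the read raises some exception while the port is disconnected; the exact exception type is not shown in this thread, so the sketch catches `Exception` broadly. `read_with_retry` and its parameters are hypothetical; in real code `read` might be something like `motor.get_position` on a `buildhat.Motor`.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_retry(
    read: Callable[[], T],
    retries: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call read(); on failure wait delay_s and try again, up to `retries` extra times.

    The last failure is re-raised so a persistent fault is not hidden.
    """
    attempt = 0
    while True:
        try:
            return read()
        except Exception as exc:  # assumption: any failure may be a transient disconnect
            if attempt >= retries:
                raise
            attempt += 1
            logging.warning("read failed (%r); retry %d/%d in %.1fs",
                            exc, attempt, retries, delay_s)
            sleep(delay_s)
```

Injecting `sleep` keeps the helper testable without hardware; the default is the real `time.sleep`.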

Reconnect
INFO:root:< set baud rate to 115200
INFO:root:< P0: connected to active ID 4C
INFO:root:< type 4C
INFO:root:< nmodes =5
INFO:root:< nview =3
INFO:root:< baud =115200
INFO:root:< hwver =00000004
INFO:root:< swver =10000000
INFO:root:< M0 POWER SI = PCT
INFO:root:< format count=1 type=0 chars=4 dp=0
INFO:root:< RAW: 00000000 00000064 PCT: 00000000 00000064 SI: 00000000 00000064
INFO:root:< M1 SPEED SI = PCT
INFO:root:< format count=1 type=0 chars=4 dp=0
INFO:root:< RAW: 00000000 00000064 PCT: 00000000 00000064 SI: 00000000 00000064
INFO:root:< M2 POS SI = DEG
INFO:root:< format count=1 type=2 chars=11 dp=0
INFO:root:< RAW: 00000000 00000168 PCT: 00000000 00000064 SI: 00000000 00000168
INFO:root:< M3 APOS SI = DEG
INFO:root:< format count=1 type=1 chars=3 dp=0
INFO:root:< RAW: 00000000 000000B3 PCT: 00000000 000000C8 SI: 00000000 000000B3
INFO:root:< M4 CALIB SI = CAL
INFO:root:< format count=2 type=1 chars=5 dp=0
INFO:root:< RAW: 00000000 00000E10 PCT: 00000000 00000064 SI: 00000000 00000E10
INFO:root:< M5 STATS SI = MIN
INFO:root:< format count=14 type=1 chars=5 dp=0
INFO:root:< RAW: 00000000 0000FFFF PCT: 00000000 00000064 SI: 00000000 0000FFFF
INFO:root:< C0: M1+M2+M3
INFO:root:< speed PID: 00000FA0 00000064 00002328 000003CA
INFO:root:< position PID: 00002710 000003E8 0000C350 00000000
INFO:root:< P0: established serial communication with active ID 4C
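To measure how often this happens over a long run, the log can be scanned for the port-level disconnect and reconnect messages shown above. A minimal sketch, assuming every log line looks like the `INFO:root:< ...` lines in this report; `scan_port_events` and the event labels are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Port-level messages as they appear in this report's log.
_DISCONNECT_RE = re.compile(r"<\s*P(\d+): disconnected")
_RECONNECT_RE = re.compile(r"<\s*P(\d+): established serial communication")


def scan_port_events(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (line_index, port, event) for each disconnect/reconnect line."""
    events = []
    for index, line in enumerate(lines):
        if (m := _DISCONNECT_RE.search(line)):
            events.append((index, int(m.group(1)), "disconnected"))
        elif (m := _RECONNECT_RE.search(line)):
            events.append((index, int(m.group(1)), "reconnected"))
    return events


sample = [
    "INFO:root:< P0C0: +10 +251211 -131",
    "INFO:root:< P0: timeout during data phase: disconnecting",
    "INFO:root:< P0: disconnected",
    "INFO:root:< P0: connected to active ID 4C",
    "INFO:root:< P0: established serial communication with active ID 4C",
]
print(scan_port_events(sample))
# [(2, 0, 'disconnected'), (4, 0, 'reconnected')]
```

With timestamps added to the log format, the same scan could report the uptime between disconnects.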

Do you mind posting your python code, so I can try to replicate this.

Thanks

Hello Chrisruk,

I have granted you access to our private repo; you can find the main script and the Build HAT class here:

https://github.com/cgi-gerlando-caldara/sensor_code/blob/main/main/main.py
https://github.com/cgi-gerlando-caldara/sensor_code/blob/main/main/BuildHat.py

I hope this helps you verify that we are using the Build HAT library correctly.

Best regards
Gerlando Caldara

If you can provide a public repo / gist of a minimal test case, I can have a look

Thanks

Hello Chrisruk,

you can find the reduced version publicly on GitHub here:
https://github.com/cgi-gerlando-caldara/sensor_code_lite

I also added a log file from a run of this sample code in which the Build HAT disconnected from the Raspberry Pi without any command.

Hope this helps with reproducing the issue.

Best regards
Gerlando Caldara

Thanks a lot, I've just been having a look. I'll try to create a minimal test case from this to get to the root of the problem.

This looks like a blip caused by data overload inside the firmware that talks to the motor. It is similar to Issue #122, but takes longer to trigger because the firmware does not fall as far behind initially.

@mutesplash, yep, I think you're right. I'll remove the MQTT calls and test this out.

Hello everyone, first of all thank you for your time and ideas. We had already tried adding waits (sleep 1 s) to the main loop, but even with this the failure still occurs after 30 minutes to 2 hours, so we never reach the planned long-term run of more than 8 hours.
It would be great if we could achieve more stability. Otherwise the Build HAT may be unsuitable for our project and we will have to find another solution.

To be clear, by referencing that issue I was not suggesting adding waits as a permanent solution, but pointing out that there is a problem inside the firmware that talks to the motor, where it "gets behind" on commands and then overflows or resets some state.
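The "gets behind" explanation can be illustrated with a toy queue model: if commands arrive slightly faster than they are processed, a backlog grows steadily until a fixed-size buffer overflows, and the smaller the rate gap, the longer that takes (roughly capacity / (arrival rate - service rate) ticks). That would be consistent with this case taking longer to trigger than Issue #122. This is only an illustration with assumed numbers, not the Build HAT firmware's actual design; `ticks_until_overflow` and its parameters are hypothetical.

```python
from typing import Optional


def ticks_until_overflow(
    arrivals_per_tick: float,
    services_per_tick: float,
    capacity: int,
    max_ticks: int = 100_000,
) -> Optional[int]:
    """Toy model: return the first tick at which the backlog exceeds capacity.

    Each tick adds `arrivals_per_tick` commands and removes up to
    `services_per_tick`. Returns None if no overflow happens within max_ticks.
    """
    backlog = 0.0
    for tick in range(1, max_ticks + 1):
        # The backlog can never go negative: an idle processor just waits.
        backlog = max(0.0, backlog + arrivals_per_tick - services_per_tick)
        if backlog > capacity:
            return tick
    return None


print(ticks_until_overflow(1.5, 1.0, capacity=100))   # 201
print(ticks_until_overflow(1.25, 1.0, capacity=100))  # 401
print(ticks_until_overflow(1.0, 1.0, capacity=100))   # None
```

Halving the rate gap roughly doubles the time to overflow, and a sleep that only shrinks the gap delays the failure rather than removing it unless arrivals drop to or below the service rate.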

Thanks a lot for this report @cgi-gerlando-caldara. I've just created a minimal test case and can replicate this issue.

Hopefully this will be looked at in the firmware soon.

We've just made a new release with new firmware that mitigates this issue. I've run my test case (https://github.com/RaspberryPiFoundation/python-build-hat/blob/main/test/motors.py#L141) for around 3 hours in total with no disconnect.
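A long-running soak test in the spirit of the one above, and of the 8-10 hour goal in the original report, can be sketched as a loop that sets a speed, reads the motor back and records every failure with its elapsed time instead of crashing. A minimal sketch that takes the set and read operations as callables so it can run without hardware; `soak_test` and its parameters are hypothetical and this is not the linked `test/motors.py` test. With python-build-hat the callables might wrap `motor.start(speed)` and `motor.get_position()`.

```python
import time
from typing import Callable, List, Tuple


def soak_test(
    set_speed: Callable[[int], None],
    read: Callable[[], object],
    duration_s: float,
    interval_s: float = 0.1,
    speeds: Tuple[int, ...] = (10, 20, 30),
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, str]]:
    """Cycle through `speeds`, reading after each change, for duration_s seconds.

    Returns (elapsed_seconds, error_repr) for every failed set/read; an
    empty list means the run completed without errors.
    """
    failures: List[Tuple[float, str]] = []
    start = clock()
    step = 0
    while (elapsed := clock() - start) < duration_s:
        try:
            set_speed(speeds[step % len(speeds)])
            read()
        except Exception as exc:  # record and keep going rather than crash
            failures.append((elapsed, repr(exc)))
        step += 1
        sleep(interval_s)
    return failures
```

A real run would use the defaults for `clock` and `sleep` and a `duration_s` of several hours; injecting them lets the loop be checked quickly with a fake clock.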

I'll close this now; feel free to open a new issue if other problems come up.

Works great: the longest run we tested was 48 h without problems.