commaai/panda

H7: CAN Bus Disconnected during button press send to car

sunnyhaibin opened this issue · 16 comments

On some HKG CAN-FD cars, sending the resume button press to the car causes a "CAN Bus Disconnected" error, and the car must be restarted to clear it.

We reverted this commit, and users are no longer getting this error. We are unsure whether the commit is actually relevant, but so far no user has hit the issue again across many stop-and-go traffic sessions with openpilot sending the resume button to the car.

This issue is also seen on sunnypilot 0.9.2 but did not happen on previous versions (we are not certain when it started, but it was not an issue back on December 16th, 2022).

Affected Platforms

  • 2023 Hyundai Palisade HDA2
    • Non-HDA2 unknown
  • 2023 Kia Telluride HDA2
    • Non-HDA2 unknown
  • 2022 Ioniq 5
    • Non-HDA2 unknown

Affected Routes

  • commaai/openpilot#27392
    • 28aa956828c3407d|2023-06-06--09-31-12
    • abada7a7a4dccf94|2023-06-06--11-15-08
    • 9446757551345818|2023-06-05--16-56-41
    • 28aa956828c3407d|2023-06-05--15-43-30

Affected Routes on sunnypilot

  • aa5f7d1d9ac7db11|2023-05-30--09-41-25
  • e42931599f753d96|2023-06-08--17-23-33
  • aa5f7d1d9ac7db11|2023-05-29--12-18-52
  • e42931599f753d96|2023-05-26--21-00-17

Bus 1 is going into the bus off state. @briskspirit can you look into this?
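
For readers unfamiliar with the term, "bus off" is the last stage of standard CAN fault confinement: a controller that keeps failing to transmit raises its transmit error counter (TEC) until it takes itself off the bus. The sketch below is a simplified model of that rule only. It ignores the receive error counter and the special cases in the standard, and it says nothing about what caused the errors in this issue. The function and constant names are illustrative, not taken from panda.

```python
from enum import Enum


class NodeState(Enum):
    ERROR_ACTIVE = "error-active"
    ERROR_PASSIVE = "error-passive"
    BUS_OFF = "bus-off"


# Simplified CAN fault-confinement constants (transmit side only).
TX_ERROR_STEP = 8          # TEC increase per failed transmission
TX_SUCCESS_STEP = 1        # TEC decrease per successful transmission
ERROR_PASSIVE_LIMIT = 128  # TEC > 127 means error-passive
BUS_OFF_LIMIT = 256        # TEC > 255 means bus-off


def node_state(tec: int) -> NodeState:
    """Map a transmit error counter value to a fault-confinement state."""
    if tec >= BUS_OFF_LIMIT:
        return NodeState.BUS_OFF
    if tec >= ERROR_PASSIVE_LIMIT:
        return NodeState.ERROR_PASSIVE
    return NodeState.ERROR_ACTIVE


def simulate(tx_results, tec: int = 0):
    """Apply a sequence of transmit outcomes (True = success) to the TEC.

    Stops as soon as the node reaches bus-off, since a bus-off node no
    longer transmits. Returns the final counter and state.
    """
    for ok in tx_results:
        if ok:
            tec = max(0, tec - TX_SUCCESS_STEP)
        else:
            tec += TX_ERROR_STEP
        if node_state(tec) is NodeState.BUS_OFF:
            break
    return tec, node_state(tec)
```

Under this model, 32 consecutive failed transmissions starting from a clean counter are enough to reach bus off, which is why a burst of frames that nobody acknowledges can take a bus down quickly.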

@sunnyhaibin how did you come up with that commit? did you narrow it down to exactly that one or only tested before/after that one? The commit only changes behavior on init, and doesn't even run on the red panda. It would be helpful if you guys can git bisect to the exact commit since it's easy to repro.

@adeebshihadeh This came up with git bisect. It's a bit confusing, as the other commits that came up didn't seem to be related to the Red Panda; this was the closest one.

I couldn't reproduce this on our EV6 on openpilot master, even sending 50 button msgs every frame. @sunnyhaibin do you have a reliable way to reproduce this? Ideally a clean branch based on master.

We've reproduced it; fix coming soon!

We were attempting to enable openpilot longitudinal control on the Ioniq 6 2023 HDA2 by disabling the ADAS Driving ECU 0x730 via bus 1, the same method used to enable openpilot longitudinal for currently supported HDA2 cars. It seems that as soon as the ECU is disabled (confirmed in cabana and plotjuggler), bus 1 goes into the bus off state and carries no traffic. This did not affect bus 0 or 2, however.

Is it the only ECU on the bus?

It is not, AFAIK. ECUs that broadcast MDPS, SCC, ESP, etc. are also on bus 1.

We have another route that seems to have triggered this issue:

Hey @sunnyhaibin, I had a CAN bus error while driving today. I was kind of spamming the off and on instead of letting the car slam on the brakes due to the long stock radar. It could have been my actions, but I wanted to share just in case. I rebooted in the middle of the drive and had no further issues. I have a few hours of driving on the most recent update and this is the first fault.

@sunnyhaibin messaged you with the branch to try

I sent @briskspirit, by DM, the routes with a 100% success rate of button sends not putting bus 1 into the bus off state, and I am posting them here as well for visibility. These routes were recorded with #1615 implemented:

  • 696748e0ac8082fb|2023-09-02--19-41-51
  • 696748e0ac8082fb|2023-09-02--20-52-30
  • 696748e0ac8082fb|2023-09-02--22-26-43
  • 696748e0ac8082fb|2023-09-03--12-21-33
  • 696748e0ac8082fb|2023-08-31--14-25-28
  • 28aa956828c3407d|2023-09-03--17-16-47
  • caf6a54b6d467dbd|2023-09-02--11-59-54

A route from sunnypilot 0.9.4.1 with a 100% success rate of button sends: fc19648042eb6896|2023-09-05--14-22-53

Data looks promising so far! In this route bus off state happened on first button spam session, CAN core was reset and continued functioning as normal.
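
The recovery behaviour described above (bus off detected, CAN core reset, traffic resumes) can be pictured with a minimal sketch. Everything here is hypothetical: `FakeCanCore` and `recover_if_bus_off` are stand-ins invented for illustration and are not panda's driver or the code from #1615.

```python
class FakeCanCore:
    """Hypothetical stand-in for a CAN controller, for illustration only."""

    def __init__(self):
        self.bus_off = False  # set when the controller has left the bus
        self.resets = 0       # how many times the core has been reset

    def reset(self):
        # A real controller would be re-initialised here; the model just
        # clears the flag and counts the reset.
        self.bus_off = False
        self.resets += 1


def recover_if_bus_off(core) -> bool:
    """Reset the core if it is in bus off. Returns True if a reset happened."""
    if core.bus_off:
        core.reset()
        return True
    return False
```

Calling a check like this periodically means a single bus off event costs one reset rather than leaving the bus dead until the car is restarted.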

merged to master, let me know if something goes wrong

@sunnyhaibin @briskspirit

fc19648042eb6896|2023-09-07--16-40-59--3

Used Sunny's test-c3-vw-custom-stock-long

Something is wrong.
Could you please take a look at this?
I'm not sure because I was in the passenger seat and not driving, but I think the driver pressed the resume button without setting the cruise. As a result, the CAN bus was disconnected and several warning lights on the vehicle's dashboard cycled on and off. I immediately restarted the vehicle and the problem went away.
It may not be related to this issue, I'm not sure.

For your information, my car is a Sorento HEV, whose support is not merged yet, so I have it recognized as a Sorento PHEV.

@VoltIcaRus I don't see this being panda related. According to the logs, the panda worked fine, but controls were not allowed. This issue is the wrong place to post this.

Yes, if that's not a problem, great.
The resume button was pressed and the CAN bus was released, which is what was recorded.
Glad to hear it's not related, thank you for your research!