tonton81/FlexCAN_T4

Feature request, expose ESR and ECR register to diagnose software or hardware issues

designer2k2 opened this issue ยท 14 comments

Hello,

to help with diagnosing issues (Software or Hardware related) the ECR and ESR register gives some insight. The ECR register holds the RX / TX Error Counters. And the ESR register holds a load of informations on the status of the CAN module.
See Section "44.9.7 Error Counter Register (FLEXCANx_ECR)" and "44.9.8Error and Status 1 Register (FLEXCANx_ESR1)" in the documentation: https://www.pjrc.com/teensy/IMXRT1060RM_rev2.pdf
I would like to read this informations. And i believe this would also help others to see whats going on.

There is another thread in the PJRC forum mentioning this too: https://forum.pjrc.com/threads/24720-Teensy-3-1-and-CAN-Bus?p=103806&viewfull=1#post103806

There is some Code shown, what i hacked into my local FlexCAN_T4 lib, but im not shure if this is the appropiate way so i did not start a pull request:

Add to FlexCAN_T4.tpp:

FCTP_FUNC uint32_t FCTP_OPT::ErrorReport() {
	return (FLEXCANb_ESR1(_bus));
}

FCTP_FUNC uint8_t FCTP_OPT::ErrorCounter(bool RXorTX) {
  if (RXorTX == false) {
    return (FLEXCANb_ECR(_bus) >> 8); //shift 8 for RX
  }
  else {
    return (FLEXCANb_ECR(_bus));
  }
}

Add to FlexCAN_T4.h: (Line 459, but not shure if correct here)

	uint32_t ErrorReport();
	uint8_t ErrorCounter(bool RXorTX);

With this an output like this is possible:

CAN RX Error Count: 0
CAN TX Error Count: 57
Error Status: 100000000110110

Looking forward on your feedback!

Anyway, thank you for this lib, i use it happily with a teensy 4 in my car and it works perfectly.

greetings

I would refrain from using those registers directly as at least one of them if i recall changes state when read, and may affect the other as well. The ISR and the loop may cause rare inconsistancies so I would probably store them in a variable set to be read later, or have an error interrupt fire if necessary (error callback perhaps)

You are correct, i re-read the manual and reading the ESR register clears bits 15-10, clearing all the errors. Thats not helpful.
grafik

Reading the ECR for the RX/TX error counters is ok. They alone would help with finding troubles. So maybe this can considered?

Overall a error interrupt sounds good, or some other way where i can check if errors happened.

Would storing it be relevant? Or just for print debugging?
I'm testing something like this in Serial Monitor (from ISR, not loop):

FlexCAN Error and State Register (ESR1): 
Status: Transmitting
FLT_CONF: Error Active - Indicates the Confinement State of the FLEXCAN module
Error Counter Registers (ECR):
 --> RX_ERR_COUNTER: 0
 --> TX_ERR_COUNTER: 0

and this is what is looks like when I short CANH with CANL:

FlexCAN Error and State Register (ESR1): 
Status: Transmitting
ACK_ERR: Indicates that an Acknowledge Error has been detected by the transmitter node
TX_WRN: Indicates when repetitive errors are occurring during message transmission
FLT_CONF: Bus off - Indicates the Confinement State of the FLEXCAN module
Error Counter Registers (ECR):
 --> RX_ERR_COUNTER: 0
 --> TX_ERR_COUNTER: 128

the problem is this is live and especially if your network is very busy and the ISR is firing alot it will flood serial monitor :)
the other issue is if you don't read the event fast enough it could be changed again the next ISR hit, sooooo how you want to handle this :)

I have an idea, I will check certain bits so error is only reported if it occurs, by comparing the last ESR state vs new state

I have this at the end of the ISR currently being tested, try it out on yours and see if it's good and how we should proceed in terms of activation and callbacks?

  uint32_t esr1 = FLEXCANb_ESR1(_bus);
  static uint32_t last_esr1 = 0;
  if ( (last_esr1 & 0x4FF30) != (esr1 & 0x4FF30) ) {
    Serial.println("\n######################################################");
    Serial.print("STATUS: ");
    if ( (esr1 & 0x400C8) == 0x40080 ) {
      Serial.println("Idle");
    }
    else if ( (esr1 & 0x400C8) == 0x0 ) {
      Serial.println("Not synchronized to CAN bus");
    }
    else if ( (esr1 & 0x400C8) == 0x40040 ) {
      Serial.println("Transmitting");
    }
    else if ( (esr1 & 0x400C8) == 0x40008 ) {
      Serial.println("Receiving");
    }


    if ( esr1 & (1UL << 15) ) {
      Serial.println("BIT1_ERR: At least one bit sent as recessive is received as dominant");
    }
    if ( esr1 & (1UL << 14) ) {
      Serial.println("BIT0_ERR: At least one bit sent as dominant is received as recessive");
    }
    if ( esr1 & (1UL << 13) ) {
      Serial.println("ACK_ERR: Indicates that an Acknowledge Error has been detected by the transmitter node");
    }
    if ( esr1 & (1UL << 12) ) {
      Serial.println("CRC_ERR: Indicates that a CRC Error has been detected by the receiver node");
    }
    if ( esr1 & (1UL << 11) ) {
      Serial.println("FRM_ERR: Indicates that a Form Error has been detected by the receiver node");
    }
    if ( esr1 & (1UL << 10) ) {
      Serial.println("STF_ERR: Indicates that a Stuffing Error has been detected");
    }
    if ( esr1 & (1UL << 9) ) {
      Serial.println("TX_WRN: Indicates when repetitive errors are occurring during message transmission");
    }
    if ( esr1 & (1UL << 8) ) {
      Serial.println("RX_WRN: Indicates when repetitive errors are occurring during message reception");
    }

    Serial.print("FLT_CONF: ");
    if ( (esr1 & 0x30) == 0x0 ) Serial.print("Error Active - ");
    else if ( (esr1 & 0x30) == 0x1 ) Serial.print("Error Passive - ");
    else Serial.print("Bus off - ");
    Serial.println("Indicates the Confinement State of the FLEXCAN module");

    Serial.println("Error Counter Registers (ECR):");
    Serial.print(" --> RX_ERR_COUNTER: ");
    Serial.println((uint8_t)(FLEXCANb_ECR(_bus) >> 8));
    Serial.print(" --> TX_ERR_COUNTER: ");
    Serial.println((uint8_t)FLEXCANb_ECR(_bus));
    last_esr1 = esr1;
    Serial.println("######################################################\n");
  }
  FLEXCANb_ESR1(_bus) |= esr1;

No good programmer here, so please take that as a wish on how it could be.

I am polling and currently only check the overrun flag.
Would like to have something like the overrun flag what i could check if an error exists.

So im currently doing:

if (can1.read(canMsg)) {
  (fancy things)
}

and then could do something like checking a general :

if (can1.error) {
  (read deeper into the detailed errors and highlight to user)
}

Typically there is no access to the serial log in my usecase, its a automotive display, but it has a SD card.
I show the "CAN Status", currently it just shows if CAN traffic exists, or if the overrun happens.
That would then show a general "CAN Error" and i would dump more info to the logfile on the SD card.

Rate limiting the error messages should be my issue, i would possibly stop after a couple of seconds and ask the user to take action within my code.

By default this error reporting should be off, or that it must be actively checked.

Your example with clear text printing to the serial port would be a good example and starting point for users of this library!

You might get overruns depending on how busy your loop code is since you are polling frames. canMsg.overrun should return 1 or 0 if an overrun occured during that message reception, the errors are different than overrun

so you are looking for data storage, like a struct to hold the error data?
of course if there is no error in queue, only then should it check the ESR1 as alternative

Okay I developed an error queue system based on changes of the ESR1. The ISR will push to the queue when an error is likely to occur, and your loop can pull it out, read the data from a struct, and optionally, internally print it to serial monitor with a bool overload.

The way it is setup is so the errors will not be repetitive or queued endlessly, only if certain errors occur will it queue, so theres no issues running this in a loop without a timeframe, the queue is 16 messages deep

      CAN_error_t error; /* create a message struct to store the queue details, for data accesses */
      if ( Can1.error(error, 1) ) { // overload 1 means internally print everything in error event to serial monitor
        // Do Something...
        if ( error.ACK_ERR ) Serial.print("ACK error occured");
        if ( !strcmp(error.state, "Transmitting") ) Serial.println("TRANSMITTING!");
      }

Here is the output of me unplugging the ESP32 from Teensy and plugging back:

FlexCAN State: Transmitting, FLT_CONF: Error Active
FlexCAN State: Transmitting, ACK_ERR, FLT_CONF: Error Active
FlexCAN State: Transmitting, ACK_ERR, TX_WRN: 96, FLT_CONF: Error Active
FlexCAN State: Transmitting, ACK_ERR, TX_WRN: 128, FLT_CONF: Bus off
FlexCAN State: Transmitting, TX_WRN: 127, FLT_CONF: Error Active
FlexCAN State: Transmitting, FLT_CONF: Error Active

I've added the commits for error reporting, try it out :)
(me hot-plugging the esp32 and reconnecting)

FlexCAN State: Receiving, FLT_CONF: Error Active
FlexCAN State: Idle, FLT_CONF: Error Active
FlexCAN State: Transmitting, FLT_CONF: Error Active
FlexCAN State: Transmitting, ACK_ERR, FLT_CONF: Error Active
FlexCAN State: Transmitting, ACK_ERR, TX_WRN: 96, FLT_CONF: Error Active
FlexCAN State: Transmitting, ACK_ERR, TX_WRN: 104, FLT_CONF: Error Active
FlexCAN State: Transmitting, ACK_ERR, TX_WRN: 128, FLT_CONF: Bus off
FlexCAN State: Transmitting, TX_WRN: 127, FLT_CONF: Error Active
FlexCAN State: Transmitting, FLT_CONF: Error Active
FlexCAN State: Receiving, FLT_CONF: Error Active
FlexCAN State: Idle, FLT_CONF: Error Active
FlexCAN State: Transmitting, FLT_CONF: Error Active

this works great!

Testing it on a teensy 4 and unplugged the bus while running:

FlexCAN State: Receiving, BIT1_ERR, FRM_ERR, FLT_CONF: Error Active
FlexCAN State: Receiving, STF_ERR, FLT_CONF: Error Active
FlexCAN State: Receiving, BIT1_ERR, FRM_ERR, FLT_CONF: Error Active
FlexCAN State: Receiving, CRC_ERR, STF_ERR, FLT_CONF: Error Active
FlexCAN State: Receiving, BIT1_ERR, FRM_ERR, FLT_CONF: Error Active
FlexCAN State: Receiving, BIT1_ERR, FRM_ERR, RX_WRN: 97, FLT_CONF: Error Active
FlexCAN State: Receiving, BIT1_ERR, FRM_ERR, RX_WRN: 114, FLT_CONF: Error Active
FlexCAN State: Receiving, CRC_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, STF_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, FRM_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, STF_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, STF_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, FRM_ERR, RX_WRN: 131, FLT_CONF: Bus off
FlexCAN State: Receiving, CRC_ERR, RX_WRN: 131, FLT_CONF: Bus off

Thanks a lot for the really quick and great solution @tonton81 ๐Ÿ‘

If you go Buss off is there a way to reset the buss to see if this is an intermittent vs hardware issue? In other words a retry?

For example I can have an OBD buss that I am monitoring then some one plugs a cheap code reader in which take the buss down. When this is unplugged the buss can restart. I like to joke if you want to cause a lot of chaos then just take a paper clip and short out the OBD buss then you will find how robust the whole vehicle system is. For example the OBD bus is only dedicated to OBD or there is other control/monitoring traffic which is running on the buss also which will cause loss of communication between those modules. Thoughts? Thanks Bruce

you can surely short the bus, it wont break anything, the controller hardware does the bus-off recovery by iso standards when valid consecutive CAN frame bits are seen on the bus

you will get dash lights if you short bus of car due to systems communication loss, restarting car will get rid of most, but may require driving or reconnecting battery to restore normal operation.

@bvernham shorting out the CAN OBD on a relatively modern car (>2010) will only stop the OBD to work.
But if you would short out the Engine CAN on a modern VW/Audi then only a fault code reset will remove the dash lights, but the car works once the short is cleared. (guess from where i know...)

As @tonton81 said, the bus itself recovers with no need to actively reset/help.

For me this feature request here can be closed.

Ahh, but the assumption here is the OEM hid all the control busses and only expose the OBD buss to potentially poor external CAN devices.

I can tell you this is not a fact for all OEM's. Some have both OBD and inter-module CAN traffic on the same busses. Which to me is a no no but it is the fact. If you have 2 controllers that can not communicate with each other then for example an engine may go into a limp home mode with limited power/shifting and so on.

I am not saying this is the correct way to mechanize a vehicle CAN architecture but unfortunately some OEM's do.

If this T4.1 CAN controller will recover after the cause of the buss off issue is resolved then it is all good.

Thanks

Bruce

yes it auto recovers, it won't be iso standard if it wasn't. flexcan could, although it breaks standards, not exit from bus off, but requires a user to manually assert it to recover. this is not part of the library as it doesn't follow CAN device standards, and causes more code bloat and user involvement ๐Ÿ˜

As the request has been closed i close this Issue, thanks again @tonton81 ๐Ÿ‘