yaacov/ArduinoModbusSlave

Character silent interval issue and Buffer overrun

mgmidura opened this issue · 7 comments

Running on SAMD21G18. I was encountering CRC errors randomly. After over a week of testing I convinced my self that it was not hardware and began noticing the bit timing on the modbus serial stream was violating the required quiet time. I began to look at the library. I believe that the _halfCharTimeInMicroSecond should be set to 50,000 for baud rates less than 19200. Once I made this change I have not seen any erros with over 116000 master slave transactions(16hrs running)

line 226 _halfCharTimeInMicroSecond = 5000000 / baudrate; // 0.5T.
Causes a truncation because value is too large for a uint16_t also I believe it is a typo because for 9600 baud it should be minimum of 3msec quiet time. Based on calculation on
line 230 _lastCommunicationTime = micros() + (_halfCharTimeInMicroSecond * MODBUS_FULL_SILENCE_MULTIPLIER);
the correct value should be 50000.

I also encountered random buffer overruns that would cause my program to lock up. I discovered that before transmit there was no check for this. I eliminated the lockups by adding check for transmission length greater than MODBUS_MAX_BUFFER
line 955 from if (length > 0) change to if ((length > 0) && (length < MODBUS_MAX_BUFFER))
line 981 from if (length > 0) change to if ((length > 0) && (length < MODBUS_MAX_BUFFER))

this eliminated any of my transmission overruns. I realize that the message response is lost in this case but it did not cause a issue for my application,

Thanks for your consideration.

@mgmidura
Hi Marty,
as I'm currently working on a fork of this library I was curious enough to take a look on your reported calculation issue ...

And I think the calculation is correct.
I you like, you may add debug printout to convince yourself *)

_halfCharTimeInMicroSecond = 5000000 / baudrate; // 0.5T.
Serial.print("hctms: ");
Serial.print(_halfCharTimeInMicroSecond);
Serial.print(" quiet: ");
Serial.println(_halfCharTimeInMicroSecond * MODBUS_FULL_SILENCE_MULTIPLIER);
and the outcome shows
hctms: 520 quiet: 3640

So the calculated quiet time of 3660 micros is imho correct.
3660 micros == 3.66 millis.

Regarding the mentioned uint16_t overflow, the result fits in this type (even @ 300 baud).
A constant in a formula does imho not necessarily need to fit in the same type as the result.
Maybe this is compiler dependent (I' m not a guru), at least it worked with AVR 2560 and ESP32 (Arduino IDE 1.8.19).

*) Debug printout is only shown when Serial is already initialized before the Constructor runs ...
Thus instead of the usual
Modbus slave(SERIAL_PORT, SLAVE_ID);
it is required to define only a pointer
Modbus *slave;
and create the object after Serial.begin();
slave = new Modbus(RS485_SERIAL, SLAVE_ID);

BTW:
I'm unsure if there are any timing constraints for the response message to be considered...
Is there also a kind of silence gap required? How fast do (automatic) RS485 adapters switch back to receive etc. ?
If the slave answers too fast and the RS485 is not yet ready to receive on master's end it may also cause corrupted messages.
I remember some similiar issue in the past, where I delayed the response message for a short moment, in order to get things working.

Regards
Werner

timing problems can happen due to interrupts. make sure that interrupts are short in execution and the code for checking the modbus read is called enough times.

You can try my fork.
I have eliminated the timing for push down the DE Pin.
I had no time to test the other functions in the fork but you can test it with this fix.
All slaves work now correctly for me with this fix.

Here is the descripton of the Issue Fix:
#https://github.com/device111/ArduinoModbusSlave/commit/47f5219a021bd66b5b650ba1aa42f92c30c65ebe
fix DE Pin timing:
Some masters send a new request immediately after receiving the response from the slave without waiting. Sometimes it can happen that the master is transmitting but the DE line of the slave is still HIGH. This is noticeable through sporadic timeouts.

Did not test the code,as I have currently no issues in this realm, just a comment ...

Not sure whether releasing the DE signal immediately is a (always) stable approach. I feel it's better to hold DE active for a short moment after the last bit was shifted out, just to make sure the bus is actively driven to a short silence.
With that in mind it makes imho some sense to hold the DE for half character time (like the original code).
According to Modbus standard the Master has to observe the inter frame delay anyway.
If it doesn't, it should imho better fixed at the Master side, because this bad behavior may hurt other Slaves on the bus too.
In my project I have several Slaves from different brands on the bus, so I feel it's better to stick to the standard behavior.

But it's interesting that 2 people fixed communication issues by changing this delay!
Makes me curious, and if I find the time, I will run a test with a DE pin setup (currently I have a automatic transceiver in place).

I can remember:
At that time I observed the DE pin with a signal logger and it turned out that the slave sometimes pulled the DE signal to LOW too late and the master was already sending. There I also observed that the "digitalWrite" from Arduino alone takes its time to switch. (A few microseconds).

Unfortunately, some Masters do not stick to the timing and start sending immediately after receiving the last byte.

I also checked other libraries and none of them wait after sending. (Not even in the original OpenModbus.org library.)

fact is:
In my opinion you can immediately pull down the DE signal of the slave because the master has to keep the timing of an half character after receiving the last byte and not the Slave.

Maybe to explain:
Master:
The master does not know the status of the DE pin and sends after it has received the last byte. (In the best case with a half character break).

Slave:
The program ensures that the last bit was actually sent by the slave. (Through Serial.flush() )
This ensures that everything has been sent.

Nice discussion ..., and I just like to point out / comment some details.

You are right, the DE on the slave can released very short after Serial.flush(), no need to adhere to some Modbus timings at this point.
It is imho just a matter of RS485 driver hardware and bus length how long the DE at the slave should held active after the last bit sent, to ensure a clean receive at the master's end (which was my point). Chances are that the normal software execution time for digitalWrite() is sufficient in all cases.
Considering that the DE timing is a matter of the RS485 'wire protocol' one may like to separate this from Modbus protocol timings.
On the other hand, as the standard request 3.5 character times delay before the next frame is put on the wire, there is no big waste 1/7 of this delay by holding the DE active.

You are also right that some master libraries do not wait when a application calls "send()".
Such behavior may be seen as bug or feature :-)
Seems that most libraries just focuses on encoding / decoding the RTU frames.
I guess, if the library API does not care of the timings by design, this topic is simply left to the user to implement/handle this in an appropriate way ...

As you observed:
" ... it turned out that the slave sometimes pulled the DE signal to LOW too late and the master was already sending."

Too late, or too early, a good question ...
You may also observed the opposite: Slave behaved OK, but Master was sending too early, before end of the silence period (as you mentioned master libs do not wait).
Depends if the DE was really active for too long, e.g. it was active still after the silence period?

Anyway, as the library is driven by periodic poll(), pause between calling poll() may potentially cause an unwanted stretching of the DE delay.
When I look on line 990 ff., it could happen that a poll() did just a _serialStream.flush();, but the timing condition to detect end of transmission is not yet met, and it would take a next poll() to accomplish this.
If this happens, the DE would be held active for a full poll() interval, which may be way too long ?!
Deleting the delay logic in the code definitely mitigates this kind of issue (if its there).

Seeing this, I'm happy currently using a ESP32 and automatic transceiver w/o the need of a DE pin :-)

Will have a closer look on this detail when I use the lib on a slow Arduino Mini etc, with an DE pin transceiver.
Thanks for the inspirations ...

Yes, nice discussion...:-)

Yes, but I only agree with you partially.
You wrote:

You are also right that some master libraries do not wait when a application calls "send()". Such behavior may be seen as bug or feature :-) Seems that most libraries just focuses on encoding / decoding the RTU frames. I guess, if the library API does not care of the timings by design, this topic is simply left to the user to implement/handle this in an appropriate way ...

and here we are right on topic.
The software shouldn't care about the DE timing as it is irrelevant to the protocol itself.
It's no different with your automatic transceiver.

And yes, you are right. The problem only occurs on slower processors. I have used an ATMEGA 64 where this was a big problem. With the ESP32 it was practically unnoticeable.

PS::
To sum it all up:
It is a master/slave protocol. The slave only responds to requests from the master.
The slave is only responsible for the timing after receiving from the master and BEFORE sending the reply, not after.

best regards,
Martin