RTU over TCP error: response data size '5' does not match count '4'
branimirborisov opened this issue · 37 comments
I'm trying to read a RTU device routed over TCP using both the library (via InfluxDB Telegraf) and the modbus-cli
tool. In both cases trying to read the device results in the following error:
modbus: response data size '5' does not match count '4'
Cli command is:
./modbus-cli.amd64 -address tcp://{HOST}:{PORT} -register 40807 -quantity 2 -slaveID 1
When switching to RTUoverTCP I get the following timeout error:
read tcp 172.24.0.3:52840->{HOST}:{PORT}: i/o timeout
Device is SIEMENS SICAM P with Modbus RTU translated over TCP.
Note: When using another tool, for example modpoll
register value is parsed correctly, e.g.:
modpoll -p ${MODBUS_PORT} -a 1 -f -t 4:float -r 40807 ${MODBUS_HOST}
results in:
[40807]: 213209248.000000
...
Is there some additional parameter I'm missing that needs to be specified to parse correctly the value or is it some issue with the library implementation and the device compatibility?
Thanks!
The origin of the error is here:
Line 125 in c4a3d04
It seems like the device is sending more data then expected. You might want to soften the constrains in L124 to
if count <= length {
and do
if count <= length {
err = fmt.Errorf("modbus: response data size '%v' does not match count '%v'", length, count)
return
}
results = response.Data[1:count+1]
That sounds like bad advise to me- you'd never know which parts of the response are the correct ones.
From the description it is not entirely clear if the device needs be addressed using Modbus RTU or TCP. Let's assume RTU. The first error is clear then- addressed as TCP the device might respond with anything, having misunderstood the request. As it doesn't match the expected TCP structure that will error.
It's unclear though, why the second request returns i/o timeout which is before Modbus (any kind of it) is even spoken.
@andig The device should be address via TCP. Its actually RTU device using a TCP converter. So at the end we should address it as TCP device.
Also as I have described when using another tool - modpoll
and addressing the device as TCP I'm able to read the data. So I would also suggest to look into the TCP implementation and see if we can also cover this case.
What if we follow the suggested approach from @srebhan and skip the check or if its actually needed for some cases to introduce a option/flag that will allow us to toggle it on/off? This way it can be enabled by default, but I would be able to turn it off and use it with the particular device?
Just to be sure because this can be tricky. Are you really sure that you have an RTU to TCP protocol converter? Not just an RS485 to Ethernet adapter (which would expose RTU over TCP)?
It still doesn‘t figure that in the RTU case you‘ve tried the port couldn‘t be connected to either.
In any case, if application wants x bytes, y bytes response should be an error (opened #53 based on such an experience).
Regarding the precise response length, I agree with @andig. But before we start tweaking the software, lets collect some more details.
@branimirborisov could you compile and test a modbus-cli based on the latest master commit (0daecbb)? Just to make sure that this problem still exists.
if its still an issue, could you run the modbus-cli command again and append -log-frame
to collect more details about the transmitted data?
Usage of ./modbus-cli:
-log-frame
prints received and send modbus frame to stdout
Next we should try the same with modpoll to compare sent and received data. Since I have never used modpoll, I do not know how to record them. But maybe -v
is worth a try.
usage : mbpoll [ options ] device|host [ writevalues... ] [ options ]
ModBus Master Simulator. It allows to read and write in ModBus slave registers
connected by serial (RTU only) or TCP.
-v Verbose mode. Causes mbpoll to print debugging messages about
its progress. This is helpful in debugging connection...
Update:
If modpoll -v
does not provide any details, we could record[0] them via tcpdump.
0: https://www.plcnext-community.net/makersblog/network-record-with-tcpdump/
@frzifus i'm trying to build the modbus-cli form the latest version in master branch, but I'm getting following error:
cmd/modbus-cli/main.go:285:11: cannot use "github.com/grid-x/serial".RS485Config literal (type "github.com/grid-x/serial".RS485Config) as type "github.com/grid-x/modbus/vendor/github.com/grid-x/serial".RS485Config in assignment
Maybe I'm doing something wrong? Can you provide some instructions on how to build the cli from master? I'm on Ubuntu 18.04 with go 1.10.4?
Your go is ancient. Upgrade to a version from this century ;)
@andig @frzifus Running the modbus-cli from master with the -log-frame
option results in:
./main -address tcp://{HOST}:{PORT} -register 40807 -quantity 2 -slaveID 1 -log-frame
modbus: send 00 01 00 00 00 06 01 03 9f 67 00 02
modbus: recv 00 01 00 00 00 08 01 03 04 4d 6b ca 16 4b
modbus: response data size '5' does not match count '4'
Any suggestions? Thank you!
So what's the mbpoll output in comparison?
So what's the mbpoll output in comparison?
-- Polling slave... (Ctrl-C to stop)
[40807]: 247596832.000000
-- Polling slave... (Ctrl-C to stop)
[40807]: 247597152.000000
-- Polling slave... (Ctrl-C to stop)
[40807]: 247597152.000000
-- Polling slave... (Ctrl-C to stop)
[40807]: 247597152.000000
-- Polling slave... (Ctrl-C to stop)
[40807]: 247597472.000000
-- Polling slave... (Ctrl-C to stop)
Which corresponds to the first four bytes in the response as IEEE float.
Would be nice to get a more detailed view of your mbpoll request first. have you tried to capture the TCP packets which include modbus send
and recv
payload?
@frzifus Unfortunately I'm not able to find a way to capture the packets with the tool. Currently we are testing another python based library to read the data. I'll will try to post the results here once we have verified its working.
Could you drop your capture file here?
@frzifus This is a capture from modpoll with tcpdump:
17:32:13.121100 IP 127.0.0.1.53212 > {MODBUS_HOST}.3351: Flags [P.], seq 320494278:320494290, ack 184513, win 63456, length 12
0x0000: 4500 0034 d24e 4000 4006 7bc3 c0a8 6408 E..4.N@.@.{...d.
0x0010: 596a 6e97 cfdc 0d17 131a 5ac6 0002 d0c1 Yjn.......Z.....
0x0020: 5018 f7e0 0eeb 0000 0039 0000 0006 0103 P........9......
0x0030: 9f66 0002 .f..
17:32:13.157926 IP {MODBUS_HOST}.3351 > 127.0.0.1.53212: Flags [P.], seq 184513:184527, ack 320494290, win 2236, length 14
0x0000: 4500 0036 1f75 0000 f806 b69a 596a 6e97 E..6.u......Yjn.
0x0010: c0a8 6408 0d17 cfdc 0002 d0c1 131a 5ad2 ..d...........Z.
0x0020: 5018 08bc ff35 0000 0039 0000 0008 0103 P....5...9......
0x0030: 044d 8986 105f .M..._
17:32:13.157966 IP 127.0.0.1.53212 > {MODBUS_HOST}.3351: Flags [.], ack 184527, win 63442, length 0
0x0000: 4500 0028 d24f 4000 4006 7bce c0a8 6408 E..(.O@.@.{...d.
0x0010: 596a 6e97 cfdc 0d17 131a 5ad2 0002 d0cf Yjn.......Z.....
0x0020: 5010 f7d2 af9d 0000 P.......
17:32:14.121299 IP 127.0.0.1.53212 > {MODBUS_HOST}.3351: Flags [P.], seq 320494290:320494302, ack 184527, win 63442, length 12
0x0000: 4500 0034 d250 4000 4006 7bc1 c0a8 6408 E..4.P@.@.{...d.
0x0010: 596a 6e97 cfdc 0d17 131a 5ad2 0002 d0cf Yjn.......Z.....
0x0020: 5018 f7d2 0ede 0000 003a 0000 0006 0103 P........:......
0x0030: 9f66 0002 .f..
17:32:14.161862 IP {MODBUS_HOST}.3351 > 127.0.0.1.53212: Flags [P.], seq 184527:184541, ack 320494302, win 2224, length 14
0x0000: 4500 0036 1f76 0000 f806 b699 596a 6e97 E..6.v......Yjn.
0x0010: c0a8 6408 0d17 cfdc 0002 d0cf 131a 5ade ..d...........Z.
0x0020: 5018 08b0 fc66 0000 003a 0000 0008 0103 P....f...:......
0x0030: 044d 8986 131f .M....
17:32:14.161902 IP 127.0.0.1.53212 > {MODBUS_HOST}.3351: Flags [.], ack 184541, win 63428, length 0
0x0000: 4500 0028 d251 4000 4006 7bcc c0a8 6408 E..(.Q@.@.{...d.
0x0010: 596a 6e97 cfdc 0d17 131a 5ade 0002 d0dd Yjn.......Z.....
perfect, I think that's the response => ...0008 0103 044d 8986 105f...
I'll try to have a more detailed look by the end of the week.
@branimirborisov looking at the raw bytes. Both requests and responses look fine. The only thing that is noticeable, you requested a different register. maybe its caused by the modbus-cli
wrapper?
Maybe the other device is wrong? We expect 4 bytes in both responses. But we received in both cases 5. See response -> unknown.
Raw
Response:
our-lib => 03 04 4d 6b ca 16 4b
modpoll => 03 04 4d 89 86 10 5f
Request
FunctionCode
got 0x03
, thats fine details.
The Data Address of the first register requested.
This seems to be different (our-lib) 9f67
(modpoll) 9f66
.
The total number of registers requested.
In our case both is 2. => 0002
Response
The number of data bytes to follow (2 registers x 2 bytes each = 4 bytes)
In our-lib and modpoll we receive 0x04
. Thats correct.
payload
Seems to be fine too. In both cases, its the same length as expected.
our-lib => 4d6b ca16
modpoll => 4d89 8610
Unknown
our-lib => 4b
modpoll => 5f
@frzifus right, the last "unknown" byte is the key. While modpoll seems to just ignore this byte, this-lib errors-out on the extra byte.
Thanks for looking into the details @frzifus . But if the response is OK why would the library throw an exception?
But if the response is OK why would the library throw an exception?
No, the response is invalid. The given byte length doesnt match.
How about providing an option (not by default, but by additional configuration for example) to ignore it instead of throwing the exception just to cover the case?
One could return the data plus error. Making the lib accept invalid stuff without error for sake of a broken device doesn‘t sound desirable.
@andig @frzifus The device is behind an RTU to TCP converter - the problem might be due to the converter not to the device itself. If I would like to read/parse the first 8 bytes, no matter of the response size , which might be a side effect of the converter, why wouldn't this be allowed if at the end the data is valid?
Finally its your call how to tackle this issue, just let me know, so we can focus our efforts elsewhere.
Thank you both for the support so far!
@andig your proposal would work from Telegraf side if we can identify that too many bytes were received (e.g. by having an error class to check against).
Thanks for investigating @frzifus @branimirborisov. So what I can tell from the trace, actually the "outer" frame length seems to be correct:
0039 // transaction id
0000 // "protocol identifier" (constant 0)
0008 // remaining bytes in the frame
0103 044d 8986 105f // indeed 8 bytes remaining
But the inner length which is sent again as part of the function response is too short:
01 // unit ID
03 // function code
04 // remaining bytes in response
4d 8986 105f // 5 bytes!
So I'd argue that if the outer length matches and there's no underflow, we should just discard the rest and only return the 4 bytes. After all, the response header tells us that only 4 bytes are relevant. Maybe the peer has some implementation reason why it does this (e.g. all buffers must be even-sized, or it needs to reuse fixed size buffers. I'll try to find out what the spec actually says about this.
Is there any progress on this issue?
So I'd argue that if the outer length matches and there's no underflow, we should just discard the rest and only return the 4 bytes. After all, the response header tells us that only 4 bytes are relevant. Maybe the peer has some implementation reason why it does this (e.g. all buffers must be even-sized, or it needs to reuse fixed size buffers. I'll try to find out what the spec actually says about this.
Imho the response is just invalid and the error expected. Fine as-is.
So the user is left with a inaccessible device? I thought the conclusion by @guelfey
So I'd argue that if the outer length matches and there's no underflow, we should just discard the rest and only return the 4 bytes. After all, the response header tells us that only 4 bytes are relevant. Maybe the peer has some implementation reason why it does this (e.g. all buffers must be even-sized, or it needs to reuse fixed size buffers. I'll try to find out what the spec actually says about this.
suggests that the response can be used nonetheless!?!
Which 4 of the 5 bytes would you even consider „the response“?
If you decide for the first 4 which is probably a little less arbitrary a choice, #52 (comment) still sounds like a good idea to me.
IMO you should use the first 4 bytes as this is the most natural interpretation. The other option would be to let the user decide, e.g. you add an option
varadic to specify what to do in the discussed case!? This would be the best option from Telegraf side as we can just pass through this option to the config....
We have the same problem in evcc-io/evcc#6051 (reply in thread). In our case, the server returns less data than expected. This leads to client using the result as part of slice access which panics. Instead of checking this at every return site it would be nice to check as part of the reading operation.
Current code does already check if response in itself is consistent, e.g. for holding:
count := int(response.Data[0])
length := len(response.Data) - 1
if count != length {
err = fmt.Errorf("modbus: response data size '%v' does not match count '%v'", length, count)
return
}
In addition, it should check if response matches request:
if 2*int(quantity) != length {
err = fmt.Errorf("modbus: response data size '%v' does not match request '%v'", length, 2*quantity)
}
It should still return the result:
results = response.Data[1:]
return
Improvement would be to define a custom error type so it can be checked with errors.Is()
.
That would work for me.
Upstream PR: grid-x/modbus#81