njh/node-owfs

Incorrect decoding of 1-wire message

Closed this issue · 20 comments

Occasionally, requesting a temperature from a remote ds18b20, I see a result that is a string containing, in hex, stuff like:
30 0 0 0 0 0 0 0 c 0 0 0 c 0 0 0 0 0 0 c 0 0 0 0 33 37 2e 31 38 37 35
The sensor value is actually the bit on the end (37.1875) and the start appears to be from the 1-wire protocol header.
Having enabled debug for the convert module I see, leading up to the error:

owfs:convert Checking Header payload > 0 -1 +0ms
owfs:convert Checking Header payload > 0 12 +0ms
owfs:convert message to use [ { header:
{ version: 0,
payload: 12,
ret: 12,
controlflags: 32,
size: 12,
offset: 0 },
payload: ' 42.125' } ] +0ms
owfs:convert Received multiple messages in simple read [ { header:
{ version: 0,
payload: -1,
ret: 0,
controlflags: 0,
size: 0,
offset: 0 },
payload: '' },
{ header:
{ version: 0,
payload: 12,
ret: 12,
controlflags: 32,
size: 12,
offset: 0 },
payload: ' 21.125' } ] +10s
owfs:convert Checking Header payload > 0 -1 +0ms
owfs:convert Checking Header payload > 0 12 +0ms
owfs:convert message to use [ { header:
{ version: 0,
payload: 12,
ret: 12,
controlflags: 32,
size: 12,
offset: 0 },
payload: ' 21.125' } ] +0ms
owfs:convert message to use [ { header:
{ version: 0,
payload: -1,
ret: 0,
controlflags: 0,
size: 0,
offset: 0 },
payload: '\u0000\u0000\u0000\u0000\u0000\u0000\u0000\f\u0000\u0000\u0000\f\u0000\u0000\u0000 \u0000\u0000\u0000\f\u0000\u0000\u0000\u0000 42.5' } ] +124ms

I have caught another failure instance, with debug enabled in the communications module also. See attached file which contains a couple of good results followed by the failure.
owfs_error.txt

njh commented

The decoding of the binary protocol could certainly be improved. For example this:

var exp = new RegExp('[\u0000-\u001F]', 'g');
var lines = message.payload.replace(exp, '').split(' ');

It is good to have a real-case example of this.

I note that there is no checking of the actual length of the data received, possibly that is an issue. I suspect that the message is actually two messages, an empty one and the real one, and for some reason they are getting joined together so the second message ends up as the payload of the first. I have put some more debug in the comms module and am waiting for it to fail again.

I have added debug in communications.js socket.on('data'..) to show socket.bytesRead and data.length so the debug in there now reads

debug('Bytes read', socket.bytesRead);
debug('Data length', data.length);
debug('Receiving header', header);
debug('Receiving payload', payload);

The result is in the attached file, along with the preceding good transaction. It seems that often a response consists of two messages, one which is just a header and the other is the real message with payload. It seems clear that the failure occurs when these two are treated as one, so the header+payload of the second becomes the payload of the first. Unfortunately I don't know enough about how the socket i/f works to know whether they are coming from owserver like this or if the node is misinterpreting the data. Are you able to throw any light on this Nicholas? If there is some further debug that would be helpful let me know.

owfs_error2.txt

Should it actually be buffering all the chunks received by socket.on('data', ..) and parsing the whole buffer in socket.on('end',..)?

Following a discussion on the owfs mailing list I now understand what the headers with payload -1 are. They are 'holding' replies from the server to ensure the client does not timeout whilst waiting for data. The error happens when the empty header and data message are sent together, the existing code is incorrectly assuming that the data header and message are the payload for the dummy header. I am sure the correct approach is to buffer the chunks up and then parse the data, skipping the extra headers. If you are happy with that I will implement it.

njh commented

Yes, if you are happy to implement that change, that would be fantastic.

I am also hoping to improve the test coverage...

I have that working, how do I run the tests? I am running on ubuntu if that is relevant.

njh commented

The tests use eslint to check the code, do you have a config file for your preferred style rules?

Ah, ignore that, I see there is one in the project

Do the tests run on master for you? I am getting

./scripts/run-eslint.sh && nyc mocha test/*.js

node.js version 4.8.0
events.js:141
throw er; // Unhandled 'error' event
^

Error: spawn mocha ENOENT
at exports._errnoException (util.js:907:11)
at Process.ChildProcess._handle.onexit (internal/child_process.js:189:32)
at onErrorNT (internal/child_process.js:355:16)
at nextTickCallbackWith2Args (node.js:511:9)
at process._tickCallback (node.js:425:17)
at Function.Module.runMain (module.js:443:11)
at startup (node.js:140:18)
at node.js:1043:3
npm ERR! Test failed. See above for more details.

Its ok, I didn't have mocha (or sinon) installed that it needs. All passing now.

njh commented

Tests also run automatically by Travis. eg. Node 4:
https://travis-ci.org/njh/node-owfs/jobs/193196837

njh commented

Thanks for the fix in #27 Colin.

I will write some tests and then close this issue.

njh commented

Still working on this. Writing tests for this is proving to be hard work.

njh commented

Test finally written!

6db0152

njh commented

I think some of the code in convert.js can now be deleted. From looking at one of the tests, it looks like they were trying to work round the bug you fixed:

https://github.com/njh/node-owfs/blob/master/test/listingTest.js#L7

I am not sure, do you mean the 'Received multiple messages in simple read' bit? I think that is still necessary, my code does not remove the empty 'holding' messages, it just packs them into the messages array correctly. I think the code in convert.js then finds the actual message in the array. With advancing years, though, I find it more difficult to remember how stuff that I previously understood works, so I would have to spend some time looking at my notes and getting back into the code again to be certain.

njh commented

Released version 0.3.1 which contains this fix.