tessel/t2-firmware

Error: Received unexpected response with no commands pending

Closed this issue · 11 comments

@kevinmehall @johnnyman727 can either of you shed light on this error? I reviewed the commit that introduced it, but there is no explanation as to why this would ever happen. I'm suddenly and inexplicably experiencing hitting this error with a program that has:

  • Port A:
    • A repeating analog read on 7
    • A repeating I2C device read
  • Port B:
    • A repeating analog read on 7

When I eliminate the read on Port B, the issue seems to go away, but I'm not confident in the cause/effect. When I add logging to show the contents of the buffer when the exception is thrown, it is always a perfectly valid response. Here are samples from two runs that resulted in failure:

// A valid analog read response
Port:  A
Reply Buffer:  <Buffer 84 88 03>

// A valid I2C response
Port:  A
Reply Buffer:  <Buffer 84 50 54 90 7f cc b0 71 6c>

These are representative of all failures that I saw: always a valid response.


A bigger issue that this has brought to light is that the communication protocol is flawed in that there is no "pin identity" data in the response for analog reads.

I can't think of anything off of the top of my head. Do all of the pin.analogRead and i2c.read calls provide a callback?

Perhaps the socket in JS emits the readable event before the entire response from the MCU gets there? So then there would be multiple readable events and the replyQueue would get shifted more than once for that one response.

Do all of the pin.analogRead and i2c.read calls provide a callback?

Yes, the read operation is recursive, ie. approximately something like this:

function continuousAnalogRead(pin, callback) {
  tessel.port.A.analogRead(pin, (error, data) {
    callback(data);
    // Once the previous read is complete, request another one.
    // This is not great—Ideally I should be able to tell SAMD21 port 
    // that I want this pin to report its value at some frequency 
    continuousAnalogRead(pin, callback);
  });
}
continuousAnalogRead(7, data => runningAverage(data));

I strongly suspect it's the issue I put forth above:

Perhaps the socket in JS emits the readable event before the entire response from the MCU gets there? So then there would be multiple readable events and the replyQueue would get shifted more than once for that one response.

I might suggest checking the size of the incoming packet based on the command type (ie whether it's an analog read or i2c rx). If it's ever less than the expected size, I'm guessing that's the issue. Analog read will look like [24, BYTE_1, BYTE_2] and I2C read will look like [24, NUM_BYTES_READ...].

Perhaps the socket in JS emits the readable event before the entire response from the MCU gets there? So then there would be multiple readable events and the replyQueue would get shifted more than once for that one response.

It's possible that some aspect of this guess is applicable. The communication "protocol" as it stands is broken by design. There can't be a mix of variable length messages and fixed length messages without sentinel/signal values on both ends of the variable length message; data should be encoded so as to never conflict with signal bytes... I could go on, but there's no point because fixing this now would be extraordinarily hard.

I've spent the day with this issue and I have a fix, but I'm going to stop working and take it up again tomorrow.

Analog read will look like [24, BYTE_1, BYTE_2] and I2C read will look like [24, NUM_BYTES_READ...].

Yes, I'm aware—I don't use the tessel.port. ... apis throughout Tessel-IO, because I don't like that analogRead returns a percent and not the actual value (huge resolution loss)

I've changed gears on this... I happened to notice something that was worth further attention and discovered that this is being caused by duplicate messages being sent from the SAMD. Every failure I've observed can be traced to this. This actually bring me back to the issue I mentioned earlier:

A bigger issue that this has brought to light is that the communication protocol is flawed in that there is no "pin identity" data in the response for analog reads.

There's no way to filter an duplicate message without knowing which pin the read came from:

replyBuf@readable <Buffer >
available@readable <Buffer 84 7c 04 84 7c 04>
replyBuf before while:  <Buffer 84 7c 04 84 7c 04>
while:  0
port:  A
byte:  132
rply:  <Buffer 84 7c 04 84 7c 04>
A7 0.9249023437499999
while:  1
port:  A
byte:  132
rply:  <Buffer 84 7c 04>
FAILURE
port:  A
byte:  132
rply:  <Buffer 84 7c 04>
/usr/lib/node/tessel-export.js:287
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132

I'm going to try something drastic: I'm going to remove the error and dump the bytes when that path is reached. I will test it by doing continuous reads as many pins as possible along with an I2C peripheral.

Each sent request should be paired up with exactly one response (except for asynchronous interrupts), so it identifies responses and their length by keeping the outstanding requests in a queue. So if you're receiving two responses without making two requests, you need to fix the firmware, not filter it in JS. The protocol is unfinished in plenty of other ways, but that's not one of them.

you need to fix the firmware, not filter it in JS

Yes, I agree, I didn't mean that we should fix this by "filtering" in JS, I meant that it would be useful to filter for the purpose of debugging the problem. I'm not reading one pin and observing one response, because the issue won't reproduce. Since I need several pins to be read, it would be useful to be able track which pins are associated with which failures, and identity information would be really useful in tracking failure patterns. Identity information is not unreasonable, but the intentional omission of identity information does seem unreasonable.

could go on, but there's no point because fixing this now would be extraordinarily hard.

@rwaldron what's the difficult part of this process? tessel-export.js is tied to the firmware version through t2 update so it's unlikely to cause compatibility errors.

so it's unlikely to cause compatibility errors.

I'm not concerned with compatibility errors, even though there would be direct impact on Tessel-IO, which by-passes the tessel.port.[A|B].pins.* APIs and talks directly with the port itself—but again, I'm not worried about resolving those issues expeditiously and would never let them be a "blocker".

Specifically, I think there should be no mixed handler mechanisms, ie. having some things handled by queued callbacks and some things handled by emitting an event is one too many handler mechanisms. Everything coming through the port should be handled by emitting an appropriate event. To prove out my thinking, I wrote a patch that moves handling for analog read responses out of the queued callback path and into an event emitting mechanism. It works, but it's only a partial migration at the moment. I'm going to pursue it when I have more time. Right now it's just a design exercise.

Other issues I encountered along the way:

  • The firmware is responding with duplicate messages
  • The firmware is responding with large swaths of zeroes

    dedicated specifically the queued callback is that there shouldn't be any mixed handler

Further observation:

  • Appears only when making "read" requests to both ports