Azure/azure-event-hubs-node

Client crashing with Cannot read property 'options' of undefined

MatejSkrbis opened this issue · 11 comments

After running for some time, the application randomly crashes with error:

TypeError: Cannot read property 'options' of undefined
at Object.translate (...\node_modules\rhea-promise\lib\eventContext.ts:97:49)
at emit (...\node_modules\rhea-promise\lib\util\utils.ts:184:20)
at Object.emitEvent (...\node_modules\rhea-promise\lib\util\utils.ts:195:5)
at Connection._connection.on (...\node_modules\rhea-promise\lib\connection.ts:506:10)
at emitOne (events.js:116:13)
at Connection.emit (events.js:211:7)
at Connection.dispatch (...\node_modules\rhea\lib\connection.js:229:37)
at Connection.input (...\node_modules\rhea\lib\connection.js:502:18)
at emitOne (events.js:116:13)
at TLSSocket.emit (events.js:211:7)

It seems as 'protocol_error' (eventName = 'protocolError') event is emitted and received at https://github.com/amqp/rhea-promise/blob/master/lib/connection.ts#L498
context is then set to:
{
message:"attach received on invalid channel 2"
name:"ProtocolError"
}

It seems it's much easier to reproduce this issue under high load when reading or writing from many partitions. With low load it can take days or more to get this error.

Package version:
"@azure/event-hubs": "1.0.7"

So I have fixed the options of undefined error in the latest version of rhea-promise (0.1.9). Source code change in the repo is over here.

What worries me is this error

message:"attach received on invalid channel 2"
name:"ProtocolError"

It seems the sdk is trying to attach to an already closed session. Can you provide me some error logs
by setting the DEBUG variable to:

export DEBUG=azure:event-hubs:error,azure-amqp-common:error,rhea-promise:error,rhea:events,rhea:frames,rhea:io,rhea:flow

That will provide me some context on the sequence of events that happened before this error occurred.

I'm not sure if I can provide this. What I'm actually doing is reading data from partition and processing it. If I find out that processing is too slow, I stop the receiver (haven't found any throttling options to slow down receive). And then when load is processed I start the receiver again.

When I stop the receiver the debug option seems to stop outputting to the console.

I also cannot reliably reproduce this issue (on one day it crashes almost every run on high load and on other day it doesn't crash at all. I have no idea why and code was not changed between these two days)

I will continue trying to get the sequence if there will be any luck getting it.

I left client running over weekend and found out this error when I came back.

Error: Unhandled "error" event. ([object Object])
    at Connection.emit (events.js:186:19)
    at emit (...\node_modules\rhea-promise\lib\util\utils.ts:183:20)
    at Object.emitEvent (...\node_modules\rhea-promise\lib\util\utils.ts:195:5)
    at Connection._connection.on (...\node_modules\rhea-promise\lib\connection.ts:505:9)
    at emitOne (events.js:116:13)
    at Connection.emit (events.js:211:7)
    at Connection.dispatch (...\node_modules\rhea\lib\connection.js:229:37)
    at Connection.input (...\node_modules\rhea\lib\connection.js:505:18)
    at emitOne (events.js:116:13)
    at TLSSocket.emit (events.js:211:7)

This error object also had context property with it's own stack

Error: transfer after detach
    at Incoming.on_transfer (...\node_modules\rhea\lib\session.js:360:15)
    at Session.on_transfer (...\node_modules\rhea\lib\session.js:730:19)
    at Connection.(anonymous function) [as on_transfer] (...\node_modules\rhea\lib\connection.js:734:30)
    at c.dispatch (...\node_modules\rhea\lib\types.js:909:33)
    at Transport.read (...\node_modules\rhea\lib\transport.js:108:36)
    at SaslClient.read (...\node_modules\rhea\lib\sasl.js:293:26)
    at Connection.input (...\node_modules\rhea\lib\connection.js:488:35)
    at emitOne (events.js:116:13)
    at TLSSocket.emit (events.js:211:7)
    at addChunk (_stream_readable.js:263:12)

I had env set to
debug=azure:event-hubs:error,azure-amqp-common:error,rhea-promise:error,rhea:events,rhea:frames,rhea:io,rhea:flow

But messages were printed to console only for first few minutes and then they stopped, so there is no any other useful information I can provide.

@MatejSkrbis - That is some useful information. Looks like sdk received some messages after the receiver link was detached. Hence we see this error.
Are you using the basic event hubs client or are you using the event processor host?
Will try to reproduce this error.
@grs - Any idea what can cause the transfer after detach error as mentioned above?

grs commented

@amarzavery it is raised when a transfer is received on a link that the peer already issued a detach for

@grs - To confirm my understanding of the above statement.

It looks like the service issued a detach for a receiver link and then sent a message on that link.

Is the above statement correct? If so, then this looks like a service side issue and I can start looping folks from the service team.

@amarzavery I am using event hubs client. Event processor host was less stable and was crashing much more often so I switched back to event hub clients.

grs commented

@amarzavery yes, that is what it looks like

@MatejSkrbis Have you see this "transfer after detach" error ever since? Since it was concluded that we might need to loop in the service team here, it would help if we know if this has happened again

I don't remember getting this error since I updated to "@azure/event-hubs": "1.0.8". I'm not sure if it was a rhea thing or not, but I am not getting this anymore.

Thanks @MatejSkrbis
I'll close this issue for now then.

If you see this error ("transfer after detach") again, please do log an issue.