Azure/azure-event-hubs-node

EventProcessorHost - Multiple runtimes for AzureWebapp Timeouts

jzwack opened this issue · 11 comments

I'm running EPH in an Azure WebApp (with express as the frontend) to read from an EventHub. Normally in Azure, one Node.js process runs for each core on the machine, so only one of the processes picks up the leases and reads events.

For the instances that aren't reading events, I eventually get this unhandled timeout exception, which causes the entire application to bounce:
"The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'"

This feels like #127 but that issue is for the other SDK.

@jzwack - Thanks for reporting the issue. The EPH depends on the event-hubs sdk to receive messages from the EventHub. I was able to reproduce the issue. Will be fixing it shortly.

@jzwack - The issue has been fixed. The azure-event-processor-host depends on azure-event-hubs: "^0.2.4".

If you do a clean install (remove node_modules and run npm install azure-event-processor-host again), it should pull in the latest azure-event-hubs package. To verify, once the installation is complete you can run npm list azure-event-hubs, which should report version 0.2.8.
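To illustrate why a fresh install picks up 0.2.8 from a "^0.2.4" dependency, here is a minimal sketch of npm's caret rule for plain x.y.z versions. The helper names (parseVersion, caretUpperBound, satisfiesCaret) are made up for this example and are not part of npm or either SDK; real projects would normally rely on the semver package instead.

```javascript
// Minimal caret-range check for plain "x.y.z" versions (no pre-release tags).
// Helper names are hypothetical and exist only for this illustration.
function parseVersion(version) {
  return version.split('.').map(Number);
}

// Returns -1, 0 or 1 depending on how a compares to b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// A caret range allows changes that keep the left-most non-zero part fixed,
// so ^0.2.4 means >=0.2.4 and <0.3.0.
function caretUpperBound(base) {
  const [major, minor, patch] = base;
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version, base) {
  const v = parseVersion(version);
  const b = parseVersion(base);
  return compareVersions(v, b) >= 0 && compareVersions(v, caretUpperBound(b)) < 0;
}

console.log(satisfiesCaret('0.2.8', '0.2.4')); // 0.2.8 is inside ^0.2.4
console.log(satisfiesCaret('0.3.0', '0.2.4')); // 0.3.0 is outside ^0.2.4
```

An existing lock file or a stale node_modules folder can keep an older 0.2.x version in place, which is why the clean install matters here.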

Feel free to reopen the issue if the problem persists.
Thank you once again for filing the issue and helping us make this a better product :)

@jzwack - Please install @azure/event-processor-host from npm. That is the stable version of EPH.

@amarzavery Got it, it's working great! Thank you for the help.

@jzwack -
Just out of curiosity, what is your logic for checkpointing messages? Do you checkpoint every 100th message, every n seconds, or use some other logic?

Also, are you running on a Linux App Service or a Windows App Service?

@amarzavery Windows App Service on node v8.9.3. I was checkpointing every 5000 messages. I tried reducing that (to, say, 2000) but ended up getting more "lease lost" errors. I'm digging into why.
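For readers following along, count-based checkpointing of the kind described here might look roughly like the sketch below. It is not the poster's actual code. It assumes the handler receives a context object with a partitionId property and an async checkpoint() method; checkpointEvery and processMessage are hypothetical names introduced for the example.

```javascript
// Sketch only: wraps a message processor so that a checkpoint is written
// after every `interval` messages on each partition.
// Assumption: `context` exposes `partitionId` and an async `checkpoint()`.
function checkpointEvery(interval, processMessage) {
  const counts = new Map(); // partitionId -> messages since last checkpoint

  return async function onMessage(context, eventData) {
    await processMessage(eventData);

    const seen = (counts.get(context.partitionId) || 0) + 1;
    if (seen >= interval) {
      // Reset before awaiting so messages arriving meanwhile start a new count.
      counts.set(context.partitionId, 0);
      await context.checkpoint();
    } else {
      counts.set(context.partitionId, seen);
    }
  };
}

// Example wiring with a stand-in processor; 5000 is the interval from the thread.
const onMessage = checkpointEvery(5000, async (eventData) => {
  // send to a third party, look something up in a database, etc.
});
```

A larger interval means fewer storage writes but more messages to reprocess after a restart or a lost lease, so the right value depends on how costly duplicates are for the application.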

Hmm, actually, now that I look at it more, it looks like I've been reprocessing lots of messages since updating to the new package, even though I'm checkpointing.

How many EPH instances are you creating, and how many partitions does the EventHub have?

I have made some updates to checkpointing in version 1.0.2. Please try the new version.

If message processing in the onMessage handler takes a long time, you can increase leaseDuration and leaseRenewInterval by setting those options when creating the event processor host. The defaults are 30 and 10 seconds respectively. You may want to try 60 and 30 seconds, or fine-tune the values based on your usage and the time it takes to process messages.
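As a rough sketch, the suggested values could be assembled into an options object like this. The option names and defaults come from the comment above; buildLeaseOptions and its sanity check (renew interval shorter than the lease duration) are assumptions added for illustration, not library behaviour, and the exact host-creation call is not shown in this thread.

```javascript
// Defaults as stated in the thread, in seconds.
const DEFAULT_LEASE_OPTIONS = { leaseDuration: 30, leaseRenewInterval: 10 };

// Hypothetical helper: merges overrides with the defaults and rejects
// combinations where a lease would expire before it is renewed.
function buildLeaseOptions(overrides = {}) {
  const options = Object.assign({}, DEFAULT_LEASE_OPTIONS, overrides);
  if (options.leaseRenewInterval >= options.leaseDuration) {
    throw new RangeError('leaseRenewInterval must be shorter than leaseDuration');
  }
  return options;
}

// The values suggested above: 60 second leases, renewed every 30 seconds.
const ephOptions = buildLeaseOptions({ leaseDuration: 60, leaseRenewInterval: 30 });
console.log(ephOptions);

// ephOptions would then be merged into whatever options object is passed
// when creating the event processor host (assumption: check the package
// README for the exact factory signature).
```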

Just one EPH instance, two partitions. I will probably have more partitions in the production event hub.

I'm receiving ~80 messages/sec right now. I just updated to 1.0.2 and I'll let you know how it goes.

I'll look into the onMessage timing. The messages themselves are very small, and I'm doing some non-blocking processing (sending to a 3rd party and database lookups).

@amarzavery This is working well now. I also had to bump the specs of the WebApp Service Plan, since I was hitting 90%+ memory. With that and the interval/duration changes you suggested, it's been running without exception for two days.