orliesaurus/nodemailer-mailgun-transport

ECONNRESET and socket hang ups

Closed this issue · 5 comments

We've been running a Node.js microservice against Mailgun using nodemailer-mailgun-transport for well over a year. For the past few days, however, I've been seeing a lot of ECONNRESET errors and socket hang ups on the connections going to Mailgun.

Is there anything that can be done inside this package to keep connection issues and socket hang ups from bubbling up the dependency chain, i.e. handle them internally? Or do we need to handle Mailgun connection issues ourselves?

(PS: no changes were made on our side.)

@peterver this could be a routing issue between your service and the Mailgun servers. Do you have any more information you could share to help determine whether it's caused by our library?

@orliesaurus this is what I'm seeing inside of our K8s logs for our mailer microservice:

{ Error: socket hang up
    at createHangUpError (_http_client.js:323:15)
    at Socket.socketOnEnd (_http_client.js:426:23)
    at Socket.emit (events.js:194:15)
    at endReadableNT (_stream_readable.js:1125:12)
    at process._tickCallback (internal/process/next_tick.js:63:19) code: 'ECONNRESET' }
{ Error: socket hang up
    at createHangUpError (_http_client.js:323:15)
    at Socket.socketOnEnd (_http_client.js:426:23)
    at Socket.emit (events.js:194:15)
    at endReadableNT (_stream_readable.js:1125:12)
    at process._tickCallback (internal/process/next_tick.js:63:19) code: 'ECONNRESET' }
{ Error: connect ECONNREFUSED xx.xx.xx.xx:80
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1097:14)
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: 'xx.xx.xx.xx',
  port: 80 }
{ Error: socket hang up
    at createHangUpError (_http_client.js:323:15)
    at Socket.socketOnEnd (_http_client.js:426:23)
    at Socket.emit (events.js:194:15)
    at endReadableNT (_stream_readable.js:1125:12)
    at process._tickCallback (internal/process/next_tick.js:63:19) code: 'ECONNRESET' }
{ Error: connect ECONNREFUSED xx.xx.xx.xx:80
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1097:14)
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: 'xx.xx.xx.xx',
  port: 80 }
{ Error: connect ECONNREFUSED xx.xx.xx.xx:80
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1097:14)
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: 'xx.xx.xx.xx',
  port: 80 }

(IP addresses redacted)

The official response from Mailgun was the following:

From the behavior you're experiencing, it sounds like you may be experiencing DNS TTL issues when resolving our hostname's IPs. With AWS we use rotating resource IPs to load balance connections to our servers. If the A record is cached for too long, then your application may attempt to connect to an old IP not currently in rotation.
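One way to check the DNS explanation is to look at the A records and their TTLs directly. The following is a minimal sketch, assuming a Node.js version that ships `dns.promises`; `inspectARecords` is a hypothetical helper name, and the `resolver` parameter exists only so the function can be exercised without network access.

```javascript
const dns = require('dns').promises;

// Hypothetical helper: list the A records for a host together with their TTLs.
// `resolver` defaults to Node's dns.promises API and can be swapped for a stub.
async function inspectARecords(hostname, resolver = dns) {
  // With { ttl: true }, resolve4 returns objects of the form { address, ttl }.
  const records = await resolver.resolve4(hostname, { ttl: true });
  return records.map(({ address, ttl }) => ({ address, ttlSeconds: ttl }));
}

module.exports = { inspectARecords };
```

It could be called as `inspectARecords('api.mailgun.net').then(console.log)`, where the hostname is an assumption about which endpoint the service talks to. Note that `resolve4` queries the DNS servers directly, so it shows what is currently published rather than what an OS-level or in-cluster cache may still be serving to the application.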

That doesn't really say a lot. It's supposed to be a reliable delivery service; how reliable can it be if they don't handle A record caching properly on their side? Is there anything that can be done in this package to handle this case?

Out of a batch of approximately 1000 mails, less than 30% have been successfully delivered over the past couple of days:

[Screenshot: mailgun5]

Hi Peter!
I feel the frustration. You should hammer their support team and link them to this thread.
We can't do anything from our side. This needs to be escalated to their devops/networking team; it's not something that we, the community and users, can fix on their behalf :(
You can post a public request for help on Twitter, or ping @jrodom directly.
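Until the underlying issue is resolved, the caller can still handle these errors itself. The following is a minimal sketch of a caller-side retry with exponential backoff. It is not part of this package; `sendWithRetry` and its options are hypothetical names, and `send` is assumed to be any function that returns a promise.

```javascript
// Error codes seen in the logs above that are treated as transient here.
const TRANSIENT_CODES = new Set(['ECONNRESET', 'ECONNREFUSED']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call `send()` and retry with exponential backoff
// when it rejects with one of the transient network error codes.
async function sendWithRetry(send, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      const transient = TRANSIENT_CODES.has(err && err.code);
      // Give up on non-network errors or once the retry budget is used up.
      if (!transient || attempt >= retries) throw err;
      // Delay doubles on each attempt: baseDelayMs, 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

module.exports = { sendWithRetry };
```

With nodemailer this might be used as `sendWithRetry(() => transporter.sendMail(message))`, assuming `sendMail` is called without a callback so that it returns a promise. One caveat: an ECONNRESET can arrive after the request has already reached the server, so retrying may send a duplicate mail.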

Hello @peterver, have you had any luck solving your issue? Feel free to re-open if you haven't figured it out.