sw-tools/checkin-service

Puppeteer timeout on checkin

Rezer opened this issue · 3 comments

Rezer commented

I set this up recently and it has worked properly for one flight, but I just got this in my error log today (the flight checkin time is 1:35 UTC, so it woke up at the appropriate time):

START RequestId: a4483461-12d6-5170-b06f-6e900d55117c Version: $LATEST

2022-05-08T01:30:50.748Z	a4483461-12d6-5170-b06f-6e900d55117c	INFO	Received SQS message {
  reservation: {
    confirmation_number: 'XXXXX',
    first_name: 'Removed',
    last_name: 'Name'
  },
  checkin_available_epoch: 1651973700
}

2022-05-08T01:31:23.709Z	a4483461-12d6-5170-b06f-6e900d55117c	ERROR	TimeoutError: Navigation timeout of 30000 ms exceeded
    at /opt/nodejs/node_modules/puppeteer-core/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111

2022-05-08T01:31:23.749Z	a4483461-12d6-5170-b06f-6e900d55117c	ERROR	Invoke Error 	{
    "errorType": "TimeoutError",
    "errorMessage": "Navigation timeout of 30000 ms exceeded",
    "name": "TimeoutError",
    "stack": [
        "TimeoutError: Navigation timeout of 30000 ms exceeded",
        "    at /opt/nodejs/node_modules/puppeteer-core/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111"
    ]
}

END RequestId: a4483461-12d6-5170-b06f-6e900d55117c

REPORT RequestId: a4483461-12d6-5170-b06f-6e900d55117c	Duration: 33007.91 ms	Billed Duration: 33008 ms	Memory Size: 1600 MB	Max Memory Used: 529 MB	Init Duration: 562.47 ms	

Seems to be just an issue with the initial API token page not loading fast enough through puppeteer. I'm unsure if this means the API token page changed or if there just needs to be a retry configured for this part of the process.

Rezer commented

So I narrowed it down to the SwGenerateHeaders.generateHeaders() function timing out, and threw together a retry method for both basic and advanced headers within HandleScheduledCheckin. I managed to reproduce the error a few times, once again with the default 30 second timeout and for giggles one with a 10 second timeout. They each failed once then managed to continue on the second try. Here's a log of the updated behavior:

START RequestId: 8ba376f3-4c3d-568c-92d6-2c021cb71ea0 Version: $LATEST

2022-05-08T05:00:05.001Z	8ba376f3-4c3d-568c-92d6-2c021cb71ea0	INFO	Received SQS message {
  reservation: {
    confirmation_number: 'XXXXXX,
    first_name: 'Removed',
    last_name: 'Name'
  },
  checkin_available_epoch: 1651986000
}


2022-05-08T05:00:05.158Z	8ba376f3-4c3d-568c-92d6-2c021cb71ea0	DEBUG	basicHeaders {
  Host: 'mobile.southwest.com',
  'Content-Type': 'application/json',
  'X-API-Key': 'l7xx2c186c1297274b828b1872e22edfe67a',
  'X-User-Experience-Id': 'B99E5460-CE8B-11EC-8EF8-95E982AD8E52',
  Accept: '*/*',
  'X-Channel-ID': 'MWEB'
}

2022-05-08T05:00:38.652Z	8ba376f3-4c3d-568c-92d6-2c021cb71ea0	DEBUG	Failed getting advanced headers.  Retrying in 5s...
Exception details: TimeoutError: Navigation timeout of 30000 ms exceeded


2022-05-08T05:18:15.588Z	e21124ba-47da-5c2f-baef-7e4b137274de	DEBUG	advancedHeaders {
  'content-type': 'application/json',
  ...
  (execution continues normally from this point)

Unsure if it would be better to simply raise the timeout within generateHeaders() by using page.setDefaultNavigationTimeout(60000) or just keep relying on the retry to succeed, since it seemed to work on the second attempt even with a 10s timeout instead of 30.

I'm not exactly fluent in typescript but I'll submit a pull request with the changes I made.

Thanks for submitting this and your pull request!

This type of timeout stuff is sort of a pain point with my lack of understanding of Puppeteer. You'll see that I don't exactly know how to use Puppeteer properly by my random waitMs here, particularly in a Lambda environment where Puppeteer seems to behave a bit differently from locally. I think retrying is sort of a band-aid over whatever the problem is, but it seems like a sensible solution. If you make some small changes to your PR, I'll be happy to merge it in.

Rezer commented

Sorry kind of lost track of this. I've since come up with a better solution but I haven't had much chance to test it, given the somewhat unique set of requirements for testing this one. I'll see about cleaning it up and submitting another pull request soonish.