aws lambda support?
johnHastings772 opened this issue · 5 comments
I guess it's unsupported right now but... would it be possible to run secret-agent on aws lambda with or without any limitations?
I'm trying to get it running, but I keep getting PipeTransport errors in the real AWS environment, while a local invocation works just fine.
This is an excerpt of the log:
`2022-01-28T11:42:40.802Z ERROR [/var/task/node_modules/@secret-agent/puppet/lib/PipeTransport] PipeTransport.WriteError { context: {}, sessionId: null, sessionName: undefined } Error: read ECONNRESET
at Pipe.onStreamRead (internal/stream_base_commons.js:209:20) {
errno: -104,
code: 'ECONNRESET',
syscall: 'read'
}
2022-01-28T11:42:40.803Z ERROR [/var/task/node_modules/@secret-agent/puppet/lib/PipeTransport] PipeTransport.WriteError { context: {}, sessionId: null, sessionName: undefined } Error: read ECONNRESET
at Pipe.onStreamRead (internal/stream_base_commons.js:209:20) {
errno: -104,
code: 'ECONNRESET',
syscall: 'read'
}
2022-01-28T11:42:40.804Z INFO [/var/task/node_modules/@secret-agent/puppet/lib/PipeTransport] PipeTransport.Closed { context: {} }
2022-01-28T11:42:40.820Z STATS [/var/task/node_modules/@secret-agent/puppet/lib/BrowserProcess] chrome.ProcessExited { context: {} }
2022-01-28T11:42:40.821Z INFO [/var/task/node_modules/@secret-agent/core/lib/Session] Session.Closing { context: {} }`
Looking at the stacktrace it seems that there's some problem while launching chrome and thus the connection is closed.
I definitely want to support this, but this is going to take some effort to troubleshoot. I think we need to figure out what isn't working on Lambda. I'm guessing Lambda is balking either at the size of Chrome, or the architecture on the machine is somehow incompatible with the Chrome builds we have.
I think the starting place is to look at how Playwright and Puppeteer get running on AWS. As a temporary measure, you can swap out the "Chrome" that SecretAgent uses by pointing the environment variable CHROME_88_BIN at your executable.
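A minimal sketch of setting CHROME_88_BIN from Node. Only the variable name comes from the comment above; the helper name `firstExecutable` and the candidate paths are hypothetical and would need to match your own image.

```javascript
const fs = require('fs');

// Hypothetical helper: return the first path that exists and is executable, or null.
function firstExecutable(candidates) {
  for (const candidate of candidates) {
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      return candidate;
    } catch (err) {
      // Missing or not executable; try the next candidate.
    }
  }
  return null;
}

// Assumed locations for a custom Chromium build; adjust to your image.
const chromePath = firstExecutable(['/opt/chrome/chrome', '/tmp/chromium']);
if (chromePath) {
  // Assumption: setting this before the first agent is created is early enough.
  process.env.CHROME_88_BIN = chromePath;
}
```

Checking the path up front makes a missing or non-executable binary show up as a clear `null` instead of a later launch error.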
Hi! I tried the latest Chromium 88 binary from https://commondatastorage.googleapis.com/chromium-browser-snapshots/index.html?prefix=Linux/ and, as before, it works just fine locally, but on Lambda I got an error, a different one this time:
On the lambda call where the exception error is logged:
{
"message": "CoreServer needs further setup to launch the browserEmulator. See server logs.",
"name": "Error",
"stack": "Error: CoreServer needs further setup to launch the browserEmulator. See server logs.\n at ConnectionToClient.serializeError (/var/task/node_modules/core/server/ConnectionToClient.ts:372:14)\n at ConnectionToClient.handleRequest (/var/task/node_modules/core/server/ConnectionToClient.ts:94:19)\n------REMOTE CORE---------------------------------\n at Function.reviver (/var/task/node_modules/commons/TypeSerializer.ts:208:26)\n at JSON.parse (<anonymous>)\n at Function.parse (/var/task/node_modules/commons/TypeSerializer.ts:24:17)\n at WebSocket.<anonymous> (/var/task/node_modules/client/connections/RemoteConnectionToCore.ts:67:42)\n at WebSocket.emit (events.js:400:28)\n at WebSocket.emit (domain.js:475:12)\n at Receiver.receiverOnMessage (/var/task/node_modules/ws/lib/websocket.js:1008:20)\n at Receiver.emit (events.js:400:28)\n at Receiver.emit (domain.js:475:12)\n at Receiver.dataMessage (/var/task/node_modules/ws/lib/receiver.js:517:14)\n------CONNECTION----------------------------------\n at new Resolvable (/var/task/node_modules/commons/Resolvable.ts:16:18)\n at Object.createPromise (/var/task/node_modules/commons/utils.ts:68:10)\n at RemoteConnectionToCore.createPendingResult (/var/task/node_modules/client/connections/ConnectionToCore.ts:357:31)\n at RemoteConnectionToCore.internalSendRequestAndWait (/var/task/node_modules/client/connections/ConnectionToCore.ts:263:43)\n at RemoteConnectionToCore.sendRequest (/var/task/node_modules/client/connections/ConnectionToCore.ts:164:17)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)\n at CoreCommandQueue.sendRequest (/var/task/node_modules/client/lib/CoreCommandQueue.ts:150:22)\n at Object.cb (/var/task/node_modules/client/lib/CoreCommandQueue.ts:117:16)\n at Queue.next (/var/task/node_modules/commons/Queue.ts:82:19)\n------CORE COMMANDS-------------------------------\n at Queue.run (/var/task/node_modules/commons/Queue.ts:35:19)\n at CoreCommandQueue.run (/var/task/node_modules/client/lib/CoreCommandQueue.ts:114:8)\n at RemoteConnectionToCore.createSession (/var/task/node_modules/client/connections/ConnectionToCore.ts:205:51)\n at SessionConnection.getCoreSessionOrReject (/var/task/node_modules/client/lib/Agent.ts:559:36)\n at SessionConnection.get activeTab [as activeTab] (/var/task/node_modules/client/lib/Agent.ts:480:10)\n at Agent.get activeTab [as activeTab] (/var/task/node_modules/client/lib/Agent.ts:152:37)\n at Agent.goto (/var/task/node_modules/client/lib/Agent.ts:365:17)\n at Runtime.exports.handler (/var/task/app.js:41:42)\n at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"
}
In the CloudWatch log:
2022-01-31T08:46:02.986Z STATS [/var/task/node_modules/@secret-agent/puppet/index] Puppet.LaunchError { context: {} } [PuppetLaunchError: EROFS: read-only file system, open '/opt/chrome/.validated' Error: EROFS: read-only file system, open '/opt/chrome/.validated'] { isSandboxError: false }
2022-01-31T08:46:02.987Z ERROR [/var/task/node_modules/@secret-agent/core/server/ConnectionToClient] ConnectionToClient.HandleRequestError { context: {}, sessionId: undefined, sessionName: undefined } [PuppetLaunchError: EROFS: read-only file system, open '/opt/chrome/.validated' Error: EROFS: read-only file system, open '/opt/chrome/.validated'] { isSandboxError: false
These two flags are being used to launch Chrome:
'--homedir=/tmp',
'--user-data-dir=/tmp/chrome-user-data',
I have no idea where that .validated dir comes from. Is it something custom from the browser emulator?
Edit: I noticed that the .validated "dir" is just an empty file in a default secret-agent install. I touched it and now it loads, but the launch args are being ignored, which seems quite strange.
Edit 2: I see, the .validated file flags whether the current browser dependencies are met.
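For context, Lambda mounts the function's filesystem read-only and only /tmp is writable, which matches the EROFS error on /opt/chrome/.validated. A small pre-flight sketch that reports this before any launch is attempted; the helper names are hypothetical and the directory is just the one from the log above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Returns true when the current process may create files in `dir`.
function isWritableDir(dir) {
  try {
    fs.accessSync(dir, fs.constants.W_OK);
    return true;
  } catch (err) {
    // EROFS, EACCES and ENOENT all end up here.
    return false;
  }
}

// Hypothetical pre-flight check for a Chrome install directory.
function checkChromeDir(chromeDir) {
  return {
    writable: isWritableDir(chromeDir),
    validatedMarkerExists: fs.existsSync(path.join(chromeDir, '.validated')),
  };
}

console.log(checkChromeDir('/opt/chrome'));
console.log('tmp writable:', isWritableDir(os.tmpdir()));
```

If the directory is not writable and the marker is missing, creating the empty .validated file at image build time (as described in the edit above) avoids the write at runtime.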
I tried to load a different Chromium 88 binary, this time with the build number closest to the default one in secret-agent. The build was so "old" that it used gtk2, so different dependencies were needed. I had to resort to symlinking the bzip2 .so, because the build was expecting libbzip2.1.0 instead of libbzip2.1 or libbzip2.1.06, which is what the Amazon Linux bzip2 package provides. This time the browser loaded but failed with the error: Gtk-WARNING **: 14:39:47.950: cannot open display: ' This was in the local environment. I also used the --disable-gpu and --headless launch args for Chrome.
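Hunting for missing shared libraries one error at a time is slow. A sketch that lists them in one go by parsing `ldd` output; it assumes `ldd` is available in the image, and both function names are hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Pure helper: extract library names that `ldd` reports as "not found".
function parseMissingLibs(lddOutput) {
  return lddOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.endsWith('=> not found'))
    .map((line) => line.split('=>')[0].trim());
}

// Runs `ldd` on a binary. Assumes `ldd` is installed in the image.
function missingLibs(binaryPath) {
  const result = spawnSync('ldd', [binaryPath], { encoding: 'utf8' });
  if (result.error) throw result.error;
  return parseMissingLibs(result.stdout || '');
}
```

Calling `missingLibs(process.env.CHROME_88_BIN)` inside the container would then return an array of unresolved library names, which can be mapped to yum packages.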
Edit 3: I tried the binary dev-headless-chromium-88.0.4298.4-amazonlinux-2017-03.zip from:
https://github.com/adieuadieu/serverless-chrome/releases
I got this error on AWS Lambda; it works just fine in the Docker container locally:
(/var/task/node_modules/core/server/ConnectionToClient.ts:372:14)\n at ConnectionToClient.handleRequest (/var/task/node_modules/core/server/ConnectionToClient.ts:94:19)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)\n------REMOTE CORE---------------------------------\n at Function.reviver (/var/task/node_modules/commons/TypeSerializer.ts:208:26)\n at JSON.parse (<anonymous>)\n at Function.parse (/var/task/node_modules/commons/TypeSerializer.ts:24:17)\n at WebSocket.<anonymous> (/var/task/node_modules/client/connections/RemoteConnectionToCore.ts:67:42)\n at WebSocket.emit (events.js:400:28)\n at WebSocket.emit (domain.js:475:12)\n at Receiver.receiverOnMessage (/var/task/node_modules/ws/lib/websocket.js:1008:20)\n at Receiver.emit (events.js:400:28)\n at Receiver.emit (domain.js:475:12)\n at Receiver.dataMessage (/var/task/node_modules/ws/lib/receiver.js:517:14)\n------CONNECTION----------------------------------\n at new Resolvable (/var/task/node_modules/commons/Resolvable.ts:16:18)\n at Object.createPromise (/var/task/node_modules/commons/utils.ts:68:10)\n at RemoteConnectionToCore.createPendingResult (/var/task/node_modules/client/connections/ConnectionToCore.ts:357:31)\n at RemoteConnectionToCore.internalSendRequestAndWait (/var/task/node_modules/client/connections/ConnectionToCore.ts:263:43)\n at RemoteConnectionToCore.sendRequest (/var/task/node_modules/client/connections/ConnectionToCore.ts:164:17)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)\n at CoreCommandQueue.sendRequest (/var/task/node_modules/client/lib/CoreCommandQueue.ts:150:22)\n at Object.cb (/var/task/node_modules/client/lib/CoreCommandQueue.ts:117:16)\n at Queue.next (/var/task/node_modules/commons/Queue.ts:82:19)\n------CORE COMMANDS-------------------------------\n at Queue.run (/var/task/node_modules/commons/Queue.ts:35:19)\n at CoreCommandQueue.run (/var/task/node_modules/client/lib/CoreCommandQueue.ts:114:8)\n at 
RemoteConnectionToCore.createSession (/var/task/node_modules/client/connections/ConnectionToCore.ts:205:51)\n at SessionConnection.getCoreSessionOrReject (/var/task/node_modules/client/lib/Agent.ts:559:36)\n at SessionConnection.get activeTab [as activeTab] (/var/task/node_modules/client/lib/Agent.ts:480:10)\n at Agent.get activeTab [as activeTab] (/var/task/node_modules/client/lib/Agent.ts:152:37)\n at Agent.goto (/var/task/node_modules/client/lib/Agent.ts:365:17)\n at Runtime.exports.handler (/var/task/app.js:41:42)\n at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"}
There seems to be something wrong with the communication with the core. Could it be a mismatch in the Chrome DevTools/debug protocol, or whatever secret-agent uses to communicate with Chrome? Mind you, I don't yet fully understand how secret-agent operates in this regard.
Thanks for running these experiments! It's been a while since I've used Lambda. You used to have to pre-compile native modules and package them. Is that still true?
You're correct: the .validated file records whether we've checked the Linux dependencies. On Debian-based operating systems, we have a .deb file that installs any missing deps so that Chrome has everything it needs. I'm not sure whether the binaries you're finding come with everything.
This is an example of one of the projects I'd seen in the past that precompiled some binaries: https://github.com/alixaxel/chrome-aws-lambda/tree/master/bin
Regarding Core->Client, SecretAgent is forking a process and then (if you are using the default full-client project), it communicates over a socket to that process. I'm not sure what restrictions Lambda puts on launching processes. I'm guessing there are other log entries that show something else breaking before this?
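One way to look for an earlier failure is to launch the binary once on its own, outside SecretAgent, and capture whatever it prints. A minimal sketch with a hypothetical `probeBinary` helper; Node itself stands in for Chrome so the example runs anywhere.

```javascript
const { spawnSync } = require('child_process');

// Launch a binary once, outside SecretAgent, and capture what it prints.
function probeBinary(binaryPath, args, timeoutMs = 10000) {
  const result = spawnSync(binaryPath, args, { encoding: 'utf8', timeout: timeoutMs });
  return {
    spawnError: result.error ? result.error.message : null, // e.g. ENOENT, EACCES
    status: result.status, // exit code, or null if killed by a signal
    signal: result.signal,
    stdout: result.stdout || '',
    stderr: result.stderr || '',
  };
}

// Example: Node stands in for Chrome here.
console.log(probeBinary(process.execPath, ['--version']));
```

Inside the Lambda handler, the same call with `process.env.CHROME_88_BIN` and `['--version']` would show whether the process can be spawned at all and what it writes to stderr.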
Well, for deploying to AWS I'm using Docker images, which you can build on top of other images. Amazon provides one with Node 14, so it's just a matter of installing the Node modules with npm install, plus the system dependencies those modules need (in this case not only the modules but the Chrome binary too), e.g. gtk, alsa, etc.
Using the binary from chrome-aws-lambda I think I have it working, but with a lot of kludges. I still need to do more testing and code cleanup.
Hi, although I got it to run on AWS Lambda, I couldn't get the Cloudflare checks to pass, and Cloudflare support is the main feature I need for my specific use case.
Here's a detailed description of how I got it to run:
I'm using AWS Lambda Docker images built with this Dockerfile:
FROM public.ecr.aws/lambda/nodejs:14
###### Alternatively, you can pull the base image from Docker Hub: amazon/aws-lambda-nodejs:14
###### Assumes your function is named "app.js", and there is a package.json file in the app directory
COPY app.js package.json /var/task/
###### Environment variables persisted in the built image and needed during install
ENV XDG_CACHE_HOME=/var/task/
###### install Chrome dependencies for default binary
###### only aws-chrome-binary
###### RUN yum install tar gzip -y -q
###### both
RUN yum install libXrandr gtk3 alsa-lib tar gzip -y -q
###### Environment variables used during install
ARG SA_REPLAY_SKIP_BINARY_DOWNLOAD=true
ARG SA_REBUILD_MITM_SOCKET=true
ARG GO_URL="https://golang.org/dl/go1.17.6.linux-amd64.tar.gz"
###### Install Go for Mitm sockets
ARG OLD_PATH=$PATH
RUN set -eux; \
curl -# -o go.tgz -L "https://golang.org/dl/go1.17.6.linux-amd64.tar.gz"; \
tar -C /usr/local -xzf go.tgz; \
rm go.tgz; \
export PATH="/usr/local/go/bin:$PATH"; \
go version
ENV GOPATH /go
ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
###### Install NPM dependencies for function
RUN npm install
###### Restore old PATH and remove go
ENV PATH $OLD_PATH
RUN unset GOPATH
RUN rm -rf /usr/local/go
###### needed for custom chrome launch args
COPY configureBrowserLaunchArgs.js /var/task/node_modules/@secret-agent/default-browser-emulator/lib/helpers/
###### No need to show replay as secret-agent runs headless
ENV SA_SHOW_REPLAY=0
###### Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "app.handler" ]
I had to resort to hardcoded launch args, replacing the whole configureBrowserLaunchArgs.js file, because I couldn't get the launch-args plugin to work. The plugin worked just fine on my local machine, both inside and outside the Docker container, but it didn't work in the real Lambda environment, where it gave errors related to the connection to Core.
The flags I add are:
'--force-webrtc-ip-handling-policy=default_public_interface_only', '--no-startup-window',
// Custom args
'--homedir=/tmp',
'--user-data-dir=/tmp/chrome-user-data',
'--disable-gpu',
'--no-sandbox',
'--in-process-gpu',
'--single-process'
if (options.showBrowser) {
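A sketch of keeping the custom flags in one place and merging them into whatever defaults exist, rather than editing the list by hand. `mergeLaunchArgs` is a hypothetical helper and not part of SecretAgent; the flag list is copied from the comment above, and whether each flag is actually required on Lambda is not verified here.

```javascript
// Flags from the comment above; whether each is required on Lambda is untested here.
const LAMBDA_ARGS = [
  '--homedir=/tmp',
  '--user-data-dir=/tmp/chrome-user-data',
  '--disable-gpu',
  '--no-sandbox',
  '--in-process-gpu',
  '--single-process',
];

// Hypothetical helper (not part of SecretAgent): append extra flags, letting a
// later flag override an earlier one with the same name.
function mergeLaunchArgs(defaultArgs, extraArgs) {
  const byName = new Map();
  for (const arg of [...defaultArgs, ...extraArgs]) {
    const name = arg.split('=')[0];
    byName.delete(name); // re-insert so the override keeps its later position
    byName.set(name, arg);
  }
  return [...byName.values()];
}

console.log(mergeLaunchArgs(['--no-startup-window'], LAMBDA_ARGS));
```

Deduplicating by flag name avoids passing two conflicting `--user-data-dir` values when a default already sets one.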
package.json:
{
"name": "app",
"version": "1.0.0",
"description": "secret-agent-test",
"main": "app.js",
"scripts": {
"test": "test"
},
"author": "none",
"license": "PRIVATE",
"private": true,
"dependencies": {
"secret-agent": "^1.6.4",
"aws-sdk": "^2.1063.0",
"chrome-aws-lambda": "^10.1.0"
}
}
app.js:
const process = require('process');
const Agent = require('secret-agent');
const chromium = require('chrome-aws-lambda');

exports.handler = async function (event, context) {
  const url = 'https://nowsecure.nl';
  const baseWaitMS = 5000;
  const timeoutMS = 10 * baseWaitMS;
  let cloudflareDetected = false;
  let cloudflareBypassed = false;
  let cloudflareRetries = 10;
  let result; // 0 = scraped, 1 = HTTP error, 2 = exception
  //process.env.DEBUG = true;
  // Uses the alternate (chrome-aws-lambda) binary; comment out to use the default one
  process.env.CHROME_88_BIN = await chromium.executablePath;
  try {
    const agent = new Agent.Agent({ upstreamProxyUrl: 'http://user:pass@proxy:port' });
    // Watch every response for the target URL that carries a Cloudflare ray id
    agent.activeTab.on('resource', async (event) => {
      const cfHeaders = await event.response.headers;
      const responseUrl = await event.response.url;
      if (responseUrl.replace(/\/$/, '') === url.replace(/\/$/, '') && 'cf-ray' in cfHeaders) {
        if (!cloudflareDetected) {
          cloudflareDetected = true;
          console.log('Cloudflare detected, ray-id:' + cfHeaders['cf-ray']);
        }
        const responseCode = await event.response.statusCode;
        console.log('Response code: ' + responseCode + ' ray-id: ' + cfHeaders['cf-ray']);
        if (responseCode >= 200 && responseCode <= 299) {
          cloudflareBypassed = true;
          console.log('Cloudflare bypassed, ray-id:' + cfHeaders['cf-ray']);
          cloudflareRetries = 0;
        }
        if (responseCode == 403) {
          console.log('Cloudflare forbidden response');
          cloudflareRetries = 0;
        }
        if (cloudflareRetries > 0) {
          //await agent.activeTab.waitForMillis(baseWaitMS-minWaitMS);
          cloudflareRetries--;
        }
      }
    });
    const responseObject = await agent.goto(url);
    await agent.activeTab.waitForResource({ url: url }, { timeoutMs: timeoutMS });
    await agent.activeTab.waitForMillis(baseWaitMS);
    const responseStatusCode = await responseObject.response.statusCode;
    if (!cloudflareDetected) {
      if (responseStatusCode >= 400) {
        // Fetch the page source first so the log line does not print "undefined"
        const pageSource = await agent.activeTab.document.documentElement.outerHTML;
        console.log('SCRAPE ERROR: HTTP ERROR CODE: ' + responseStatusCode + ' ' + url + ': ' + pageSource);
        result = 1;
      } else {
        const pageSource = await agent.activeTab.document.documentElement.outerHTML;
        const screenShot = await agent.activeTab.takeScreenshot({ format: 'png' });
        console.log('SCRAPE: ' + url + ': Scraped Success.');
        result = 0;
      }
    } else {
      // Keep waiting while the resource listener above still has retries left
      while (!cloudflareBypassed && cloudflareRetries > 0) {
        await agent.activeTab.waitForMillis(baseWaitMS);
      }
      if (cloudflareBypassed) {
        const pageSource = await agent.activeTab.document.documentElement.outerHTML;
        const screenShot = await agent.activeTab.takeScreenshot({ format: 'png' });
        console.log('SCRAPE: ' + url + ': Scraped Success.');
        result = 0;
      } else {
        throw 'Cloudflare bypass error';
      }
    }
  } catch (error) {
    console.log('SCRAPE ERROR: ' + url + ':' + error);
    result = 2;
  }
  console.log(result);
  return result;
};
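The resource listener in the handler decides what a response means and updates state in the same place, which makes it hard to test without a browser. A sketch that pulls the decision into a pure function; `sameUrl` and `classifyResponse` are hypothetical names, and the status rules simply mirror the handler.

```javascript
// Compare two URLs while ignoring a single trailing slash.
function sameUrl(a, b) {
  return a.replace(/\/$/, '') === b.replace(/\/$/, '');
}

// Hypothetical helper: map a response to one of four outcomes.
function classifyResponse(targetUrl, responseUrl, statusCode, headers) {
  if (!sameUrl(targetUrl, responseUrl) || !('cf-ray' in headers)) return 'other';
  if (statusCode >= 200 && statusCode <= 299) return 'passed';
  if (statusCode === 403) return 'forbidden';
  return 'challenge'; // any other status, such as a 503
}

console.log(classifyResponse('https://nowsecure.nl', 'https://nowsecure.nl/', 503, { 'cf-ray': 'example' }));
```

The listener can then switch on the returned string, and the rules can be checked with plain objects instead of a live page.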
I guess the differences between the binary provided by chrome-aws-lambda and the default binary in secret agent are causing it to fail in passing the cloudflare checks. I tried with other versions of the chrome-aws-lambda package (one with chrome version 88.xx) and found no differences in the outcome.
Will a different build of the secret agent binary be needed to support aws-lambda and pass cloudflare checks?
Is there something else I could try to get it running?
Here are some logs:
With default binary, the lambda fails with the aforementioned PipeTransport error:
START RequestId: f21741ac-cfa0-400e-aba7-7edf7e501e4e Version: $LATEST
2022-02-07T08:54:18.489Z ERROR [/var/task/node_modules/@secret-agent/puppet/lib/PipeTransport] PipeTransport.WriteError { context: {}, sessionId: null, sessionName: undefined } Error: read ECONNRESET
at Pipe.onStreamRead (internal/stream_base_commons.js:209:20) {
errno: -104,
code: 'ECONNRESET',
syscall: 'read'
}
2022-02-07T08:54:18.493Z ERROR [/var/task/node_modules/@secret-agent/puppet/lib/PipeTransport] PipeTransport.WriteError { context: {}, sessionId: null, sessionName: undefined } Error: read ECONNRESET
at Pipe.onStreamRead (internal/stream_base_commons.js:209:20) {
errno: -104,
code: 'ECONNRESET',
syscall: 'read'
}
2022-02-07T08:54:18.507Z ERROR [/var/task/node_modules/@secret-agent/core/index] UnhandledError(fatal) { context: {}, sessionId: null, sessionName: undefined } TypeError: Cannot read property 'isClosed' of undefined
at onClosed (/var/task/node_modules/puppet-chrome/lib/Connection.ts:84:14)
at PipeTransport.onReadClosed (/var/task/node_modules/puppet/lib/PipeTransport.ts:70:42)
at Socket.emit (events.js:412:35)
at Pipe.<anonymous> (net.js:686:12)
/var/task/node_modules/@secret-agent/puppet-chrome/lib/Connection.js:75
if (this.isClosed)
^
TypeError: Cannot read property 'isClosed' of undefined
at onClosed (/var/task/node_modules/puppet-chrome/lib/Connection.ts:84:14)
at PipeTransport.onReadClosed (/var/task/node_modules/puppet/lib/PipeTransport.ts:70:42)
at Socket.emit (events.js:412:35)
at Pipe.<anonymous> (net.js:686:12)
2022-02-07T08:54:18.529Z f21741ac-cfa0-400e-aba7-7edf7e501e4e INFO SCRAPE ERROR: https://nowsecure.nl:TypeError: Cannot read property 'goto' of null
] at Runtime.exports.handler (/var/task/app.js:69:30)ts:160:36)e INFO [
2022-02-07T08:54:18.547Z f21741ac-cfa0-400e-aba7-7edf7e501e4e ERROR Unhandled Promise Rejection {"errorType":"Runtime.UnhandledPromiseRejection","errorMessage":"TypeError: Cannot read property 'addEventListener' of null","reason":{"errorType":"TypeError","errorMessage":"Cannot read property 'addEventListener' of null","stack":["TypeError: Cannot read property 'addEventListener' of null"," at Tab.addEventListener (/var/task/node_modules/client/lib/AwaitedEventTarget.ts:16:27)"]},"promise":{},"stack":["Runtime.UnhandledPromiseRejection: TypeError: Cannot read property 'addEventListener' of null"," at process.<anonymous> (/var/runtime/index.js:35:15)"," at process.emit (events.js:400:28)"," at process.emit (domain.js:475:12)"," at processPromiseRejections (internal/process/promises.js:245:33)"," at processTicksAndRejections (internal/process/task_queues.js:96:32)"]}
[ERROR] [1644224058549] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 403.
END RequestId: f21741ac-cfa0-400e-aba7-7edf7e501e4e
REPORT RequestId: f21741ac-cfa0-400e-aba7-7edf7e501e4e Duration: 13608.83 ms Billed Duration: 21865 ms Memory Size: 1024 MB Max Memory Used: 241 MB Init Duration: 8255.26 ms
RequestId: f21741ac-cfa0-400e-aba7-7edf7e501e4e Error: Runtime exited with error: exit status 128
Runtime.ExitError
With the chrome-aws-lambda binary, the Lambda runs fine, but the Cloudflare checks don't pass and the run ends in a timeout error. Locally (inside or outside the container) it usually passes the test within the first 5 retries:
START RequestId: 20bcf4ee-7f6b-4737-92af-20829c326e92 Version: $LATEST
2022-02-07T09:24:51.769Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Cloudflare detected, ray-id:6d9b914ea97d83e1-BRU
2022-02-07T09:24:51.771Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b914ea97d83e1-BRU
2022-02-07T09:24:56.911Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b916efbaf83e1-BRU
2022-02-07T09:25:04.491Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b919e485d83e1-BRU
2022-02-07T09:25:16.194Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b91e76f5683e1-BRU
2022-02-07T09:25:35.612Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b9260ed3383e1-BRU
2022-02-07T09:26:11.130Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b933eadc483e1-BRU
2022-02-07T09:26:46.612Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b941c8a9983e1-BRU
2022-02-07T09:27:06.672Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b949a0e8a83e1-BRU
2022-02-07T09:27:24.890Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b950be9dd83e1-BRU
2022-02-07T09:27:42.813Z 20bcf4ee-7f6b-4737-92af-20829c326e92 INFO Response code: 503 ray-id: 6d9b957bef7183e1-BRU
END RequestId: 20bcf4ee-7f6b-4737-92af-20829c326e92
REPORT RequestId: 20bcf4ee-7f6b-4737-92af-20829c326e92 Duration: 180072.95 ms Billed Duration: 187658 ms Memory Size: 1024 MB Max Memory Used: 574 MB Init Duration: 7584.09 ms
2022-02-07T09:27:43.309Z 20bcf4ee-7f6b-4737-92af-20829c326e92 Task timed out after 180.07 seconds
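The run above ends with Lambda killing the task rather than with the handler's own error. A sketch of deriving the retry count from the time Lambda reports as remaining, so the handler can give up and return on its own. `retryBudget` and the 5-second safety margin are assumptions; `context.getRemainingTimeInMillis()` is the standard method on the Node.js Lambda context object.

```javascript
// How many waits of `waitMs` fit in the time left, keeping `safetyMs` in reserve.
function retryBudget(remainingMs, waitMs, safetyMs = 5000) {
  const usable = remainingMs - safetyMs;
  if (usable <= 0 || waitMs <= 0) return 0;
  return Math.floor(usable / waitMs);
}

// Inside a handler, `context.getRemainingTimeInMillis()` supplies remainingMs:
// const retries = Math.min(10, retryBudget(context.getRemainingTimeInMillis(), 5000));
console.log(retryBudget(180000, 5000));
```

Capping the loop this way leaves time to log the failure and close the browser before the function timeout is reached.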