Clone the repository, then run yarn or npm install.
Run the following command to see the available options:
npx ts-node src/cli.ts
npx ts-node src/cli.ts init [-c configuration.yaml]
npx ts-node src/cli.ts run [-c configuration.yaml] [-o output.yaml]
Compile the TypeScript code by invoking tsc in the root directory.
Then run npm link to install urlscanner globally.
You can now run urlscanner from any directory:
urlscanner init [-c configuration.yaml]
urlscanner run [-c configuration.yaml] [-o output.yaml]
The top-level config key crawler allows setting request options:
crawler:
  # see https://github.com/yujiosaka/headless-chrome-crawler/blob/master/docs/API.md#crawlerqueueoptions
  options:
    userAgent: Custom Crawler Engine
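
Since the comment in the config points at the crawler.queue options of headless-chrome-crawler, the values under crawler.options are presumably merged into each queued request. The following is only a minimal sketch of that idea, not urlscanner's actual code: the CrawlerConfig type and the buildQueueOptions helper are hypothetical names introduced for illustration, and the config object is written inline instead of being loaded from configuration.yaml.

```typescript
// Hypothetical shape of the parsed configuration file. Only
// crawler.options.userAgent appears in the README; other keys are assumed
// to be passed through unchanged.
interface CrawlerConfig {
  crawler?: {
    options?: Record<string, unknown>;
  };
}

type QueueOptions = Record<string, unknown> & { url: string };

// Hypothetical helper: combine the configured request options with the URL
// to crawl. The url is spread last so a stray "url" key in the config
// cannot override the target.
function buildQueueOptions(config: CrawlerConfig, url: string): QueueOptions {
  const options = config.crawler?.options ?? {};
  return { ...options, url };
}

// Inline stand-in for the YAML shown above.
const config: CrawlerConfig = {
  crawler: { options: { userAgent: "Custom Crawler Engine" } },
};

const queueOptions = buildQueueOptions(config, "https://example.com/");
console.log(queueOptions);
```

An object of this shape could then be handed to the crawler's queue call. Keeping the merge in a small pure function makes it easy to check that a missing crawler section simply yields the bare URL.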