kerbaras/scraper

scraper

TODO: description

Running the scraper as a script

Use the file

Run yarn install in the scraper repository root
Edit run/urls.yml to specify the URLs to scrape for each provider. Any provider whose name starts with a . is skipped, and all of its URLs are ignored
Run yarn scrape to start scraping!
Results are written to ./run/{provider}-{date}-batch{number}.json
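
The two conventions above (skipping providers whose name starts with a . and the output file naming) can be sketched as follows. This is only an illustration, not the scraper's actual code: the UrlConfig shape, the function names activeProviders and outputPath, the provider names and the YYYY-MM-DD date format are all assumptions.

```typescript
// Hypothetical shape: provider name -> list of URLs.
// The real layout of run/urls.yml may differ.
type UrlConfig = Record<string, string[]>;

// Keep only providers whose name does not start with ".".
function activeProviders(config: UrlConfig): UrlConfig {
  const result: UrlConfig = {};
  for (const name of Object.keys(config)) {
    if (name.charAt(0) !== ".") {
      result[name] = config[name];
    }
  }
  return result;
}

// Build a path following ./run/{provider}-{date}-batch{number}.json.
// The date format used here (YYYY-MM-DD, UTC) is an assumption.
function outputPath(provider: string, date: Date, batch: number): string {
  const day = date.toISOString().slice(0, 10);
  return `./run/${provider}-${day}-batch${batch}.json`;
}

const config: UrlConfig = {
  shopA: ["https://example.com/a"],
  ".shopB": ["https://example.com/b"], // disabled by the leading "."
};

console.log(Object.keys(activeProviders(config))); // only "shopA" remains
console.log(outputPath("shopA", new Date(Date.UTC(2024, 0, 15)), 1));
// prints "./run/shopA-2024-01-15-batch1.json"
```

Prefixing a provider with a . is a quick way to disable it temporarily without deleting its URLs from the file.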

Alternatively, you can run it with the launch configuration in VS Code, which also attaches the debugger.

Running in headless mode

To run the scraper in headless mode, create a .env file like the following:

# .env
HEADLESS=true

Create a scraper

Run yarn generate {name}
Code the scraper in the generated file at src/providers/{name}/scraper.ts
Register your scraper by adding export * as {name} from './{name}' to src/providers/index.ts
Run your scraper! :)
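
To see why the registration step matters: a namespace re-export such as export * as {name} from './{name}' makes each provider available as a property of the providers module, keyed by its name. A runner could then look a scraper up by the provider name used in run/urls.yml. The sketch below imitates that with a plain object. The ProviderModule interface, the scrape method, the example provider and getProvider are hypothetical; the generated scraper.ts may expose something quite different.

```typescript
// Hypothetical scraper shape; the real interface in src/providers may differ.
interface ProviderModule {
  scrape(url: string): Promise<string[]>;
}

// Stand-in for `import * as providers from './providers'`.
// Each `export * as {name}` line in index.ts would add one key here.
const providers: Record<string, ProviderModule> = {
  example: {
    scrape(url: string): Promise<string[]> {
      return Promise.resolve([`scraped ${url}`]);
    },
  },
};

// Look up a registered provider by name, failing loudly if it is missing.
function getProvider(name: string): ProviderModule {
  const provider = providers[name];
  if (!provider) {
    throw new Error(`No scraper registered for provider "${name}"`);
  }
  return provider;
}

getProvider("example")
  .scrape("https://example.com")
  .then((items) => console.log(items));
```

Under this assumption, a scraper that is generated but never exported from the index would simply not be found at run time.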

Environment Variables

The scraper uses the following environment variables:

HEADLESS: Whether to launch Chromium in headless mode or headful mode (with a GUI). Defaults to false
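
A minimal sketch of how a flag like HEADLESS might be read, with false as the default. The function name parseHeadless and the rule that only the string "true" (case-insensitive) enables headless mode are assumptions; the scraper's own parsing may accept other values.

```typescript
// Environment-like map, e.g. process.env in Node.js.
type Env = Record<string, string | undefined>;

// Returns true only when HEADLESS is set to "true" (case-insensitive).
// An unset or any other value falls back to the documented default, false.
function parseHeadless(env: Env): boolean {
  const raw = env["HEADLESS"];
  if (raw === undefined) {
    return false;
  }
  return raw.trim().toLowerCase() === "true";
}

console.log(parseHeadless({ HEADLESS: "true" })); // true
console.log(parseHeadless({})); // false
```

In a Node.js script this would typically be called as parseHeadless(process.env) after the .env file has been loaded.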