TODO: description
Use the file
- Run
yarn install
in the scraper repository root
- Edit
run/urls.yml
to specify the URLs to scrape for each provider. A provider whose name starts with a "." will have all its URLs ignored
- Run
yarn scrape
to start scraping!
- Results are written to
./run/{provider}-{date}-batch{number}.json
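The two naming rules above (ignored providers and result file names) can be sketched in TypeScript as follows. This is only an illustration: the helper names `isIgnoredProvider` and `resultPath` are hypothetical and not taken from the scraper's source, the ignore prefix is assumed to be a dot, and the date string format is an assumption because the pattern only says `{date}`.

```typescript
// Hypothetical helpers illustrating the README's naming rules.
// Assumption: a provider is ignored when its name starts with ".".
const IGNORE_PREFIX = ".";

function isIgnoredProvider(name: string): boolean {
  return name.startsWith(IGNORE_PREFIX);
}

// Builds a path following ./run/{provider}-{date}-batch{number}.json.
// Assumption: the date is passed in already formatted as a string.
function resultPath(provider: string, date: string, batch: number): string {
  return `./run/${provider}-${date}-batch${batch}.json`;
}

// Example with made-up provider names.
const providers: string[] = ["acme", ".disabled"];
const active: string[] = providers.filter((p) => !isIgnoredProvider(p));
// active is ["acme"]

const examplePath: string = resultPath("acme", "2024-01-31", 1);
// examplePath is "./run/acme-2024-01-31-batch1.json"
```

Keeping the prefix in a single constant makes the sketch easy to adjust if the real ignore rule turns out to differ.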
Alternatively, you can run the scraper using the launch option in VS Code
(it will also attach the debugger!)
To run the stack in headless mode, set up a .env file like the following:
# .env
HEADLESS=true
- Run
yarn generate {name}
- Code the scraper in the generated file at
src/providers/{name}/scraper.ts
- Register your scraper by adding
export * as {name} from './{name}'
to
src/providers/index.ts
- Run your scraper! :)
The scraper uses the following environment variables:
HEADLESS
: Whether to launch Chromium in headless mode or headful mode (with a GUI). Defaults to false.
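For illustration, a flag like HEADLESS could be turned into a boolean along these lines. This is a minimal sketch, not the scraper's actual parsing code: the helper `isHeadless` and the `Env` type are hypothetical, and treating only the string "true" (case-insensitive) as enabling headless mode is an assumption.

```typescript
// Hypothetical helper, not part of the scraper's source.
type Env = Record<string, string | undefined>;

function isHeadless(env: Env): boolean {
  const raw = env["HEADLESS"];
  // Unset variable: fall back to the documented default (false, i.e. headful).
  if (raw === undefined) return false;
  // Assumption: only the literal string "true" enables headless mode.
  return raw.trim().toLowerCase() === "true";
}

const fromDotenv: boolean = isHeadless({ HEADLESS: "true" }); // true
const unset: boolean = isHeadless({});                        // false
const other: boolean = isHeadless({ HEADLESS: "0" });         // false
```

In a Node process the function would typically be called as `isHeadless(process.env)` after the .env file has been loaded.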