scrape-helpers

Helper functions for scraping websites.

mkdir packages
git submodule add git@github.com:signalwerk/scrape-helpers.git "./packages/scrape-helpers"
cp packages/scrape-helpers/example/get.js .
cp packages/scrape-helpers/example/package.json .
mkdir DATA
npm i
echo "/node_modules" >> .gitignore

Todo

  • Detect if files are already downloaded and skip them (images with and without params)
  • Get wiki texts
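
One possible approach to the first Todo item is sketched below. It is not part of scrape-helpers: the function names `localPathFor` and `isDownloaded`, the `DATA/<hostname>/<path>` file layout, and the reading of "params" as the URL query string are all assumptions made for illustration.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not part of scrape-helpers): map a URL to a local
// file path below dataDir. With ignoreParams = true the query string is
// dropped, so "a.jpg?w=200" and "a.jpg" resolve to the same file.
function localPathFor(url, dataDir = "DATA", ignoreParams = true) {
  const u = new URL(url);
  // Directory-style URLs get a file name so they can be stored on disk.
  const pathname = u.pathname.endsWith("/")
    ? u.pathname + "index.html"
    : u.pathname;
  // encodeURIComponent keeps "?", "&" and "=" out of the file name.
  const suffix = ignoreParams ? "" : encodeURIComponent(u.search);
  return path.join(dataDir, u.hostname, pathname + suffix);
}

// Hypothetical check: true if a file for this URL already exists locally.
function isDownloaded(url, dataDir = "DATA", ignoreParams = true) {
  return fs.existsSync(localPathFor(url, dataDir, ignoreParams));
}

module.exports = { localPathFor, isDownloaded };
```

A download loop could call `isDownloaded(url)` before each request and skip the URL when it returns true. Passing `ignoreParams = false` treats every distinct query string as a separate file, which may be the safer choice when the parameters change the image that is returned.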