Scrapes a list of URLs of similarly structured websites from a txt file and writes the results to a JSON file. It can be modified for different pages.
- NodeJS
Run npm install to install the dependencies.
Edit urls.txt to match the sites you will be scraping.
Change the tag/class/id to match where the content is located on the page, so the scraper doesn't pull in unneeded information.
$('[TAG/CLASS/ID]').filter(function() { ... });
If your webpages have more content to organize, in scrapePage():
- Add a field to the json variable.
- Add a filter to extract the content (just copy an existing filter and replace the selector as needed).
Run node index.js.
The output is written to output.json
- urls.txt: you can comment out lines with a #.