/ndr2-track-scraper

Simple demonstration code to show how to scrape data with node.js

Primary LanguageJavaScript

ndr2-track-scraper

This is just a quick demo to show how to scrape stuff from the web using small components instead of lots of spaghetti-ish code. It consists of these components:

  • UrlGenerator: Generates urls according to a given schema
  • PageDownloader: Downloads pages and calls a callback for each transferred page
  • TracksParser: Parses a html document for track information and calls a callback for each found track with extracted data
  • TracksService: Saves data to a database

Samples

The sample directory contains two basic map reduce functions to see the most popular artists and titles.

Known issues

Right now the application does not exit after finishing work. This is because of mongoose having an open connection to mongo db.