Project Instructions

  • Create a scraper.js file that will contain your command line application. Your project should also include a package.json file that includes your project’s dependencies. The npm install command should install your dependencies.

  • Program your scraper to check for a folder called ‘data’. If the folder doesn’t exist, the scraper should create one. If the folder does exist, the scraper should do nothing.

Choose and use third-party npm packages.

  • For scraping content from the site, either use a scraping module or use the Cheerio module to create your own scraper.
  • To create the CSV file, use a CSV creation module.
  • Be sure to research the best package to use (see the project resources for a link to the video about how to choose a good npm package). Both packages should meet the following requirements:
    • At least 1,000 downloads
    • Has been updated in the last six months

Program your scraper

Scraping and Saving Data:

  • The scraper should get the price, title, URL, and image URL from the product page and save this information into a CSV file.
  • The information should be stored in a CSV file that is named for the date it was created, e.g. 2016-11-21.csv.
  • Assume that the column headers in the CSV need to be in a certain order to be correctly entered into a database. They should be in this order: Title, Price, ImageURL, URL, and Time.
  • The CSV file should be saved inside the ‘data’ folder.

  • If your program is run twice, it should overwrite the data in the CSV file with the updated information.

  • If http://shirts4mike.com is down, an error message describing the issue should appear in the console.

  • The error should be human-friendly, such as “There’s been a 404 error. Cannot connect to http://shirts4mike.com.”

  • To test that the error message displays as expected, you can disable the Wi-Fi on your computer or device.


Extra Credit


  • Edit your package.json file so that your program runs when the npm start command is run.
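Assuming your entry point is scraper.js, the relevant package.json addition might look like this (your file will also contain name, version, and dependencies fields):

```json
{
  "scripts": {
    "start": "node scraper.js"
  }
}
```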

  • When an error occurs, log it to a file named scraper-error.log. Append the error to the bottom of the file with a time stamp, e.g. [Tue Feb 16 2016 10:02:12 GMT-0800 (PST)] <error message>