/template-scraper

It's a small Python program that scrapes a website, and the pages it needs, to obtain all schools (primary and secondary) in Tanzania. Warning: scrape according to the website's terms of service, and don't overload any server.

Primary Language: Python

template-scraper

Requisites

We need the following to run this script successfully:

  1. Python 3. I'm using Python 3.5.2 and haven't tested other versions, but I think it would work on any Python 3.x version (www.python.org).
  2. The 'json' library, which is part of Python's standard library.
  3. Either the 'requests' module (pip install requests) or the 'urllib2' module (pip install yieldfrom.urllib.requests). Note that on Python 3 the standard-library counterpart of 'urllib2' is 'urllib.request', which needs no install.
  4. The 'time' module, which also comes with Python by default. We use it to pace the program so it doesn't overload the server with our requests.
  5. BeautifulSoup4 (pip install bs4, or pip install beautifulsoup4). I use it for the scraping itself, but you can use any scraper you're used to.

You need a good internet connection to make it run fast. This version is an alternative (using JSON instead of MongoDB) to my school-template-scraper (http://www.github.com/anorebel/school-template-scraper).

Also, general.py is a combination of both primary.py and secondary.py, so you can run those two or just run general.py. In case of any problem, contact me.
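
To show how the 'time' and 'json' pieces listed above can fit together, here is a minimal sketch. It is not the code from primary.py, secondary.py or general.py. The function names (fetch_all, save_as_json, load_json), the one-second default delay and the record layout are all assumptions made for illustration.

```python
import json
import time


def fetch_all(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between calls.

    `fetch` is any callable that takes a URL and returns the page content.
    The name and the default delay are illustrative assumptions.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Sleep before every request after the first, so the server
            # is never hit with back-to-back requests.
            time.sleep(delay_seconds)
        results[url] = fetch(url)
    return results


def save_as_json(records, path):
    """Write the scraped records to a JSON file instead of a database."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)


def load_json(path):
    """Read the records back from the JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With 'requests' installed, the fetch argument could be something like `lambda url: requests.get(url, timeout=10).text`. Passing the fetcher in as an argument keeps the pacing logic independent of whichever HTTP module you choose.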