/web-scraper

Tool to download information from web pages

Primary language: Python

Web scraper for the list of companies on OfferZen

  • Download company information so that I can customize how it is processed.
  • Use as a command-line client to process parts as necessary.

Using

  • Framework
    • ~~Scrapy - Framework for scraping a website~~
    • pip install --user Scrapy
  • Libraries
    • BeautifulSoup
    • Requests
    • click - create a command-line interface for the package
    • YAML - store config info in a YAML file
    • time - sleep between requests
    • random - shuffle the list of links
  • Database
    • MongoDB - using MongoDB (the mongod server) for data storage.
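
The time and random pieces above can be combined into a small polite download loop. This is a sketch, not the project's actual code; fetch_all and its defaults are hypothetical.

```python
import random
import time

def fetch_all(links, delay=2.0, session=None):
    """Download each link in random order, sleeping between requests.

    `session` defaults to a requests.Session; anything with a .get()
    returning an object with a .text attribute will also work.
    """
    if session is None:
        import requests  # third-party: pip install --user requests
        session = requests.Session()
    links = list(links)
    random.shuffle(links)  # shuffle so pages aren't hit in listing order
    pages = {}
    for url in links:
        pages[url] = session.get(url).text  # keep the raw HTML as-is
        time.sleep(delay)                   # be polite between requests
    return pages
```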

Environment

  • Virtualenv - using virtualenvwrapper.
    • Explicitly using python36
    • mkvirtualenv -a . -p python36 jobscraper
  • Starting out with Requests, BeautifulSoup4 and saving in MongoDb.
  • Install packages - pip install beautifulsoup4 requests pymongo lxml

Setup

  • Config settings
    • Use ruamel.yaml for reading the YAML file.
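
The notes don't spell out the config keys, so the file below is only a guess at what such a config.yaml might hold; it can be read with ruamel.yaml's YAML().load().

```yaml
# Hypothetical settings for the scraper - key names are illustrative only.
base_url: https://www.offerzen.com/companies
delay_seconds: 2
mongo:
  host: localhost
  port: 27017
  database: jobscraper
```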

Issues

  • UTF-8 errors when trying to save the result of a BeautifulSoup-parsed page.
    • Save the response.text instead; there is no need to transform with BeautifulSoup yet.
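
A minimal sketch of that fix, assuming a pymongo collection is already at hand; the document layout (url/html fields) is an assumption, not the project's actual schema.

```python
def save_page(collection, url, response_text):
    """Upsert the raw HTML string keyed by URL.

    Storing response.text (a plain str) avoids the UTF-8 errors seen
    when trying to persist a parsed BeautifulSoup object.
    """
    collection.update_one(
        {"url": url},                                 # match on the page URL
        {"$set": {"url": url, "html": response_text}},  # store the raw HTML
        upsert=True,                                  # insert if not present
    )
```

With pymongo this would be called as save_page(client.jobscraper.pages, url, response.text); the database and collection names here are hypothetical.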

Flow

  1. Download main page with links to all the company pages
    1. Save the raw response.text in MongoDB
    2. Parse the main page to get links for all the companies
  2. Using the links from the main page, download the individual company pages.
  3. Save the pages in MongoDB.
  4. Process company details
    1. From main page
      1. City option list
      2. Technology option list
      3. Individual company info:
        • Elevator pitch
        • Location
        • Company Size
        • Technologies
        • City category - data-cities
        • Technology stack - data-tech-services
        • Company Id - data-id
    2. Retrieve information
      1. Company name
      2. Company url
      3. Company stack
      4. Company address/location
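
Step 4 above could start from something like the sketch below. The tile markup is invented for illustration; OfferZen's real HTML will differ, but the data-id, data-cities, and data-tech-services attributes match the notes.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Invented sample of one company tile on the main page.
SAMPLE = """
<div class="company-tile" data-id="42" data-cities="cape-town"
     data-tech-services="python django">
  <a href="/companies/acme">Acme</a>
</div>
"""

def parse_companies(html):
    """Extract per-company details from the main page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    companies = []
    for tile in soup.select("div[data-id]"):  # each tile carries a data-id
        link = tile.find("a")
        companies.append({
            "id": tile["data-id"],
            "cities": tile["data-cities"].split(),
            "stack": tile["data-tech-services"].split(),
            "name": link.get_text(strip=True) if link else None,
            "url": link["href"] if link else None,
        })
    return companies
```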