This project crawls information-rich web pages such as LinkedIn and Google to aggregate data on potential new investment clients.
The project will be built on:
- Scrapy for the web crawlers (i.e. spiders) and the data pipeline (see the spider sketch below)
- Selenium for rendering JavaScript-heavy, dynamically loaded pages (see the rendering sketch below)
- Docker for running scaled test and deployment environments
- MySQL as the database that stores the aggregated information (see the pipeline sketch below)
The main language will be Python.
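
As a starting point, a Scrapy spider might look like the sketch below. The seed URL, CSS selectors, and field names are all placeholders, not the project's actual targets; real spiders for sites like LinkedIn will need site-specific selectors plus careful rate limiting and terms-of-service review.

```python
# Minimal Scrapy spider sketch. The URL and selectors are
# hypothetical placeholders for illustration only.
import scrapy


class ProspectSpider(scrapy.Spider):
    name = "prospects"
    # Hypothetical seed URL; replace with the real target.
    start_urls = ["https://example.com/directory"]

    def parse(self, response):
        # Yield one item per profile card (selectors are placeholders).
        for card in response.css("div.profile"):
            yield {
                "name": card.css("h2::text").get(),
                "company": card.css(".company::text").get(),
                "url": response.urljoin(card.css("a::attr(href)").get()),
            }
        # Follow pagination if a next-page link is present.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```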
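For pages whose content only appears after JavaScript runs, Selenium can render the page first and hand the resulting HTML to Scrapy's selectors. The sketch below assumes headless Chrome with a chromedriver available on the PATH (e.g. baked into the Docker image); the URL and selector are placeholders.

```python
# Sketch: render a JS-heavy page with Selenium, then parse the
# resulting HTML with Scrapy's selector machinery.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from scrapy.selector import Selector

options = Options()
options.add_argument("--headless")    # run without a display
options.add_argument("--no-sandbox")  # commonly needed in containers

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/dynamic-page")  # placeholder URL
    html = driver.page_source  # HTML after JavaScript has executed
finally:
    driver.quit()

# Parse the rendered HTML outside the browser.
sel = Selector(text=html)
print(sel.css("h2::text").getall())
```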
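On the storage side, a Scrapy item pipeline can push each scraped item into MySQL. The sketch below assumes the mysql-connector-python driver and a hypothetical `prospects` table; the connection details are placeholders and would normally come from Scrapy settings or environment variables.

```python
# Sketch of a Scrapy item pipeline writing to MySQL. Table name,
# columns, and credentials are hypothetical placeholders.
import mysql.connector


class MySQLPipeline:
    def open_spider(self, spider):
        # Placeholder credentials; load from settings/env in practice.
        self.conn = mysql.connector.connect(
            host="localhost", user="scraper",
            password="secret", database="crawl",
        )
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # Insert one row per scraped item.
        self.cursor.execute(
            "INSERT INTO prospects (name, company, url) VALUES (%s, %s, %s)",
            (item.get("name"), item.get("company"), item.get("url")),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()
```

The pipeline would be enabled by registering it under `ITEM_PIPELINES` in the project's Scrapy settings.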