Kumamon
Silicon Valley Squad 7 project repo
Web Crawler via Scrapy
1st Project: Pacing
[2016/02/08 - 2016/02/14]
First Stage: Create a Scrapy project that crawls the content of the Xiaomi Appstore homepage, or any other app store homepage.
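A minimal sketch of such a spider (generated with `scrapy startproject`); the start URL and CSS selectors below are assumptions and need to be adapted to the real page markup:

```python
import scrapy


class AppstoreSpider(scrapy.Spider):
    """Crawl app names and links from an app store homepage."""
    name = "appstore"
    allowed_domains = ["app.mi.com"]      # assumed Xiaomi Appstore domain
    start_urls = ["http://app.mi.com/"]   # assumed homepage URL

    def parse(self, response):
        # Each app entry's title and link; selectors are placeholders.
        for app in response.css("ul.applist li"):
            yield {
                "name": app.css("h5 a::text").extract_first(),
                "url": response.urljoin(app.css("h5 a::attr(href)").extract_first()),
            }
```

Running `scrapy crawl appstore -o apps.json` dumps the crawled items to a JSON file for a quick sanity check before MongoDB comes into play.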
[2016/02/15 - 2016/02/21]
Second Stage: Save the crawled content in MongoDB[2]. Install the Python MongoDB driver and modify pipelines.py to insert the crawled data into MongoDB.
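A sketch of what pipelines.py might look like with the pymongo driver; the database name, collection name, and settings keys are placeholders:

```python
import pymongo


class MongoPipeline(object):
    """Insert every crawled item into a MongoDB collection."""

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read connection settings from settings.py, with local defaults.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "appstore"),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Store each item as a document in the "apps" collection.
        self.db["apps"].insert_one(dict(item))
        return item
```

The pipeline still has to be enabled in settings.py through the ITEM_PIPELINES setting, otherwise Scrapy will never call it.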
[2016/02/22 - 2016/02/29]
Third Stage: Crawl more content by following next-page links. So far you have probably only crawled the content of the home page. If the next-page link is generated by JavaScript, we need Splash[3] and ScrapyJS[4] to re-render the page so that the dynamic parts become static content we can parse.
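A sketch of following a JavaScript-generated next-page link through Splash, using the scrapy-splash plugin (the later name of ScrapyJS); it assumes a Splash instance is running and configured via SPLASH_URL and the plugin's middlewares in settings.py, and the selectors are placeholders:

```python
import scrapy
from scrapy_splash import SplashRequest


class AppstoreSplashSpider(scrapy.Spider):
    """Crawl multiple pages by rendering JS-built pagination via Splash."""
    name = "appstore_splash"
    start_urls = ["http://app.mi.com/"]   # assumed homepage URL

    def start_requests(self):
        for url in self.start_urls:
            # Ask Splash to render the page so JS-built links appear in the HTML.
            yield SplashRequest(url, self.parse, args={"wait": 1})

    def parse(self, response):
        for href in response.css("ul.applist li h5 a::attr(href)").extract():
            yield {"url": response.urljoin(href)}

        # Placeholder selector for the rendered "next page" link.
        next_page = response.css("a.next::attr(href)").extract_first()
        if next_page:
            yield SplashRequest(response.urljoin(next_page), self.parse,
                                args={"wait": 1})
```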
Bonus Round
- Pull results from MongoDB and show them in the browser via Flask (see the sketch after this list)
- Multiprocessing (TBD)
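A minimal Flask sketch for the first bonus item; it reuses the database and collection names assumed in the pipeline sketch above:

```python
from flask import Flask
import pymongo

app = Flask(__name__)
client = pymongo.MongoClient("mongodb://localhost:27017")
collection = client["appstore"]["apps"]


@app.route("/")
def index():
    # Render the crawled apps as a simple HTML list.
    rows = "".join(
        "<li>{} - {}</li>".format(doc.get("name", ""), doc.get("url", ""))
        for doc in collection.find().limit(100)
    )
    return "<ul>{}</ul>".format(rows)


if __name__ == "__main__":
    app.run(debug=True)
```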
What is next?
- 1st project - Crawler (Python)
- 2nd project - Recommender (Python / Spark)
- 3rd project - Website (Meteor/React)
Learn programming via projects
Nowadays we spend a lot of time trying to get a good grasp of data structures and algorithms by solving problems on CC, LC, and GFG. But we still may not end up with a good result in our job search, since the CS job market is so hot that there are many competitors...
Quality beats quantity. Instead of grinding through a lot of questions, if you can make the best use of your knowledge to build a product, you can easily extend to similar problems with some practice (this is what they look for: your problem-solving abilities).