Data Analysis of Lagou Job
This repository holds the code for analyzing job data from Lagou. Its main functions are:
- Crawl job data from Lagou to get the latest information on Internet-industry jobs.
- Analyze and visualize the data.
- Crawl job detail pages and generate a word cloud as a "Job Impression".
- Store interviewees' comments in MongoDB so they can later be used to train a machine-learning NLP task.
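As a rough illustration of the last item, the sketch below shows one way a crawled comment could be shaped into a document and stored. It is not taken from this repository: the function names and the fields `job_id`, `comment`, and `crawled_at` are hypothetical, and `collection` is assumed to be any object with an `insert_one` method, such as a pymongo collection.

```python
from datetime import datetime, timezone


def build_comment_doc(job_id, comment):
    """Shape one interviewee comment into a dict (field names are assumptions)."""
    return {
        "job_id": job_id,
        "comment": comment.strip(),
        "crawled_at": datetime.now(timezone.utc),
    }


def store_comment(collection, job_id, comment):
    """Insert one comment into `collection`; return False for empty text."""
    doc = build_comment_doc(job_id, comment)
    if not doc["comment"]:
        # Nothing useful to train on, so skip the insert.
        return False
    collection.insert_one(doc)
    return True
```

With pymongo installed and the MongoDB service running, `collection` could be something like `MongoClient()["lagou"]["comments"]`, where the database and collection names are again only placeholders.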
- Install third-party libraries:
  sudo pip3 install -r requirements.txt
- Install MongoDB and start the MongoDB service:
  sudo service mongod start
- Clone this project from GitHub.
- Run m_lagou_spider.py to crawl job data; it outputs an Excel file.
- Run hot_words.py to segment the sentences and return the top 30 hot words.
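
The hot-word step can be pictured as "split text into words, count them, keep the most frequent". The sketch below is only an illustration of that idea and is not the code in hot_words.py: the function name `top_hot_words`, the stop-word set, and the regular-expression tokenizer are all assumptions. For Chinese job descriptions, a dedicated word segmenter (for example jieba) would likely replace the regular expression.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the project's real list (if any) is not shown.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "with", "for"}


def top_hot_words(texts, n=30):
    """Return the n most frequent words across `texts` as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters, digits, '+' and '#'
        # so that tokens like "c++" and "c#" survive.
        words = re.findall(r"[a-z0-9+#]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "Experience with Python and SQL",
        "Python developer for data analysis",
    ]
    for word, count in top_hot_words(sample, n=5):
        print(word, count)
```

Calling `top_hot_words(descriptions)` with the default `n=30` matches the "top 30" described above; a smaller `n` is handy for quick checks.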