BEFORE YOU USE
This repository is a simple NLP project for beginners and will be updated occasionally.
It is mainly based on Python 3.6 and was developed on OS X.
Please follow the terms of the Apache-2.0 License before you use this repository.
All rights reserved by Ming Jin.
Introduction
This is a toolbox for analyzing Weibo comments. It implements:
- A Weibo comment crawler based on regular expressions
- Tokenization, filtering, and keyword extraction
- Word clouds, word frequency statistics, and visualization
- Sentiment analysis
- Topic clustering based on LDA
Sample results are in the directory "案例:泰国大象踩踏伤人事件" (Case: the elephant trampling incident in Thailand).
The code also works on Twitter, and I may publish a separate repository for it.
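As a rough illustration of the regular-expression crawling and word-frequency steps listed above, here is a minimal sketch. It is not the code in this repository: the HTML markup in `SAMPLE_HTML`, the pattern `COMMENT_RE`, and the helper names are all hypothetical, and the simple `\w+` split only stands in for proper Chinese word segmentation (the project lists jieba for that).

```python
import re
from collections import Counter

# Hypothetical markup for illustration only; the real Weibo page
# structure is not described in this README.
SAMPLE_HTML = """
<div class="c"><span class="ctt">the elephant was scared</span></div>
<div class="c"><span class="ctt">poor elephant, poor tourist</span></div>
"""

# Non-greedy match of the text inside each (assumed) comment span.
COMMENT_RE = re.compile(r'<span class="ctt">(.*?)</span>', re.S)


def extract_comments(html):
    """Return the text of every comment span found in the HTML."""
    return [match.strip() for match in COMMENT_RE.findall(html)]


def word_frequencies(comments):
    """Count lowercase word occurrences across all comments."""
    counter = Counter()
    for comment in comments:
        # Stand-in tokenizer; Chinese text would need real segmentation.
        counter.update(re.findall(r"\w+", comment.lower()))
    return counter


comments = extract_comments(SAMPLE_HTML)
freq = word_frequencies(comments)
print(freq.most_common(2))
```

The resulting counts are the kind of input a word cloud or a frequency chart would be drawn from.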
Pre-Requirements Checklist
MySQL is required (MySQL Workbench is highly recommended), along with the following Python modules:
- importlib
- sys
- time
- requests
- lxml
- pymysql
- jieba
- PIL
- numpy & matplotlib
- wordcloud
- snownlp
- logging
- configparser
- random
- codecs
- os
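To see which of the modules in the checklist are available before running anything, a small check like the one below may help. It is a sketch, not part of this repository: `REQUIRED_MODULES` and `find_missing` are names made up here, and the list simply mirrors the checklist by import name (`PIL` is the import name provided by the Pillow package).

```python
import importlib.util

# Import names copied from the checklist above (assumed to be complete).
REQUIRED_MODULES = [
    "importlib", "sys", "time", "requests", "lxml", "pymysql", "jieba",
    "PIL", "numpy", "matplotlib", "wordcloud", "snownlp", "logging",
    "configparser", "random", "codecs", "os",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat any lookup failure as "not available".
            found = False
        if not found:
            missing.append(name)
    return missing


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Modules such as sys, time, logging, and os ship with Python itself, so only the third-party entries should ever show up as missing.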