This is an implementation of GBDT (gradient boosted decision trees) multi-class classification with scikit-learn. The dataset is 20news-19997.tar.gz.
- python 3.6
- nltk
- scikit-learn
- numpy
- pickle
Download and unzip 20news-19997.tar.gz into the root directory (e.g. './20_newsgroups'), then download crawl-300d-2M.vec and put it in './vectors/'.
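If you prefer to unpack the archive from Python rather than with a shell tool, the step above can be sketched roughly as follows. The helper name `extract_archive` is hypothetical and not part of this repo; it only assumes the archive has already been downloaded.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Unpack a .tar.gz archive into dest_dir and return its top-level entry names.

    Hypothetical helper for illustration; not part of this repository.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Collect the first path component of every member, e.g. "20_newsgroups".
        top_level = sorted({member.name.split("/")[0] for member in tar.getmembers()})
        tar.extractall(dest)
    return top_level
```

Calling `extract_archive("20news-19997.tar.gz", ".")` from the root directory should leave the unpacked dataset folder next to the scripts. Only extract archives you trust, since `extractall` writes whatever paths the archive contains.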
python data_process.py
You will get 2 directories, each with 4 pickle files:
- ./data_tfidf: words weighted with TF-IDF
- ./data: word embeddings from fastText, using the pretrained file crawl-300d-2M.vec
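To give a feel for the embedding variant, here is a minimal sketch of reading the fastText `.vec` text format (a header line with word count and dimension, then one word and its values per line) and averaging word vectors into a document vector. The function names `load_vec` and `document_vector` are hypothetical, and averaging is an assumption; data_process.py may combine the vectors differently.

```python
import io


def load_vec(lines, vocab=None):
    """Parse fastText .vec text: header 'count dim', then 'word v1 ... vdim' per line."""
    it = iter(lines)
    _count, dim = (int(x) for x in next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if vocab is not None and word not in vocab:
            continue  # keep only words we need, to save memory
        if len(values) != dim:
            continue  # skip malformed rows
        vectors[word] = [float(v) for v in values]
    return vectors, dim


def document_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; return all zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


# Tiny in-memory stand-in for crawl-300d-2M.vec (3 words, 2 dimensions).
sample = io.StringIO("3 2\ngood 1.0 0.0\nbad 0.0 1.0\nfilm 0.5 0.5\n")
vectors, dim = load_vec(sample)
doc = document_vector(["good", "film", "unknown"], vectors, dim)
print(doc)  # [0.75, 0.25]
```

For the real file you would pass an open file handle instead of the `StringIO` object, e.g. `open('./vectors/crawl-300d-2M.vec', encoding='utf-8')`, and supply `vocab` so that only words seen in the corpus are kept in memory.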
python main.py
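For readers unfamiliar with the scikit-learn API, the training step can be sketched as below. This is not the content of main.py: the features are synthetic stand-ins for the pickled document vectors, and the hyperparameters are illustrative assumptions rather than the values this repo uses.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for document features: 3 classes, 100 samples each, 20 features.
rng = np.random.RandomState(0)
centers = rng.normal(scale=5.0, size=(3, 20))
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.normal(size=(300, 20))

# Hold out a stratified 25% of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative hyperparameters (assumed, not taken from main.py).
clf = GradientBoostingClassifier(
    n_estimators=50, learning_rate=0.1, max_depth=3, random_state=0
)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print("test accuracy: %.3f" % accuracy)
```

`GradientBoostingClassifier` handles multi-class targets directly by fitting one regression tree per class at each boosting stage, so training time grows with the number of classes; with 20 newsgroups it can be noticeably slower than this toy example.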