- Use two methods, SVM and naive Bayes, with TF-IDF features to classify articles.
- Solution for multi-topic articles: an article may have more than one topic, so two approaches were tried:
  2-1. Split a multi-topic article into one single-topic instance per topic. For example, article 1 labeled [topic 1, topic 2] becomes two instances: (topic 1, article 1) and (topic 2, article 1).
  2-2. Delete multi-topic articles.
  Result: deleting multi-topic articles gives better performance with both SVM and naive Bayes.
- Write the predicted topic of each article to Excel files (./data/svm_single_label_result.xls, ./data/nb_single_label_result.xls).
- Automatically classify all test documents in the ModApte split and evaluate classification effectiveness (precision, recall, F-measure, accuracy). Screenshots of the results: svm.png, naive_bayes.png.
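A minimal sketch of the TF-IDF + SVM / naive Bayes setup, assuming scikit-learn's `TfidfVectorizer`, `LinearSVC` and `MultinomialNB`; the actual scripts may use different classes or parameters. The four training documents and their topics are made-up stand-ins for ModApte, not real data, and `build_models` is a hypothetical helper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for ModApte (one topic per article).
train_docs = [
    "oil prices rise as crude output falls",
    "crude oil exports increase opec",
    "wheat harvest grain exports grow",
    "grain and wheat shipments delayed",
]
train_labels = ["crude", "crude", "grain", "grain"]
test_docs = ["opec cuts crude oil output", "wheat grain harvest record"]


def build_models():
    """Return one TF-IDF + classifier pipeline per method."""
    return {
        "svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
        "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    }


predictions = {}
for name, model in build_models().items():
    # The vectorizer learns vocabulary and IDF weights from the training set only.
    model.fit(train_docs, train_labels)
    predictions[name] = list(model.predict(test_docs))
    print(name, predictions[name])
```

Wrapping the vectorizer and classifier in one pipeline keeps the test set from leaking into the IDF statistics.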
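The two multi-topic strategies (2-1 split, 2-2 delete) can be sketched in plain Python. `split_multi_topic` and `drop_multi_topic` are hypothetical names, and each article is assumed to be a (text, list of topics) pair.

```python
from typing import List, Tuple

Article = Tuple[str, List[str]]  # (article text, its topics)


def split_multi_topic(articles: List[Article]) -> List[Tuple[str, str]]:
    """2-1: emit one single-topic instance per (article, topic) pair."""
    return [(text, topic) for text, topics in articles for topic in topics]


def drop_multi_topic(articles: List[Article]) -> List[Tuple[str, str]]:
    """2-2: keep only articles with exactly one topic."""
    return [(text, topics[0]) for text, topics in articles if len(topics) == 1]


articles = [
    ("article 1", ["topic 1", "topic 2"]),
    ("article 2", ["topic 1"]),
]
print(split_multi_topic(articles))
# [('article 1', 'topic 1'), ('article 1', 'topic 2'), ('article 2', 'topic 1')]
print(drop_multi_topic(articles))
# [('article 2', 'topic 1')]
```

Articles with no topic produce no instance under either strategy.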
Please allocate enough resources (e.g., memory, CPU) to your VM.
The steps below are run in a virtual machine. First install Python and sklearn, then:
- Run generate_dataset.py to generate the ModApte dataset from Reuters-21578.
- Run svm_tfidf.py to get the classification result (./data/svm_single_label_result.xls) and the evaluation scores: precision, recall, F-measure, accuracy.
- Run naive_bayes_tfidf.py to get the classification result (./data/nb_single_label_result.xls) and the evaluation scores: precision, recall, F-measure, accuracy.
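For reference, the four evaluation scores can be computed from true and predicted topics as below. This is a minimal pure-Python sketch assuming macro-averaging over topics (the mean of per-topic scores); the scripts' averaging mode is not stated, and micro-averaging is also common for Reuters-21578. `evaluate` is a hypothetical helper.

```python
def evaluate(y_true, y_pred):
    """Return macro-averaged (precision, recall, F-measure) and accuracy."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F-measure is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, accuracy


print(evaluate(["crude", "crude", "grain", "grain"],
               ["crude", "grain", "grain", "grain"]))
```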
109598091 陳逸芳