Classifying articles by their body text using the Weka JAR in Java
-
ExtractFiles.java iterates through all SGM files in a directory (each file holds several articles) and splits them into separate files, each containing only the heading and body text of a single article, with tags and other markup removed.
-
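A minimal sketch of the splitting step, assuming Reuters-21578-style markup: each article is wrapped in `<REUTERS ...>...</REUTERS>`, its heading sits in `<TITLE>` and its body in `<BODY>`. The class `SgmSplitter`, its methods and the `articleN.txt` output naming are hypothetical and not taken from ExtractFiles.java. Regular expressions are used for brevity instead of a real SGML parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: splits SGM files into one text file per article.
 * ASSUMES Reuters-21578-style tags (<REUTERS>, <TITLE>, <BODY>).
 */
final class SgmSplitter {
    // One match per article; DOTALL lets '.' span newlines, '?' keeps matches non-greedy.
    private static final Pattern ARTICLE =
            Pattern.compile("<REUTERS[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);
    private static final Pattern TITLE =
            Pattern.compile("<TITLE>(.*?)</TITLE>", Pattern.DOTALL);
    private static final Pattern BODY =
            Pattern.compile("<BODY>(.*?)</BODY>", Pattern.DOTALL);

    /** Returns "heading\nbody" for each article in the SGM text, with tags stripped. */
    static List<String> splitArticles(String sgm) {
        List<String> articles = new ArrayList<>();
        Matcher article = ARTICLE.matcher(sgm);
        while (article.find()) {
            String inner = article.group(1);
            String title = firstGroup(TITLE, inner);
            String body = firstGroup(BODY, inner);
            articles.add(stripTags(title) + "\n" + stripTags(body));
        }
        return articles;
    }

    /** First capture group of the pattern in the text, or "" if the element is missing. */
    private static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    /** Removes any remaining markup and trims surrounding whitespace. */
    private static String stripTags(String s) {
        return s.replaceAll("<[^>]+>", "").trim();
    }

    /** Writes every article of every *.sgm file in inDir to its own file in outDir. */
    static int splitDirectory(Path inDir, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inDir, "*.sgm")) {
            for (Path file : files) {
                // ISO-8859-1 maps every byte to a character, so decoding never fails.
                String sgm = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
                for (String text : splitArticles(sgm)) {
                    Path out = outDir.resolve("article" + (count++) + ".txt");
                    Files.write(out, text.getBytes(StandardCharsets.UTF_8));
                }
            }
        }
        return count;
    }
}
```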
CreateDataset.java creates the ARFF dataset for Weka from a set of text files.
-
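One way to produce such a file, sketched without Weka so the ARFF format itself is visible: a `string` attribute holds each article's text and a nominal `class` attribute holds its label. It assumes, hypothetically, that the text files are grouped into one subdirectory per class label; `ArffWriter` and its methods are illustrative names, not CreateDataset.java's code. Weka also ships `weka.core.converters.TextDirectoryLoader`, which reads the same directory-per-class layout directly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of writing article texts and labels as an ARFF document. */
final class ArffWriter {
    /** Single-quotes a value, escaping characters that would break ARFF tokenizing. */
    static String quote(String s) {
        String escaped = s.replace("\\", "\\\\")
                .replace("'", "\\'")
                .replace("\n", "\\n")   // a raw newline would end the quoted token
                .replace("\r", "\\r")
                .replace("\t", "\\t");
        return "'" + escaped + "'";
    }

    /** Builds the ARFF text; docs maps each class label to its article texts. */
    static String toArff(String relation, Map<String, List<String>> docs) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(quote(relation)).append("\n\n");
        sb.append("@attribute text string\n");
        sb.append("@attribute class {")
          .append(docs.keySet().stream().map(ArffWriter::quote).collect(Collectors.joining(",")))
          .append("}\n\n@data\n");
        // One instance per article: text first, then its class label.
        for (Map.Entry<String, List<String>> e : docs.entrySet()) {
            for (String text : e.getValue()) {
                sb.append(quote(text)).append(',').append(quote(e.getKey())).append('\n');
            }
        }
        return sb.toString();
    }

    /** ASSUMED layout: rootDir/<label>/*.txt; returns label -> texts, sorted by label. */
    static Map<String, List<String>> readLabelledDirs(Path rootDir) throws IOException {
        Map<String, List<String>> docs = new TreeMap<>();
        try (Stream<Path> labels = Files.list(rootDir)) {
            for (Path labelDir : labels.filter(Files::isDirectory).collect(Collectors.toList())) {
                List<String> texts = new ArrayList<>();
                try (DirectoryStream<Path> files = Files.newDirectoryStream(labelDir, "*.txt")) {
                    for (Path f : files) {
                        texts.add(new String(Files.readAllBytes(f), StandardCharsets.UTF_8));
                    }
                }
                docs.put(labelDir.getFileName().toString(), texts);
            }
        }
        return docs;
    }
}
```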
Classifiers.java performs the classification on the ARFF dataset using TF-IDF term weighting of the article text.
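TF-IDF is a term-weighting scheme rather than a classifier in itself: a term's weight in a document is its count multiplied by `ln(N / df(t))`, where `N` is the number of training documents and `df(t)` the number containing term `t`. In Weka this weighting is commonly applied with the `StringToWordVector` filter (`setTFTransform(true)`, `setIDFTransform(true)`) before a classifier is trained; which classifier Classifiers.java uses is not stated here. The sketch below is a plain-Java illustration only, assuming raw term counts, natural-log IDF, a naive letters-only tokenizer and, as a swapped-in choice, a 1-nearest-neighbour rule on cosine similarity. `TfIdfClassifier` is a hypothetical name.

```java
import java.util.*;

/** Hypothetical sketch: TF-IDF vectors plus 1-nearest-neighbour by cosine similarity. */
final class TfIdfClassifier {
    private final List<Map<String, Double>> trainVectors = new ArrayList<>();
    private final List<String> trainLabels = new ArrayList<>();
    private final Map<String, Double> idf = new HashMap<>();

    /** Lowercases and splits on runs of non-letters (deliberately simple). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Learns idf(t) = ln(N / df(t)) and stores a tf-idf vector per training article. */
    void train(List<String> texts, List<String> labels) {
        Map<String, Integer> df = new HashMap<>();
        for (String text : texts) {
            for (String term : new HashSet<>(tokenize(text))) {  // count each term once per doc
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = texts.size();
        df.forEach((term, count) -> idf.put(term, Math.log((double) n / count)));
        for (int i = 0; i < texts.size(); i++) {
            trainVectors.add(vectorize(texts.get(i)));
            trainLabels.add(labels.get(i));
        }
    }

    /** Weight = raw count * idf; terms never seen in training are dropped. */
    Map<String, Double> vectorize(String text) {
        Map<String, Double> v = new HashMap<>();
        for (String term : tokenize(text)) {
            Double w = idf.get(term);
            if (w != null) v.merge(term, w, Double::sum);  // adds idf once per occurrence
        }
        return v;
    }

    /** Cosine similarity of two sparse vectors; 0 if either is all zeros. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double x : b.values()) nb += x * x;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Label of the most similar training article. */
    String classify(String text) {
        Map<String, Double> q = vectorize(text);
        String best = null;
        double bestSim = -1;
        for (int i = 0; i < trainVectors.size(); i++) {
            double sim = cosine(q, trainVectors.get(i));
            if (sim > bestSim) {
                bestSim = sim;
                best = trainLabels.get(i);
            }
        }
        return best;
    }
}
```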