(COMS 4771 Machine Learning @Columbia)
Implemented a perceptron-based classifier from scratch and trained it on one million online restaurant reviews and their corresponding ratings. Converted the review texts to feature vectors using three representations (unigram, tf-idf, and bigram). Achieved 90% test accuracy with each of the three representations. Extracted the 10 words with the most positive weights and the 10 words with the most negative weights for rating prediction.
Keywords: online perceptron algorithm, online-to-batch conversion, unigram, tf-idf, bigram
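
The pipeline described above can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the `<bias>` feature, the whitespace tokenizer, the toy reviews, and the mapping of ratings to labels in {-1, +1} are all assumptions made for illustration. Weight averaging is used here as one common form of online-to-batch conversion; the project may have used a different variant. The tf-idf representation is omitted for brevity.

```python
from collections import Counter, defaultdict

BIAS = "<bias>"  # hypothetical constant feature acting as an intercept


def unigram_features(text):
    """Sparse bag-of-words vector {token: count} for one review."""
    feats = Counter(text.lower().split())
    feats[BIAS] = 1
    return feats


def bigram_features(text):
    """Unigram counts plus counts of adjacent token pairs (tuple keys)."""
    tokens = text.lower().split()
    feats = Counter(tokens)
    feats.update(zip(tokens, tokens[1:]))
    feats[BIAS] = 1
    return feats


def dot(w, x):
    """Inner product of a weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_averaged_perceptron(examples, featurize, passes=2):
    """Online perceptron over (text, label) pairs with labels in {-1, +1}.

    Returns the average of the weight vectors seen during training
    (including the initial zero vector), computed lazily via `u`.
    """
    w = defaultdict(float)  # current weights
    u = defaultdict(float)  # step-weighted sum of updates
    c = 1                   # step counter
    for _ in range(passes):
        for text, y in examples:
            x = featurize(text)
            if y * dot(w, x) <= 0:  # mistake (or zero margin): update
                for f, v in x.items():
                    w[f] += y * v
                    u[f] += c * y * v
            c += 1
    return {f: w[f] - u[f] / c for f in w}


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


def top_words(w, k=10):
    """Return (k most positive, k most negative) features, bias excluded."""
    ranked = sorted((f for f in w if f != BIAS), key=w.get)
    return ranked[-k:][::-1], ranked[:k]


if __name__ == "__main__":
    # Toy data standing in for the real review corpus.
    reviews = [
        ("great food great service", 1),
        ("terrible food rude service", -1),
        ("great place", 1),
        ("terrible place", -1),
    ]
    weights = train_averaged_perceptron(reviews, unigram_features, passes=5)
    positive, negative = top_words(weights, k=2)
    print("most positive:", positive)
    print("most negative:", negative)
```

Passing `bigram_features` instead of `unigram_features` switches the representation without changing the training loop, which is how the three representations could share a single classifier implementation.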