
Final project for Artificial Intelligence - COMP4106

Primary language: Java

nlp-classification

Written by Savanna Endicott in April 2017

Description:

This project was my final project for COMP4106 (Artificial Intelligence) at Carleton University. Its purpose is to compare the Naive Bayes and Bag of Words classification techniques on a corpus of news articles from four categories: Sports, Entertainment, World News, and Travel. All of these articles are from CBC.ca.
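
As a rough illustration of how a multinomial Naive Bayes text classifier typically works, here is a minimal sketch. It is not the code in naivebayes.java; the class name NaiveBayesSketch, its methods, and the tokenizer are hypothetical. For each category c it scores log P(c) + Σ log P(w | c), using add-one (Laplace) smoothing so an unseen (word, category) pair does not zero out the product, and it works in log space to avoid underflow on long articles.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical multinomial Naive Bayes sketch (not the project's naivebayes.java). */
class NaiveBayesSketch {
    // category -> (word -> count)
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // category -> total number of word tokens seen in that category
    private final Map<String, Integer> totalWords = new HashMap<>();
    // category -> number of training documents
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Lower-cases the text and splits it on runs of non-letter characters. */
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^\\p{L}]+");
    }

    void train(String category, String document) {
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String word : tokenize(document)) {
            if (word.isEmpty()) continue; // split() yields "" for leading separators
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the category with the highest log-posterior, or null if untrained. */
    String classify(String document) {
        String[] tokens = tokenize(document);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // log P(category), estimated from document frequencies
            double score = Math.log(docCounts.get(category).doubleValue() / totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String word : tokens) {
                // Words never seen in training are skipped (a common simplification).
                if (word.isEmpty() || !vocabulary.contains(word)) continue;
                // log P(word | category) with add-one smoothing over the vocabulary
                int c = counts.getOrDefault(word, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

Training calls train("Sports", articleText) once per article; classify(text) then returns the best-scoring category name.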

Usage:

  1. Download from https://github.com/savannaendicott/nlp-classification
  2. Import the project into your IDE and package it with SQLite.
  3. Run the program (the only main method is in naivebayes.java).
  4. Follow the menu options:
    • Select the classification method (Naïve Bayes or Bag of Words).
    • Select the test set for the classification: a file from the corpus, a set of words entered by the user on the fly, or a sample test document I’ve uploaded to the project.
    • Select files if appropriate (the menu will prompt you).
    • Classification results are printed to the screen as-is. *IMPORTANT: A LOG OF ALL STEPS OF THE PROCEDURE IS FORMATTED AND WRITTEN TO A LOG FILE FOR EACH CLASSIFICATION.*
    • The log file’s name is printed in the menu along with the result, in the format “/src/docs//<date/time>.txt”.
  5. At the end of the program, type anything and press Enter to restart.
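
Each classification writes its steps to a log file named after the date and time. A minimal sketch of that pattern follows; it is not the project's code. The helper name ClassificationLogSketch, the timestamp pattern, and passing the target directory as a parameter (the README's path is partly elided) are all assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

/** Hypothetical helper: writes one timestamped log file per classification run. */
class ClassificationLogSketch {
    // Assumed pattern; it avoids ':' so the file name is also valid on Windows,
    // and includes milliseconds so back-to-back runs rarely collide.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss-SSS");

    /**
     * Writes each recorded step on its own line to dir/<date-time>.txt and
     * returns the path so the menu can print it alongside the result.
     */
    static Path writeLog(Path dir, List<String> steps) throws IOException {
        Files.createDirectories(dir); // no-op if the directory already exists
        Path file = dir.resolve(LocalDateTime.now().format(STAMP) + ".txt");
        Files.write(file, steps, StandardCharsets.UTF_8);
        return file;
    }
}
```

Returning the Path keeps the file name available to the menu code that prints the result.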

Important Files:

  • CorpusDB.java
  • Corpus.java
  • naivebayes.java
  • bagOfWords.java
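
This README does not describe how bagOfWords.java classifies a document. One common bag-of-words approach, shown here purely as a hypothetical sketch: represent each text as a word-count vector (word order discarded), sum the vectors of each category's training articles, and assign a new document to the category with the highest cosine similarity. The class BagOfWordsSketch and its methods are illustrative names, not the project's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bag-of-words classifier using cosine similarity (not the project's bagOfWords.java). */
class BagOfWordsSketch {
    // category -> summed word-count vector of its training documents
    private final Map<String, Map<String, Integer>> categoryVectors = new HashMap<>();

    /** Word -> occurrence count; word order is discarded, hence a "bag" of words. */
    static Map<String, Integer> toBag(String text) {
        Map<String, Integer> bag = new HashMap<>();
        for (String word : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!word.isEmpty()) bag.merge(word, 1, Integer::sum);
        }
        return bag;
    }

    void train(String category, String document) {
        Map<String, Integer> vector = categoryVectors.computeIfAbsent(category, c -> new HashMap<>());
        toBag(document).forEach((word, count) -> vector.merge(word, count, Integer::sum));
    }

    /** Cosine of the angle between two sparse count vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue().doubleValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(v -> v.doubleValue() * v).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(v -> v.doubleValue() * v).sum());
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    /** Returns the most similar category, or null if nothing has been trained. */
    String classify(String document) {
        Map<String, Integer> query = toBag(document);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : categoryVectors.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Unlike Naive Bayes, this approach has no probabilistic model: it only measures how much vocabulary a document shares with each category, normalised for length.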

Note: The text files in this project are required: they are used to build the corpus. Do not remove them, or the corpus cannot be created.
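
To illustrate how plain-text articles can be turned into per-category word counts, here is a hypothetical in-memory sketch. The project appears to store its corpus in SQLite (via CorpusDB.java), which this sketch does not reproduce. The directory layout (one sub-directory per category, such as Sports/, holding .txt articles) and the name CorpusLoaderSketch are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical loader: category directory name -> (word -> count). */
class CorpusLoaderSketch {
    static Map<String, Map<String, Integer>> load(Path root) throws IOException {
        Map<String, Map<String, Integer>> corpus = new HashMap<>();
        List<Path> categories;
        // Assumed layout: each sub-directory of root is one category.
        try (Stream<Path> dirs = Files.list(root)) {
            categories = dirs.filter(Files::isDirectory).collect(Collectors.toList());
        }
        for (Path categoryDir : categories) {
            String category = categoryDir.getFileName().toString();
            Map<String, Integer> counts = corpus.computeIfAbsent(category, c -> new HashMap<>());
            List<Path> files;
            // Only .txt files are read; anything else in the directory is ignored.
            try (Stream<Path> entries = Files.list(categoryDir)) {
                files = entries.filter(p -> p.toString().endsWith(".txt")).collect(Collectors.toList());
            }
            for (Path file : files) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                // Same tokenisation rule: lower-case, split on non-letters.
                for (String word : text.toLowerCase().split("[^\\p{L}]+")) {
                    if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return corpus;
    }
}
```

Deleting a category's text files in a layout like this would leave that category with no counts, which is why the source texts must stay in place.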

References: