Discrete-ML-Methods

The aim of this project is to assess various machine learning classification techniques for discrete data. The models predict tweets' locations using lexical analysis alone.

Classification Using Various Machine Learning Methods to Predict Tweet Locations

Aim

The aim of this project is to assess various machine learning classification techniques for discrete data. The models predict the locations of tweets posted across Australia using lexical analysis.

Summary

Naïve Bayes, Decision Trees, Random Forests and Ensemble Learning Methods are analysed for their performance in predicting the locations of tweets in Australia. Data selection, feature selection and pre-processing are performed to increase the classifiers' accuracy. The academic literature supports the credibility of the machine learning algorithms implemented in this report, and indicates that additional context beyond the tweet text would be required to increase prediction accuracy further.
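
As an illustration of the kind of classifier described above, here is a minimal sketch of multinomial Naïve Bayes with Laplace smoothing over word counts, written in plain Python. It is not the code from this repository: the function names (`train_naive_bayes`, `predict`), the toy tweets and the two city labels are all hypothetical and chosen only to show how lexical features can drive a location prediction.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenised documents.

    docs   -- list of token lists (one per tweet)
    labels -- list of class labels, same length as docs
    alpha  -- Laplace smoothing constant (assumed value, not from the project)
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = len(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        # log P(class), estimated from class frequencies
        "log_prior": {c: math.log(n / total_docs) for c, n in class_counts.items()},
        # total number of word tokens seen per class
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
    }


def predict(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    alpha = model["alpha"]
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        denom = model["totals"][label] + alpha * vocab_size
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # skip words never seen during training
            count = model["word_counts"][label][tok]
            score += math.log((count + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only (not taken from the project data).
train_docs = [
    ["footy", "mcg", "tram"],
    ["harbour", "opera", "ferry"],
    ["tram", "laneway", "coffee"],
    ["ferry", "bondi", "beach"],
]
train_labels = ["Melbourne", "Sydney", "Melbourne", "Sydney"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, ["tram", "coffee"]))
print(predict(model, ["ferry", "beach"]))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together, and the smoothing term keeps a word that is unseen for one class from driving that class's probability to zero.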

Guide

  • 'Code' - Contains all the code needed to run this project.
  • 'Submissions' - Contains the results for every version of the data and feature selection performed.
  • '2019S1-proj2-data' - Contains the raw data and the top 10, 50 and 100 most-used words for training, development and testing.
  • 'preprocessed' - Contains preprocessed data after data and feature selection.

Built With

  • Python 3

Special Thanks

  • Dr. Jeremy Nicholson, Dr. Afshin Rahimi and Dr. Tim Baldwin
  • The University of Melbourne