Primary language: Jupyter Notebook. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Meme Classification Web Application Using Machine Learning:

This repository contains a web application and a small collection of machine learning classification algorithms, written in Python, that determine the sentiment behind internet memes. The models use image and text data extracted from 6,992 different internet memes. The project was built as the final project for the Introduction to Data Science (DS2001) course.

Dependencies:

Introduction:

Classification is the process of recognising, understanding, and grouping objects and ideas into preset categories (classes). Using pre-categorised training datasets, machine learning programs apply a wide range of classification algorithms to assign future data to the relevant classes. These algorithms use the training data to predict the likelihood that new data falls into each of the preset categories.
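As a minimal sketch of the "likelihood per class" idea, the snippet below fits a scikit-learn classifier on synthetic data and asks it for class probabilities. The data, the three classes, and the choice of LogisticRegression here are assumptions made for illustration only; they are not taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a pre-categorised training set (assumption):
# 150 samples, 4 numeric features, 3 preset classes labelled 0, 1, 2.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(150, 4))
y_train = rng.integers(0, 3, size=150)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# For new data, predict_proba returns one probability per class;
# each row sums to 1.
X_new = rng.normal(size=(2, 4))
proba = clf.predict_proba(X_new)

# The predicted class is the one with the highest probability.
predicted = clf.classes_[proba.argmax(axis=1)]
print(proba.round(3))
print(predicted)
```

Because the labels here are random, the probabilities carry no real meaning; the point is only the shape of the output, one row per sample and one column per class.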

One of the most common applications of classification algorithms is image and text classification, which determines the preset category that a given image or piece of text is most relevant to. While classification algorithms work for a variety of image and text data, I've trained image and text classification models specifically for internet memes, to determine which of five preset sentiments a meme conveys: neutral, positive, negative, very positive, or very negative. The training dataset consists of image data from 6,992 different internet memes, along with the sentiment assigned to each meme based on the text data extracted from it.
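The text side of this can be sketched with a small scikit-learn pipeline that maps meme captions to the five sentiment labels. The captions and their labels below are invented for illustration, and the CountVectorizer plus MultinomialNB combination is an assumption about a reasonable setup, not a description of the notebook's actual preprocessing.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

SENTIMENTS = ["very negative", "negative", "neutral", "positive", "very positive"]

# Hypothetical captions and labels (not from the real dataset).
texts = [
    "this is the worst day ever",
    "absolutely terrible and awful",
    "not a fan of mondays",
    "this is a bit annoying",
    "when you open the fridge",
    "me walking to class",
    "that was pretty nice",
    "good vibes today",
    "best day of my life",
    "absolutely amazing and wonderful",
]
labels = [
    "very negative", "very negative",
    "negative", "negative",
    "neutral", "neutral",
    "positive", "positive",
    "very positive", "very positive",
]

# Bag-of-words counts feed a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

prediction = model.predict(["what an amazing day"])[0]
print(prediction)
```

With only ten training captions the prediction is not meaningful; a real model would be trained on the labelled text from the full dataset.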

Classifiers Used (scikit-learn):

  • sklearn.ensemble.RandomForestClassifier
  • sklearn.neighbors.KNeighborsClassifier
  • sklearn.ensemble.ExtraTreesClassifier
  • sklearn.linear_model.SGDClassifier
  • sklearn.naive_bayes.MultinomialNB
  • sklearn.linear_model.LogisticRegression
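A rough sketch of how the six classifiers listed above could be trained and compared side by side is shown below. The feature matrix is synthetic (random 8x8 "images" flattened to 64 values and scaled to the 0 to 1 range) and every hyperparameter is an assumption; the notebook's actual features and settings may differ.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 200 flattened 8x8 images, 5 classes.
# Values stay non-negative because MultinomialNB requires that.
rng = np.random.default_rng(42)
X = rng.integers(0, 256, size=(200, 64)) / 255.0
y = rng.integers(0, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

classifiers = {
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "ExtraTreesClassifier": ExtraTreesClassifier(random_state=42),
    "SGDClassifier": SGDClassifier(random_state=42),
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

Since the labels are random, every score should hover around chance level; the loop only illustrates that all six estimators share the same fit and score interface.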

Usage:

  • Meme Classification.ipynb: Contains the scikit-learn implementations of all trained and tested image and text classification models.
  • app.py: Source code for the Flask web application that serves the trained classification models.
  • test_images: Contains the images used for testing the trained image and text classification models.
  • templates: Contains the source code for the web pages (home.html and predict.html) rendered by the Flask web application.
  • static\files: Directory in which the Flask web application stores uploaded images.
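To make the role of the upload directory concrete, here is a minimal sketch of a Flask route that saves an uploaded image and returns a sentiment. It is not the code in app.py: the /predict route, the "file" form field, the UPLOAD_DIR config key, the JSON response, and the predict_sentiment placeholder are all assumptions for illustration.

```python
from pathlib import Path

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
# Hypothetical config key; mirrors the static\files directory listed above.
app.config["UPLOAD_DIR"] = Path("static") / "files"


def predict_sentiment(image_path):
    """Hypothetical placeholder; a real app would call a trained model here."""
    return "neutral"


@app.route("/predict", methods=["POST"])
def predict():
    # "file" is an assumed form field name.
    uploaded = request.files.get("file")
    if uploaded is None or uploaded.filename == "":
        return jsonify(error="no file uploaded"), 400

    # Store the upload under the configured directory, creating it if needed.
    upload_dir = Path(app.config["UPLOAD_DIR"])
    upload_dir.mkdir(parents=True, exist_ok=True)
    target = upload_dir / secure_filename(uploaded.filename)
    uploaded.save(str(target))

    return jsonify(filename=target.name, sentiment=predict_sentiment(target))
```

The route can be exercised without starting a server through app.test_client(), which makes it easy to check the upload handling in isolation.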

Instructions (Execution):

First, download the training dataset of internet memes used to train the classification models and extract it into the same directory as the source code files. Next, run all the cells in Meme Classification.ipynb, which generates a pickle (.pkl) file for each trained image and text classification model. Finally, run app.py and open the address and port it is served on in a browser. Upload the internet meme to be tested (any valid image format) and its predicted sentiment will be displayed.
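The pickle step can be sketched as a simple save and load round trip. The model, the synthetic data, and the file name random_forest.pkl are assumptions for illustration; the notebook's actual file names are not stated here.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data (assumption): 60 samples, 8 features, 5 classes.
rng = np.random.default_rng(0)
X = rng.random((60, 8))
y = rng.integers(0, 5, size=60)

model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Save the trained model to a .pkl file (hypothetical file name).
model_path = Path(tempfile.mkdtemp()) / "random_forest.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Later, for example inside the web application, load it back and predict.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

same = np.array_equal(loaded.predict(X), model.predict(X))
print(same)
```

Pickle files should only be loaded from trusted sources, and a model is best loaded with the same scikit-learn version that saved it.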

Note:

All source code files were written for Microsoft Windows and may require changes to run correctly on other operating systems.
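One likely place where such changes are needed is file paths, since Windows uses backslashes as separators (as in static\files). This is an assumption about the kind of change required, not a confirmed list. A small sketch of converting a Windows-style relative path into one that works on the current operating system:

```python
from pathlib import Path, PureWindowsPath

# A Windows-style relative path, as it might appear in the source code.
windows_style = r"static\files"

# PureWindowsPath parses backslash separators on any operating system.
parts = PureWindowsPath(windows_style).parts

# Path rebuilds the path with the current platform's separator.
portable = Path(*parts)
print(portable)
```

Building paths with pathlib from the start (for example, Path("static") / "files") avoids hard-coding either separator.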

