A PyTorch CNN for classifying the sentiment of movie reviews, based on the paper Convolutional Neural Networks for Sentence Classification by Yoon Kim (2014).
The task of text classification has typically been done with an RNN, which accepts a sequence of words as input and has a hidden state that is dependent on that sequence and acts as a kind of memory. This example shows how you can utilize convolutional layers to find patterns in sequences of word embeddings and create an effective text classifier using a CNN-based approach!
Image from the original paper, Convolutional Neural Networks for Sentence Classification.
If you'd like to work with this code locally, you may follow the instructions (as needed) below! These installation instructions assume you have installed miniconda, but if you have not, you can download the latest version here.
For Windows users, these following commands need to be executed from the Anaconda prompt as opposed to a Windows terminal window. For Mac, a normal terminal window will work.
These instructions also assume you have git
installed for working with Github from a terminal window, but if you do not, you can download that first with the command:
conda install git
Now, we're ready to create a local environment!
- Clone the repository, and navigate to the downloaded folder. This may take a minute or two to clone due to the included image data.
git clone https://github.com/cezannec/CNN_Text_Classification.git
cd CNN_Text_Classification
-
Create (and activate) a new environment, named
classification-env
with Python 3. If prompted to proceed with the install(Proceed [y]/n)
type y.- Linux or Mac:
conda create -n classification-env python=3 source activate classification-env
- Windows:
conda create --name classification-env python=3 activate classification-env
At this point your command line should look something like:
(classification-env) <User>:CNN_Text_Classification <user>$
. The(classification-env)
indicates that your environment has been activated, and you can proceed with further package installations. -
Install PyTorch and torchvision; this should install the latest version of PyTorch.
- Linux or Mac:
conda install pytorch torchvision -c pytorch
- Windows:
conda install pytorch -c pytorch pip install torchvision
-
Install a few required pip packages, which are specified in the requirements text file (including gensim).
pip install -r requirements.txt
- That's it!
Now all of the classification-env
libraries are available to you. Assuming your classification-env
environment is still activated, you can navigate to the main repo and start looking at the notebooks:
cd
cd CNN_Text_Classification
jupyter notebook
To exit the environment when you have completed your work session, simply close the terminal window.