TomatoEngine

An application that crawls a text corpus of Rotten Tomatoes movie reviews, serves as a search engine for querying that corpus, and performs text classification and clustering.

This repo is structured into four main folders:

TomatoCrawler
TomatoClassifier
TomatoSearch
OkTomato

TomatoCrawler

A crawling module implemented in Node.js.

To install the dependencies,

$ npm install

To run the crawler,

$ node TomatoCrawler/main.js

TomatoClassifier

First, install the following dependencies manually, because the installation process is not consistent across platforms:

Install Matplotlib
Install SciPy
Install NumPy
Install scikit-learn
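
Because these packages are installed by hand, it can help to confirm that all four are importable before running the classifier. The following is a minimal sketch and not part of the repo: the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names, and the only assumption is the standard import name of each package (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the four packages listed above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = {
    "Matplotlib": "matplotlib",
    "SciPy": "scipy",
    "NumPy": "numpy",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the display names of packages that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All classifier dependencies are installed.")
```

Saved under any file name and run with python3, the script lists whichever packages still need to be installed.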

To run the classifier,

$ python3 main.py

It tries several classifiers and shows the precision of each. The parameters for each classifier are tweaked in main.py.
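
The idea of trying several classifiers on the same data and comparing their precision can be sketched as follows. This is an illustration only, not the code in main.py: it assumes binary labels (1 for a positive review, 0 for a negative one), the two classifiers are hypothetical dependency-free stand-ins for the scikit-learn models, and the review data is made up.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = true positives / all predicted positives."""
    # Keep the true labels of every sample that was predicted positive.
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0  # no positive predictions, so report 0 by convention
    true_pos = sum(1 for t in predicted_pos if t == positive)
    return true_pos / len(predicted_pos)


# Hypothetical stand-in classifiers: each maps a review to 1 or 0.
def keyword_classifier(review):
    return 1 if "great" in review.lower() else 0


def always_positive(review):
    return 1


def compare(classifiers, reviews, labels):
    """Return {name: precision} for every classifier on the same test set."""
    return {
        name: precision(labels, [clf(r) for r in reviews])
        for name, clf in classifiers.items()
    }


# Made-up toy data, for illustration only.
REVIEWS = ["A great film", "Dull and slow", "Great cast, great script", "Not worth it"]
LABELS = [1, 0, 1, 0]

if __name__ == "__main__":
    candidates = {"keyword": keyword_classifier, "always_positive": always_positive}
    for name, score in compare(candidates, REVIEWS, LABELS).items():
        print(f"{name}: precision = {score:.2f}")
```

Scoring every candidate on the same held-out reviews keeps the precision figures directly comparable when parameters are changed.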

To label all the data using the classifier,

$ python3 label_data.py

TomatoSearch

There are two folders, config and website, which contain the code for indexing and for the website respectively. The instructions can be found as follows:

OkTomato

This folder is mainly used to download the entities from Elasticsearch and upload them to Wit.ai.

In the OkTomato directory:

To download the entities, run

$ python data/populate_data.py

To upload to Wit.ai, run

$ python upload_entities.py
