%{
Research Paper Classification using Abstract Text
Author: Vaibhav Malpani
%}
To get started, please read the ``Quick Start'' section first. To run the code samples, see the
``Usage Examples'' section.
- Quick Start
- Installation
- Usage Examples
- Additional Information
Quick Start
See the ``Installation'' section to set up the required third-party libraries.
The dataset is supplied already pickled.
The code is available on CourseWorks.
Once you have both the code and the data, create a new folder (e.g. 'codata') and place all the code and data files together in it.
The results in the report were generated with the following configuration (Mac OS X 10.9):
- MATLAB_R2013a
- Python 2.7.6
- liblinear-1.94
- libsvm-3.17
- nltk-2.0.4
- sklearn-0.15
Installation
Install the following packages before running any script in ``Usage Examples''. After each step, verify that the installation succeeded using the instructions in the respective readMe.txt.
- LibSVM (MATLAB and Python Interface)
Installation Link: http://www.csie.ntu.edu.tw/~cjlin/libsvm/#download
- LibLinear (MATLAB and Python Interface)
Installation Link: http://www.csie.ntu.edu.tw/~cjlin/liblinear/#download
- NLTK (Python)
Installation Link: http://nltk.org/install.html
- Scikit-Learn (Python), together with numpy, scipy and matplotlib (usually installed alongside scikit-learn)
Installation Link: http://scikit-learn.org/stable/install.html
When prompted, enter the path to the Python/MATLAB interface of liblinear; otherwise the scripts fall back to the path on our system and fail with errors.
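The scripts ask for this path interactively. If you would rather set it up once in your own Python session, a minimal sketch is to put liblinear's `python` directory at the front of `sys.path` before importing `liblinearutil`, the module shipped with liblinear's Python interface. The helper name `add_liblinear_path` is hypothetical and is not part of this package; it only checks that `liblinearutil.py` is present and assumes liblinear's shared library has already been built as described in liblinear's own python README.

```python
import os
import sys


def add_liblinear_path(python_dir):
    """Hypothetical helper: make liblinear's Python interface importable.

    Assumes `python_dir` is the 'python' folder of a liblinear source tree
    (it should contain liblinearutil.py) and that the shared library has
    already been compiled.
    """
    python_dir = os.path.abspath(os.path.expanduser(python_dir))
    if not os.path.isfile(os.path.join(python_dir, "liblinearutil.py")):
        raise ValueError("liblinearutil.py not found in %s" % python_dir)
    # Insert at the front so this copy is found before any other
    # liblinear installation already on the system path.
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return python_dir
```

For example, call `add_liblinear_path("~/liblinear-1.94/python")` (adjusted to your install location) and then `from liblinearutil import train, predict`.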
Usage Examples
(i) PYTHON
python classifyCora.py
Generates a classification report for the following algorithms:
- Rocchio
- kNN
- Stochastic Gradient Descent
- Linear SVC
- LibLinear SVM
- Perceptron
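The exact layout of the report printed by classifyCora.py is not documented here. As a rough illustration of what a per-class classification report contains (precision, recall, F1 and support, the same quantities scikit-learn's `classification_report` prints), the standalone sketch below computes them from lists of true and predicted labels. The function name `per_class_report` is hypothetical and the sketch does not use scikit-learn.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label
    that occurs in either the true or the predicted labels."""
    true_count = Counter(y_true)   # support: how often each label is correct
    pred_count = Counter(y_pred)   # how often each label was predicted
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # float() keeps the division correct under Python 2 as well.
        precision = (tp[label] / float(pred_count[label])
                     if pred_count[label] else 0.0)
        recall = (tp[label] / float(true_count[label])
                  if true_count[label] else 0.0)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1, true_count[label])
    return report
```

For example, `per_class_report(["a", "b"], ["a", "a"])` gives label "a" a precision of 0.5 and a recall of 1.0, and label "b" zeros throughout because it is never predicted.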
python parseData.py
Generates 'svmForm.txt' from cora.vectors. 'svmForm.txt' is in libsvm format and is used for all of our analysis. It is supplied as part of data.tar.gz; if you have any problem with it, regenerate it by running the above script.
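For reference, each line of a libsvm-format file such as 'svmForm.txt' has the form `<label> <index>:<value> <index>:<value> ...`, with 1-based feature indices in ascending order and zero-valued features left out. The layout of cora.vectors is not described here, so the sketch below is not the code in parseData.py; it only shows how a dense feature vector maps to one libsvm line and back. Both function names are hypothetical. libsvm's Python interface already provides a reader for whole files (`svm_read_problem` in `svmutil.py`).

```python
def to_libsvm_line(label, features):
    """Format one instance in libsvm's sparse text format.

    `features` is a dense list of numbers; indices are written 1-based
    and zero-valued features are omitted.
    """
    parts = [str(label)]
    for i, value in enumerate(features, start=1):
        if value != 0:
            parts.append("%d:%g" % (i, value))
    return " ".join(parts)


def parse_libsvm_line(line):
    """Inverse of to_libsvm_line: return (label, {index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features
```

For example, `to_libsvm_line(3, [0, 1, 0, 0.5])` produces the line `3 2:1 4:0.5`.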
python useLinks.py
Generates the graph (link) structure of the Cora dataset. It is intended for future work and is not required for the current results.
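The input and output formats of useLinks.py are not described here. Assuming the links are available as (source, target) paper-ID pairs, one minimal way to hold such a graph structure is a pair of adjacency lists, one for outgoing and one for incoming links. The function name `build_link_graph` and the pair format are assumptions for illustration only.

```python
from collections import defaultdict


def build_link_graph(links):
    """Build directed adjacency lists from (source, target) ID pairs.

    Returns (out_links, in_links): out_links[p] is the set of papers that
    p links to, and in_links[p] is the set of papers that link to p.
    Duplicate pairs are stored once because the neighbours are sets.
    """
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for source, target in links:
        out_links[source].add(target)
        in_links[target].add(source)
    return out_links, in_links
```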
(ii) MATLAB
classifyRocchio
This script implements a Rocchio text classifier using cosine similarity.
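The MATLAB code itself is not reproduced in this README. A Rocchio classifier with cosine similarity typically builds one centroid per class and assigns a document to the class whose centroid is most similar to it; the Python sketch below illustrates that idea over sparse vectors stored as `{feature_index: weight}` dicts. Averaging length-normalized training vectors is one common choice and an assumption here: classifyRocchio may differ, for example in its weighting or normalization. All function names are hypothetical.

```python
import math
from collections import defaultdict


def _normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {index: value}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_rocchio(vectors, labels):
    """Return {label: centroid}, the mean of each class's normalized vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vec, label in zip(vectors, labels):
        for k, v in _normalize(vec).items():
            sums[label][k] += v
        counts[label] += 1
    return {label: {k: v / counts[label] for k, v in s.items()}
            for label, s in sums.items()}


def predict_rocchio(centroids, vec):
    """Assign the class whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))
```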
classifySVM
This script uses libsvm to generate a classification report for the following kernels:
- RBF kernel (one-vs-one)
- RBF kernel (one-vs-rest)
- Polynomial kernel (one-vs-one)
- Sigmoid kernel (one-vs-one)
- Linear SVM (requires LibLinear)
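classifySVM relies on libsvm's own kernel implementations. For reference, the sketch below writes out the kernel functions as defined in the libsvm README, where `gamma`, `coef0` and `degree` correspond to libsvm's `-g`, `-r` and `-d` options (libsvm defaults: gamma = 1/number of features, coef0 = 0, degree = 3), together with the binary subproblems each multi-class scheme trains. libsvm handles multi-class problems with one-vs-one internally, so a one-vs-rest variant has to be assembled from k separate binary models; how classifySVM does this and which parameter values it uses are not stated here. Function names are illustrative only.

```python
import itertools
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Kernel definitions following the libsvm README.
def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma):
    # exp(-gamma * |u - v|^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


def binary_subproblems(classes, scheme):
    """List the class splits a multi-class scheme trains.

    one-vs-one: one classifier per unordered pair of classes, k*(k-1)/2;
    one-vs-rest: one classifier per class against all others, k in total.
    """
    classes = list(classes)
    if scheme == "one-vs-one":
        return list(itertools.combinations(classes, 2))
    if scheme == "one-vs-rest":
        return [(c, [o for o in classes if o != c]) for c in classes]
    raise ValueError("unknown scheme: %r" % scheme)
```

The trade-off is in the number of models: one-vs-one trains more classifiers, but each sees only the data of two classes, while one-vs-rest trains fewer classifiers on the full training set.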
Additional Information
If you run into any problems, please email vom2102@columbia.edu.