Parallel Support Vector Regression

Final-year dissertation by Akshay Viswanathan at the National University of Singapore.

Support Vector Regression (SVR) is a variant of Support Vector Machines (SVMs) that produces a regression model depending on only a subset of the training data (the support vectors). Unfortunately, the use of SVR on large datasets is limited by the cost of training, which is cubic in time and quadratic in memory with respect to the number of training instances.
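To put the quadratic memory cost in perspective, the sketch below estimates the storage needed for a dense n x n kernel matrix, assuming 8-byte double-precision entries. It is back-of-the-envelope arithmetic, not a measurement of this project's code: at n = 100,000 the matrix alone needs about 75 GiB.

```python
# Back-of-the-envelope storage for a dense n x n kernel matrix.
# Assumption: each entry is an 8-byte double-precision float.
BYTES_PER_ENTRY = 8


def dense_kernel_bytes(n: int) -> int:
    """Bytes needed to hold every entry of an n x n kernel matrix."""
    return n * n * BYTES_PER_ENTRY


for n in (1_000, 10_000, 100_000):
    gib = dense_kernel_bytes(n) / 2**30
    print(f"n = {n:>7,}: {gib:8.2f} GiB")
```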

To improve scalability and make SVR practical for real-world data, this project proposes a parallel SVR algorithm that scales with the number of processor cores and uses low-rank matrix approximations.
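This README does not say which low-rank approximation the algorithm uses. As an illustration only, the sketch below implements pivoted incomplete Cholesky factorization, one common choice in parallel kernel methods: it replaces the n x n kernel matrix K with an n x r factor G such that K is approximately G G^T, so storage drops from O(n^2) to O(n r). The function names `rbf` and `incomplete_cholesky`, the `gamma` value, and the tolerance are hypothetical, and the code is sequential, with no MPI.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def rbf(x: Point, z: Point, gamma: float = 0.5) -> float:
    """Gaussian kernel exp(-gamma * ||x - z||^2) on dense vectors (illustrative)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def incomplete_cholesky(X: List[Point], kernel: Kernel, rank: int,
                        tol: float = 1e-10) -> List[List[float]]:
    """Pivoted incomplete Cholesky factorization (hypothetical helper).

    Returns G as n rows with at most `rank` columns, such that
    K[i][j] is approximately sum_k G[i][k] * G[j][k]. Only the n x rank
    factor is stored; the full kernel matrix is never formed.
    """
    n = len(X)
    residual = [kernel(x, x) for x in X]  # diagonal of K - G G^T
    G = [[0.0] * rank for _ in range(n)]
    cols = 0
    for k in range(rank):
        # Pivot on the point whose diagonal is worst approximated so far.
        p = max(range(n), key=residual.__getitem__)
        if residual[p] <= tol:
            break  # remaining approximation error is negligible
        G[p][k] = math.sqrt(residual[p])
        for i in range(n):
            if i != p:
                # Kernel column p minus what earlier columns already explain.
                s = kernel(X[i], X[p]) - sum(G[i][j] * G[p][j] for j in range(k))
                G[i][k] = s / G[p][k]
        for i in range(n):
            residual[i] -= G[i][k] ** 2
        cols += 1
    return [row[:cols] for row in G]
```

With `rank` equal to n the factorization reproduces K up to rounding; choosing `rank` much smaller than n trades accuracy for memory.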

The project is designed to run on Linux. Before proceeding, ensure that an MPI-2 or MPI-3 implementation is installed.

Running the project

Preprocessing:

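Ensure that both the training and testing files use the following format, one instance per line:
y_i 1: x_i,1 2: x_i,2 ... n: x_i,n
where y_i is the target value of the i-th instance and x_i,j is the value of its j-th feature.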
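To check or produce files in this format before training, a small reader and writer can help. The sketch below is an assumption-laden illustration, not part of this project: the names `parse_line` and `format_line` are hypothetical, the parser tolerates optional whitespace after each colon, and the writer emits `index:value` with no space after the colon, as in common LIBSVM-style files; adjust it if the tools expect otherwise.

```python
import re
from typing import Dict, List, Tuple

# One "index:value" feature pair; optional whitespace after the colon is
# tolerated because the format above is written as "1: x_i,1".
_PAIR = re.compile(r"(\d+):\s*(\S+)")


def parse_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse 'y_i 1:x_i,1 ... n:x_i,n' into (target, {index: value})."""
    parts = line.split(None, 1)
    if not parts:
        raise ValueError("empty line")
    rest = parts[1] if len(parts) > 1 else ""
    # Anything left after removing well-formed pairs means a malformed line.
    if _PAIR.sub("", rest).strip():
        raise ValueError(f"malformed feature list: {rest!r}")
    features = {int(i): float(v) for i, v in _PAIR.findall(rest)}
    return float(parts[0]), features


def format_line(y: float, x: List[float]) -> str:
    """Write a dense feature vector with 1-based feature indices."""
    pairs = " ".join(f"{j}:{v!r}" for j, v in enumerate(x, start=1))
    return f"{y!r} {pairs}"
```

For instance, `format_line(3.0, [0.5, 1.5])` yields `3.0 1:0.5 2:1.5`, and `parse_line` reads that line back unchanged.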

Move to the /trunk directory.
Compile the project by running: make
Once the build completes, run the training phase with: mpirun -n {number of cores} ./svr_train [options] {training_file}

Training uses the Gaussian kernel by default; to use a different kernel, refer to the command-line options.
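For reference, the Gaussian kernel is commonly written as K(x, z) = exp(-gamma * ||x - z||^2), where gamma = 1 / (2 sigma^2). This README does not state which parameterization or default width the tools use, so the sketch below is an assumption. It evaluates the kernel on sparse vectors stored as `{feature_index: value}` dictionaries, matching the indexed input format, with missing indices treated as zero; the names are hypothetical.

```python
import math
from typing import Dict

# Sparse feature vector: feature index -> value; absent indices count as 0.
SparseVec = Dict[int, float]


def squared_distance(x: SparseVec, z: SparseVec) -> float:
    """||x - z||^2 over the union of the two vectors' indices."""
    keys = x.keys() | z.keys()
    return sum((x.get(k, 0.0) - z.get(k, 0.0)) ** 2 for k in keys)


def gaussian_kernel(x: SparseVec, z: SparseVec, gamma: float) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2), with gamma = 1 / (2 * sigma^2)."""
    return math.exp(-gamma * squared_distance(x, z))
```

Identical vectors always give K = 1, and the value decays towards 0 as the points move apart; larger gamma makes that decay faster.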

Run prediction with: mpirun -n {number of cores} ./svr_predict [options] {prediction_file}

Note that prediction must use the same number of cores as training.
By default, the prediction results are stored in a file named 'PredictResult'.
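This README does not document the layout of 'PredictResult'. Assuming it holds one predicted value per line, in the same order as the instances in the prediction file, a sketch like the following could score the predictions by mean squared error against the targets y_i (the first field of each line in the prediction file). All function names here are hypothetical.

```python
from typing import List, Sequence


def read_predictions(path: str = "PredictResult") -> List[float]:
    """Assumption: one predicted value per non-empty line."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def read_targets(path: str) -> List[float]:
    """Take y_i, the first field, from each non-empty line of a data file."""
    with open(path) as f:
        return [float(line.split()[0]) for line in f if line.strip()]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, `mean_squared_error(read_targets("datasets/pyrim"), read_predictions())` would score a run on the pyrim dataset, provided the assumption about the file layout holds.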

For example:

mpirun -n 2 ./svr_train datasets/pyrim
mpirun -n 2 ./svr_predict datasets/pyrim
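Because training and prediction must use the same number of cores, it can help to drive both steps from a single setting. The Python sketch below (helper names are hypothetical) builds the two commands shown above from one core count and runs them in order. It assumes it is launched from the /trunk directory after make, and it passes no [options].

```python
import subprocess
from typing import List, Tuple


def build_commands(n_cores: int, train_file: str,
                   predict_file: str) -> Tuple[List[str], List[str]]:
    """Return the training and prediction commands sharing one core count."""
    if n_cores < 1:
        raise ValueError("n_cores must be a positive integer")
    mpirun = ["mpirun", "-n", str(n_cores)]
    return (mpirun + ["./svr_train", train_file],
            mpirun + ["./svr_predict", predict_file])


def run_pipeline(n_cores: int, train_file: str, predict_file: str) -> None:
    """Run training, then prediction; stop if either command fails."""
    for cmd in build_commands(n_cores, train_file, predict_file):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(2, "datasets/pyrim", "datasets/pyrim")` reproduces the two example commands above.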
