Cause Effect Pairs Challenge

This repo started as a copy of the Python benchmark and sample code for the Cause Effect Pairs Challenge, a machine learning challenge hosted by Kaggle and organized by ChaLearn.

Executing this requires Python 2.7 along with the following packages:

pandas (tested with version 10.1)
sklearn (tested with version 0.13)
numpy (tested with version 1.6.2)
scipy (tested with version 0.10.)
ml_metrics
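
Before running anything, it can help to confirm which of these packages are importable and at what version. The snippet below is a minimal sketch, not part of the repo: the helper name installed_versions is hypothetical, and it only reports what is installed rather than enforcing the tested versions listed above.

```python
from __future__ import print_function
import importlib

# Import names as listed in this README ("sklearn" is scikit-learn's import name).
REQUIRED = ["pandas", "sklearn", "numpy", "scipy", "ml_metrics"]


def installed_versions(names):
    """Return {name: version string, or None if the package cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every package exposes __version__, so fall back to "unknown".
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions(REQUIRED).items()):
        print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Any package reported as NOT INSTALLED needs to be installed before train.py or predict.py will run.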

To run:

Download the data
Create three directories inside the repo directory: data, models, submissions
Extract the Kaggle data inside the "data" directory so that this is a valid path: data/CEfinal_train_text/CEfinal_train_pairs.csv
Modify SETTINGS.json to point to the training and validation data on your system, as well as to the locations where the trained model and the submission should be saved
To train the classifier, run "python train.py"; it will save the model in the models directory
Alternatively, to cross-validate, run "python train.py -c 10" [10-fold CV]
To try a small subset of the data, run "python train.py -n 100" [first 100 rows]
The two options can be combined: "python train.py -n 100 -c 3" takes the first 100 rows and runs a 3-fold cross-validation
Experiment with different classifiers in the get_pipeline() function in train.py
Make predictions on the validation set by running "python predict.py" [check the path]
Make a submission with the output file in the submissions directory
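
The directory setup in the first steps can also be scripted. The following is a minimal sketch under stated assumptions: the function name prepare_layout is hypothetical and not part of the repo, and it checks only the one training path named above. It does not download or extract the data.

```python
from __future__ import print_function
import os

# Directory names and the expected path are taken from the steps above.
REQUIRED_DIRS = ["data", "models", "submissions"]
TRAIN_PAIRS = os.path.join("data", "CEfinal_train_text", "CEfinal_train_pairs.csv")


def prepare_layout(repo_dir):
    """Create the three directories; return True if the training pairs file exists."""
    for name in REQUIRED_DIRS:
        path = os.path.join(repo_dir, name)
        if not os.path.isdir(path):
            os.makedirs(path)
    return os.path.isfile(os.path.join(repo_dir, TRAIN_PAIRS))


if __name__ == "__main__":
    # Run from the repo directory.
    if prepare_layout("."):
        print("Found", TRAIN_PAIRS)
    else:
        print("Missing", TRAIN_PAIRS, "- extract the Kaggle data into data/")
```

Running it from the repo directory is safe to repeat: existing directories are left untouched, and the return value only reports whether the training pairs file is where the later steps expect it.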
