Cause Effect Pairs Challenge FirfiD Submission
Pre-requisites: You need the following installed:
- python 2.7.1
- python sklearn version 0.13.1
- python numpy version 1.7.1
- python joblib
- python pandas version >=0.11
- Matlab
- Preferably a Debian-based Linux installation
Kaggle Causality Challenge framework. Mostly based on Kaggle's Python code for the challenge.
To train:
A. Configure
- Put your training data in the following files (or modify file names accordingly):
  "train_pairs_path": "./Competition/CEdata_final_train_pairs.csv"
  "train_info_path": "./Competition/CEdata_final_train_publicinfo.csv"
  "train_target_path": "./Competition/CEdata_final_train_target.csv"
B. Extracting Features
- Modify SETTINGS.json "feature_extraction_threads" to the number of threads your machine can handle.
- Run "python fe.py"
- Add Matlab features by running "./extract_matlab_valid.sh"
- Merge the features by running "python process_matlab.py -t valid"
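Before running the steps in A and B, the configuration can be checked and adjusted from a small script. The sketch below is only an illustration: it assumes SETTINGS.json is a flat JSON object that holds the three training path keys alongside "feature_extraction_threads" (this README does not show the file's full layout), and load_settings, missing_training_files and set_feature_threads are hypothetical helper names that are not part of the submission's code. It is written to run on Python 2.7 as well as Python 3.

```python
import json
import multiprocessing
import os
from collections import OrderedDict

# Keys named in step A; assumed (not confirmed) to live in SETTINGS.json.
TRAIN_KEYS = ("train_pairs_path", "train_info_path", "train_target_path")


def load_settings(path="SETTINGS.json"):
    """Load the settings file, keeping the key order of the original."""
    with open(path) as f:
        return json.load(f, object_pairs_hook=OrderedDict)


def missing_training_files(settings):
    """Return the training keys that are absent or point to a missing file."""
    missing = []
    for key in TRAIN_KEYS:
        path = settings.get(key)
        if path is None or not os.path.isfile(path):
            missing.append(key)
    return missing


def set_feature_threads(settings_path="SETTINGS.json", threads=None):
    """Write "feature_extraction_threads" back to the settings file.

    When threads is None, the number of CPUs reported by the OS is used.
    """
    if threads is None:
        threads = multiprocessing.cpu_count()
    settings = load_settings(settings_path)
    settings["feature_extraction_threads"] = int(threads)
    with open(settings_path, "w") as f:
        json.dump(settings, f, indent=4)
    return settings["feature_extraction_threads"]
```

For example, missing_training_files(load_settings()) returns an empty list when all three training files are in place. Note that set_feature_threads rewrites the whole file with 4-space indentation, so any hand formatting in SETTINGS.json is lost; editing the value by hand works just as well.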
C. Train:
- Run "python train.py"
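The training steps B and C can also be chained so that a failure in one step stops the rest. The following is a minimal sketch, assuming the commands are run from the directory that contains fe.py and the shell scripts; TRAIN_STEPS and run_steps are hypothetical names introduced here, and the command list simply repeats the commands given above, in the same order.

```python
import subprocess

# Commands copied from steps B and C above, in the order they are listed.
TRAIN_STEPS = [
    ["python", "fe.py"],
    ["./extract_matlab_valid.sh"],
    ["python", "process_matlab.py", "-t", "valid"],
    ["python", "train.py"],
]


def run_steps(steps, runner=subprocess.check_call):
    """Run each command in order.

    subprocess.check_call raises CalledProcessError on a non-zero exit
    status, so later steps are skipped when an earlier one fails.
    """
    for cmd in steps:
        print("Running: " + " ".join(cmd))
        runner(cmd)
```

Calling run_steps(TRAIN_STEPS) runs the whole sequence. The runner argument exists only so the function can be exercised without launching the real scripts.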
To predict:
A. Clean-up
- Replace ./Competition/CEfinal_valid*.csv with the files you want to extract features from. By default these files hold a minimal subset of the validation data.
- Run "./clean.sh"
B. Extracting Features
- Modify SETTINGS.json "feature_extraction_threads" to the number of threads your machine can handle.
- Run "python fe.py"
- Add Matlab features by running "./extract_matlab_valid.sh"
- Merge the features by running "python process_matlab.py -t valid"
C. Generating results
- Run "python predict.py". The results file should be "./Submisions/firfi-tree-trees.csv".
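Once predict.py has finished, a quick look at the results file can confirm that it was written and is not empty. The sketch below only reads the header and counts the data rows; it makes no assumption about which columns the file contains, since this README does not describe them. summarize_csv is a hypothetical helper name.

```python
import csv


def summarize_csv(path):
    """Return (header, number_of_non_empty_data_rows) for a CSV file."""
    with open(path) as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no lines
        n_rows = sum(1 for row in reader if row)
    return header, n_rows
```

For example, summarize_csv("./Submisions/firfi-tree-trees.csv") returns the column names and the number of predictions written.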