To reproduce results:
Prerequisites:
- python 2.7+,
- pypy 2.2.1,
- pandas 0.14.1
Scripts: MakeDatasets.py
, Online.py
- Download the 3 original datafiles (
TrainingData.csv
,TestData.csv
,andSubmissionFormat.csv)
to the folderorigdata/
. - Run
python MakeDatasets.py
- this command produces the training and test filestrainPredictors.csv
,trainLabels.csv
, andTestData2.csv
. - Run
pypy Online.py 4 0.5
- note that PyPy is required. This command fits an online logistic regression model, taking 4 passes/epochs over the training data with a 50% chance of using an encountered example in each epoch. (The effect of playing with epochs anduse_example_probability
is small for reasonable values - this is just an example configuration but the one used for the winning submission). The submission will then be placed in the filesubmission1234.csv
. Score should be somewhere around 0.3665 (public leaderboard).
For more details see: BoxPlots_First_Place_Model.ipynb