Final Project ReadMe

Running Part 1

To run the classification algorithms, first fill in the missing values. Use the Part 2 code (missing.py) on the Part 1 datasets to do this: in missing.py, change the file path in the CSV-reading code to the path of the Part 1 datasets and set the appropriate delimiter. Also change the output file name to whatever you want it to be.
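If you are unsure which delimiter a dataset uses, a quick check like the one below can help. This is only a minimal sketch and is not part of missing.py; the function name `guess_delimiter` and the list of candidate delimiters are assumptions made for illustration.

```python
def guess_delimiter(path, candidates=(",", "\t", ";", " ")):
    """Return the candidate that occurs most often on the first non-empty line.

    Hypothetical helper: if no candidate appears at all (for example a
    single-column file), the first candidate is returned.
    """
    with open(path, newline="") as f:
        for line in f:
            if line.strip():
                # Count each candidate on this line and keep the most frequent one
                return max(candidates, key=line.count)
    raise ValueError("file is empty")
```

Calling `guess_delimiter` on a dataset file returns the character to use as the delimiter in missing.py.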

Once the missing values have been filled in, run knn.py, which is the classification algorithm. Make sure to use the appropriate file paths.
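For readers unfamiliar with k-nearest-neighbours classification, the idea can be sketched as follows. This is a generic illustration and not the code in knn.py; the function name `knn_predict`, the Euclidean distance, and the default `k=3` are all assumptions.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row, paired with that row's label
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    ]
    # Closest rows first
    distances.sort(key=lambda pair: pair[0])
    # Majority vote over the labels of the k closest rows
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Each training row is a list of numeric feature values with no missing entries, which is why the missing values need to be filled in first.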

Running Part 2

To run the missing value estimation algorithm, run missing.py. Make sure the file path on line 4 points to where the dataset file you want to use is on your system. If need be, change the delimiter on line 5 to match the delimiter in that dataset file. Finally, on line 129, change the file name of the output dataset (the one with the estimated values filled in) to whatever you like.
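The three settings above (input path, delimiter, output name) correspond to a read, estimate, write flow. The sketch below illustrates that flow only; it is not the algorithm in missing.py. It uses a simple column-mean fill as a stand-in estimator, and the function name `fill_missing` and the `"NA"` missing-value marker are assumptions.

```python
import csv

def fill_missing(in_path, out_path, delimiter=",", missing_marker="NA"):
    """Replace missing entries in each column with that column's mean.

    Assumes every row has the same number of columns and that all
    non-missing entries are numeric. `missing_marker` is hypothetical.
    """
    # Read the dataset, skipping blank lines
    with open(in_path, newline="") as f:
        rows = [row for row in csv.reader(f, delimiter=delimiter) if row]

    # Mean of the known values in each column (0.0 if a column is entirely missing)
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        known = [float(r[j]) for r in rows if r[j] != missing_marker]
        means.append(sum(known) / len(known) if known else 0.0)

    # Write the filled-in dataset under the new name, using the same delimiter
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter)
        for r in rows:
            writer.writerow(
                [means[j] if r[j] == missing_marker else float(r[j]) for j in range(n_cols)]
            )
```

Here `in_path`, `delimiter`, and `out_path` play the roles of the values edited on lines 4, 5, and 129 of missing.py.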

Note: TrainData files with the updated extension have had their missing values filled in by missing.py.

