
Detecting Python Syntax Errors with Machine Learning Research Project


Python-Mistakes INSTRUCTIONS

TO COLLECT NEW ENTRIES:

1. Make sure you have two JSON files: one to store 'yes' entries and one to store 'no' entries. You can append to an existing file of entries or create a new empty file. If you plan to add the data from these files directly to the ARFF file, make sure the yes file has 'yes' in its filename and the no file has 'no' in its filename.


2. Run "python DATACOLLECT.py" to collect data for yes and no entries. Make sure pymongo is installed and that you have downloaded a MongoDB database of GitHub commits.


3. The program keeps track of a save location number so you can resume collecting data from where you left off. The number is displayed each time the program finds a commit, and the program prompts you for it when it starts. Write the save location number down somewhere so you can enter it the next time you run the program.
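If you would rather not write the save location number down by hand, a small helper like the one below can store it in a text file between sessions. This is only a sketch: the file name save_location.txt and both function names are made up for illustration, and DATACOLLECT.py does not read this file itself, so you still type the number in when the program prompts you.

```python
from pathlib import Path

# Hypothetical file name; DATACOLLECT.py does not read or write this file.
SAVE_FILE = Path("save_location.txt")


def write_save_location(number, path=SAVE_FILE):
    """Store the save location number displayed by the collector."""
    Path(path).write_text(f"{int(number)}\n")


def read_save_location(path=SAVE_FILE):
    """Return the stored number, or None if nothing usable is stored."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
```

Call write_save_location(...) with the last number the collector displayed, and read_save_location() before the next session to see what to enter at the prompt.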


TO RECORD ENTRIES IN THE ARFF FILE:

1. Run "python WRITER6.py" to write the data into the ARFF file. The program starts with a default list of files to read from and will prompt you to change this list if you wish.


You can also edit the program and replace the default list if you will be adding from the same files many times.


There are also other WRITER programs that add different combinations of attributes. WRITER6 is the most recent version. Each WRITER program has a description at the top listing the attributes it adds.


2. Please note that WRITER6 modifies every file that it adds to the ARFF file: it deletes duplicate entries, modifies the keys, and removes any extra values.
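To give a rough idea of what happens in this stage, here is a minimal sketch of three pieces: picking the class from the filename, dropping duplicate entries, and quoting one entry as an ARFF data row. It is not WRITER6's actual code. All function names are hypothetical, and the single string attribute plus yes/no class layout is an assumption; the real WRITER programs add their own combinations of attributes.

```python
import json


def label_from_filename(filename):
    """Return 'yes' or 'no' depending on which word appears in the filename."""
    name = filename.lower()
    has_yes, has_no = "yes" in name, "no" in name
    if has_yes == has_no:
        # Neither word or both words present: refuse to guess.
        raise ValueError(f"cannot tell yes/no from filename: {filename!r}")
    return "yes" if has_yes else "no"


def drop_duplicates(entries):
    """Remove repeated entries, keeping first occurrences in order."""
    seen = set()
    unique = []
    for entry in entries:
        key = json.dumps(entry, sort_keys=True)  # hashable fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


def arff_row(text, label):
    """Format one data row: a single-quoted string value, then the class."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace("'", "\\'")        # then single quotes
        .replace("\n", "\\n")       # keep each row on one line
    )
    return f"'{escaped}',{label}"


# Assumed header layout (relation and attribute names are made up).
ARFF_HEADER = "\n".join([
    "@relation commits",
    "@attribute text string",
    "@attribute class {yes,no}",
    "@data",
])
```

A filename that contains both words, or neither, raises an error here instead of guessing, which is one reason to keep the yes and no filenames unambiguous.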

RUNNING WEKA:

1. Open the wekadatatest.arff file (or wherever you stored your entries) in Weka. For the classifier, choose FilteredClassifier and make sure the filter under it is set to StringToWordVector.

2. Under StringToWordVector, make sure the stop list option is set to True (I've been using the Rainbow stop list).

3. The classifier to use under FilteredClassifier is Vote. Under Vote, set combinationRule to Majority Voting.

4. Still under Vote, click on the classifiers list and delete ZeroR.

5. One combination you can try is Bagging (with PART) plus Bagging (with RandomForest). Choose Bagging first, then click on Bagging again and choose PART under rules. Now press the Add button. Do the same to add RandomForest.
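If you prefer the command line to the GUI, the setup above corresponds roughly to the Weka invocation built below. Treat it as a sketch under assumptions: the weka.jar path and the function name are placeholders, the class names and flags are the ones I believe Weka 3.x uses, and the stop list option is left out because its flag differs between Weka versions. Check everything against your own installation before relying on it.

```python
import shlex


def weka_vote_command(arff_path="wekadatatest.arff", weka_jar="weka.jar"):
    """Build a java command for FilteredClassifier -> Vote -> two Baggings."""
    bagging = "weka.classifiers.meta.Bagging"
    members = [
        f"{bagging} -W weka.classifiers.rules.PART",
        f"{bagging} -W weka.classifiers.trees.RandomForest",
    ]
    cmd = [
        "java", "-cp", weka_jar,
        "weka.classifiers.meta.FilteredClassifier",
        "-t", arff_path,  # training file
        "-F", "weka.filters.unsupervised.attribute.StringToWordVector",
        "-W", "weka.classifiers.meta.Vote",
        "--",             # everything after this goes to Vote
        "-R", "MAJ",      # combination rule: majority voting
    ]
    for spec in members:
        cmd += ["-B", spec]  # one -B per classifier inside Vote
    return cmd


# Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
print(shlex.join(weka_vote_command()))
```

Only the two Bagging classifiers are passed to Vote here, which mirrors deleting ZeroR from the list in the GUI.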

Enjoy :)