Automated Code Error Correction using Machine Learning

This project focuses on automatically correcting errors in programming code submissions using machine learning techniques. The goal is to predict the most likely error types in erroneous C code snippets.

Summary

Built a multiclass classification model using Multinomial Logistic Regression to predict 50 different error types in C code snippets
Extracted bag-of-words features from raw code tokens and handled class imbalance in the training data
Developed a Python program to load sparse matrices, train the Logistic Regression model, and make predictions on new unseen code snippets
Tuned hyperparameters like max_iter and solver to optimize model performance
Achieved 87.4 % accuracy in predicting error types on test data

Implementation

The main Python scripts are:

data.py: Functions to load the sparse matrix data
model.py: Logistic Regression model training and prediction
predict.py: Driver script to load data, train model, and run predictions

The trained Logistic Regression model is serialized in model.pkl.

The main libraries used are NumPy, SciPy, and scikit-learn.

Usage

To train the model:

python predict.py --train

To run predictions on new data:

python predict.py --predict NEW_DATA

References

Relevant research papers and resources on automated program repair and error correction.
Purushottam Kar (IITK)

susovanpatra00/Automated-Code-Error-Correction-using-Machine-Learning

Automated Code Error Correction using Machine Learning

Summary

Implementation

Usage

References