
InstructCode

Improving Code Quality through Fine-Tuning Code Generation Models using Reinforcement Learning

This work explores a method to steer a code generation model towards producing Python code with higher quality and fewer bugs. To this end, we use a novel technique to fine-tune the CodeParrot code generation model using reinforcement learning.

How it works

The fine-tuning works as follows: The model receives a coding question in natural language, similar to the kind one might find in a Stack Overflow post, and is tasked with producing a code snippet that solves this problem. To reward high-quality and correct code snippets, we use a BERT classifier to assess the quality of the generated snippet, and feed the classifier outputs as reward signals into the reinforcement learning training, which is performed with the PPO algorithm.

Sketch of fine-tuning workflow
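The following is a minimal sketch of this loop, not the exact notebook code. It assumes the "codeparrot/codeparrot-small" checkpoint, a placeholder path for the fine-tuned BERT code-quality classifier, illustrative hyperparameters and example questions, and the step-based PPOTrainer interface of older trl releases (the trl API has changed across versions). It also assumes the positive-class logit of the classifier is used as the reward, which is one possible choice of reward mapping.

```python
# Sketch (assumptions noted above): PPO fine-tuning of CodeParrot, where a
# BERT code-quality classifier scores each generated snippet as the reward.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

# Policy: CodeParrot with a value head, plus a frozen reference copy for the KL penalty.
policy = AutoModelForCausalLMWithValueHead.from_pretrained("codeparrot/codeparrot-small")
ref_policy = AutoModelForCausalLMWithValueHead.from_pretrained("codeparrot/codeparrot-small")
tokenizer = AutoTokenizer.from_pretrained("codeparrot/codeparrot-small")
tokenizer.pad_token = tokenizer.eos_token

# Reward model: the BERT classifier trained in the other notebook
# ("path/to/bert-code-quality-classifier" is a placeholder path).
reward_tokenizer = AutoTokenizer.from_pretrained("path/to/bert-code-quality-classifier")
reward_model = AutoModelForSequenceClassification.from_pretrained("path/to/bert-code-quality-classifier")

config = PPOConfig(batch_size=2, mini_batch_size=1)  # illustrative values
ppo_trainer = PPOTrainer(config, policy, ref_policy, tokenizer)

def quality_rewards(code_snippets):
    """Score snippets with the classifier; assume index 1 is the 'high quality' class."""
    inputs = reward_tokenizer(code_snippets, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = reward_model(**inputs).logits
    return [score for score in logits[:, 1]]  # one scalar reward tensor per snippet

# Example batch of natural-language coding questions (Stack-Overflow style).
question_batches = [[
    "How do I read a CSV file into a pandas DataFrame?",
    "Write a function that checks whether a string is a palindrome.",
]]

for questions in question_batches:
    query_tensors, response_tensors = [], []
    for question in questions:
        query = tokenizer(question, return_tensors="pt").input_ids
        output = policy.generate(
            query, max_new_tokens=128, do_sample=True, pad_token_id=tokenizer.eos_token_id
        )
        query_tensors.append(query.squeeze(0))
        response_tensors.append(output[0, query.shape[1]:])  # keep only the generated code

    responses = [tokenizer.decode(r, skip_special_tokens=True) for r in response_tensors]
    rewards = quality_rewards(responses)
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)  # one PPO update
```

The KL penalty against the frozen reference model keeps the fine-tuned policy from drifting too far from CodeParrot's original language modelling distribution while it optimises for the classifier's quality score.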

Project Setup

  • The project report can be found in this folder as the file summary.pdf.

  • The training code for the model referred to as the "Reward Model" in the report can be found in this folder as the notebook fine-tune-bert-code-quality-classifcation-model.ipynb.

  • The training code for fine-tuning the CodeParrot code generation model using reinforcement learning can be found in this folder as the notebook fine-tune-code-generation-model-using-reinforcement-learning.ipynb.

To run the two notebooks, I recommend loading them into Google Colab and enabling GPU support.

Acknowledgements

A huge thank you to Leandro von Werra for creating the two libraries/models this project builds upon (the trl library and the CodeParrot models). I really appreciate your open-source work!