A compact sentence-transformer + LightGBM model that scores the relevance match between a code snippet and its docstring.
A Python corpus was used for training with a 70/30 train/validation split; a held-out test set of 88k examples was kept hidden.
F1-score was the primary evaluation metric, and AUC along with false positives/false negatives was monitored to guard against overfitting.
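The evaluation described above can be sketched with scikit-learn's standard metric functions; the labels and scores below are toy values for illustration only:

```python
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix

# Toy ground-truth labels and predicted relevance probabilities.
y_true = [1, 0, 1, 1, 0, 1]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.8]

# Threshold probabilities at 0.5 to get hard predictions for F1.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

f1 = f1_score(y_true, y_pred)          # primary metric
auc = roc_auc_score(y_true, y_prob)    # monitored alongside F1

# Track false positives / false negatives from the confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```

Monitoring AUC and the FP/FN counts together gives a fuller picture than F1 alone, since F1 depends on the chosen threshold.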
- Explored different embeddings but had to balance the trade-off between inference time and model performance.
- Used sentence-transformers to obtain better representations of code and docstrings.
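A minimal sketch of turning the two embeddings into a relevance feature. The real vectors would come from a sentence-transformers model (the model name in the comment is an assumption, not stated above); dummy vectors are used here so the sketch is self-contained:

```python
import numpy as np

# In the real pipeline the vectors would come from sentence-transformers, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")  # model name is an assumption
#   code_vec, doc_vec = model.encode([code, docstring])
# Dummy 384-dim vectors stand in for the embeddings here.
rng = np.random.default_rng(0)
code_vec, doc_vec = rng.normal(size=(2, 384))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# One simple relevance feature: cosine similarity of code and docstring embeddings.
sim = cosine_similarity(code_vec, doc_vec)
```

Features like this similarity score (possibly alongside the raw embedding dimensions) can then be fed to the downstream classifier.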
Reasons for choosing LightGBM:
- Lightweight, with low memory consumption
- Feature interpretability
- Faster training time