Primary language: Jupyter Notebook. License: MIT.

code docstring relevance checker

A small sentence-transformer + LightGBM (LGBM) model that checks whether a docstring is relevant to a given piece of code.

Dataset:

A Python corpus was used to train the model with a 70/30 train/test split. The test set of 88k examples was kept hidden during training.
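A 70/30 split like the one described above could be sketched as follows. The row layout `(code, docstring, label)`, the helper name `split_70_30`, and the fixed seed are assumptions for illustration; the README does not say how the split was actually produced.

```python
import random


def split_70_30(pairs, seed=0):
    """Shuffle rows and split them 70% train / 30% test (hypothetical helper)."""
    rows = list(pairs)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)
    cut = int(round(0.7 * len(rows)))
    return rows[:cut], rows[cut:]


# Toy rows: (code, docstring, label), where label 1 means "relevant".
rows = [(f"def f{i}(): pass", f"doc {i}", i % 2) for i in range(10)]
train_rows, test_rows = split_70_30(rows)
```

Keeping the test rows out of every tuning step is what makes the held-out score meaningful.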

Metric:

F1-score was the evaluation metric. AUC was monitored together with the false-positive (FP) and false-negative (FN) counts to guard against overfitting.
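For reference, the metrics named above can be computed from labels and scores as in this minimal pure-Python sketch. The function names, the toy data, and the 0.5 decision threshold are assumptions for illustration, not taken from the notebook.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = relevant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Comparing these values on the training and test splits is one way to spot overfitting: a large gap suggests the model has memorised the training data.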

Embedding

  1. Several embedding models were explored; the final choice balances inference time against model performance.
  2. A sentence-transformer is used to obtain better representations of both code and docstrings.
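One way to turn the two embeddings into a single feature vector for a downstream classifier is sketched below. The feature layout (both embeddings, their absolute difference, and their cosine similarity) and the helper names `embed_pairs` and `pair_features` are assumptions; the README does not describe how the notebook combines the embeddings. The `encoder` argument is any object with a sentence-transformers style `encode` method.

```python
import numpy as np


def embed_pairs(encoder, codes, docstrings):
    """Encode code snippets and docstrings separately with the same encoder."""
    code_emb = np.asarray(encoder.encode(list(codes), convert_to_numpy=True), dtype=np.float32)
    doc_emb = np.asarray(encoder.encode(list(docstrings), convert_to_numpy=True), dtype=np.float32)
    return code_emb, doc_emb


def pair_features(code_emb, doc_emb):
    """Assumed layout: [code, docstring, |code - docstring|, cosine similarity]."""
    eps = 1e-12  # avoids division by zero for all-zero embeddings
    norms = np.linalg.norm(code_emb, axis=1) * np.linalg.norm(doc_emb, axis=1)
    cos = np.sum(code_emb * doc_emb, axis=1) / (norms + eps)
    return np.hstack([code_emb, doc_emb, np.abs(code_emb - doc_emb), cos[:, None]])


# Usage with a real encoder (the model name is an assumption, not from this README):
# from sentence_transformers import SentenceTransformer
# encoder = SentenceTransformer("all-MiniLM-L6-v2")
# code_emb, doc_emb = embed_pairs(encoder, ["def add(a, b): return a + b"], ["Add two numbers."])
# X = pair_features(code_emb, doc_emb)
```

A smaller encoder keeps inference fast, which is the trade-off mentioned in the list above.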

Process

Reason to go with LGBM:

  1. Lightweight, with low memory consumption
  2. Interpretable feature importances
  3. Fast training

🛠 Languages & Tools Used:

Python, Git, Jupyter