


D&C: A Divide-and-Conquer, IR-based, Multi-Classifier Approach to Bug Localization

Citing D&C

You can cite D&C using the following bibtex:

  title={D\&c: A divide-and-conquer approach to ir-based bug localization},
  author={Koyuncu, Anil and Bissyand{\'e}, Tegawend{\'e} F and Kim, Dongsun and Liu, Kui and Klein, Jacques and Monperrus, Martin and Traon, Yves Le},
  journal={arXiv preprint arXiv:1902.02703},


Running the code requires Python 3 and additional libraries such as LightGBM.

Extract the archives allResults.db.7z and simiSmall.h5 2.7z.00X, which contain the prediction probabilities of all the classifiers and the similarity scores of the bug report / source code file pairs used in the study.
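Once the archives are extracted, a quick sanity check is to list the tables in the result database. The sketch below assumes that allResults.db is a SQLite file (the README only says "SQL database") and that it sits in the current directory; the helper name `list_tables` is hypothetical and not part of the repository.

```python
import os
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the sorted names of all tables in a SQLite database file."""
    # closing() ensures the connection is actually closed on exit.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    path = "allResults.db"  # assumed location after extraction
    # sqlite3.connect() would silently create a missing file, so check first.
    if os.path.exists(path):
        print(list_tables(path))
    else:
        print(f"{path} not found; extract allResults.db.7z first")
```

The table names are not documented here, so the sketch only discovers them rather than querying any specific one.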

Code Structure:


	The training module: Before running it, set the local variable 'xgbResultFolder' to the
	folder that holds the simiSmall.h5 database and where the trained models are saved.


	The prediction module: It uses the similarity database (simiSmall.h5) and the trained
	models to make predictions. The prediction results are saved for use
	in the evaluation module (eval.py).

	The folder containing the prediction probabilities of the D&C classifiers.
	confusionMatrix/ : The confusion matrices that describe the performance of the
	classification models.
	The computed MAP and MRR values (files ending with MRR.pick).

	allResults.db.7z : The SQL database containing all the prediction probabilities, 
	which is used to combine the classifiers into a single one.


	The evaluation module: It takes the predictions produced by the prediction
	module (predict.py), merges them into a result database (allResults.db, which is
	loaded instead if it already exists), computes the MAP and MRR values of D&C with
	different combination strategies (max, min, prod, mean, etc.), and finally
	compares the results with the state-of-the-art approaches.

	The models trained by the training module.
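The combination step and the two ranking metrics named in the evaluation module can be illustrated with a small, self-contained sketch. It is not the code in eval.py: the function names, the input layout (one list of classifier probabilities per source file), and the toy scores are assumptions made for illustration. It uses only the standard library and needs Python 3.8+ for `math.prod`.

```python
import math
from statistics import mean

# Combination strategies named in the README: max, min, prod, mean.
COMBINERS = {"max": max, "min": min, "prod": math.prod, "mean": mean}


def combine(scores_per_file, strategy):
    """Collapse each file's per-classifier probabilities into one score."""
    fn = COMBINERS[strategy]
    return {f: fn(probs) for f, probs in scores_per_file.items()}


def rank_files(combined):
    """Order files from most to least suspicious."""
    return sorted(combined, key=combined.get, reverse=True)


def reciprocal_rank(ranking, buggy):
    """1 / rank of the first buggy file, or 0.0 if none is ranked."""
    for i, f in enumerate(ranking, start=1):
        if f in buggy:
            return 1.0 / i
    return 0.0


def average_precision(ranking, buggy):
    """Mean of precision@i taken at each position i holding a buggy file."""
    hits, total = 0, 0.0
    for i, f in enumerate(ranking, start=1):
        if f in buggy:
            hits += 1
            total += hits / i
    return total / len(buggy) if buggy else 0.0


if __name__ == "__main__":
    # Hypothetical probabilities from two classifiers for one bug report.
    scores = {"A.java": [0.9, 0.2], "B.java": [0.4, 0.5], "C.java": [0.1, 0.7]}
    buggy = {"B.java"}
    for name in COMBINERS:
        ranking = rank_files(combine(scores, name))
        print(name, ranking, reciprocal_rank(ranking, buggy))
```

MRR and MAP are then the averages of `reciprocal_rank` and `average_precision` over all bug reports. In this toy input the buggy file ranks first under min and prod, second under mean, and third under max, which shows why the choice of combination strategy matters.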

File Look-up:


	The MAP, MRR, Top1, Top5, Top10 results of the state-of-the-art approaches.

	The file that contains the commit IDs and the bug report IDs
	of the projects used in our experiment.

	The filtered bug reports, as described in our paper.

	The MAP, MRR, Top1, Top5, Top10 results of the state-of-the-art approaches 
	for the whole dataset.

	The MAP, MRR, Top1, Top5, Top10 results of the state-of-the-art approaches,
	filtered, as described in the paper.

	The ground truth object, indicating which bug-report / source code
	pairs are actually buggy.
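To make the Top1, Top5, and Top10 figures concrete, the sketch below shows one common way to compute a Top-k rate from ranked files and a ground truth of buggy pairs. The data layout (a set of `(bug_id, file)` pairs and a dict of rankings) and the function names are assumptions; the actual ground truth object in this repository may be structured differently, and the paper may report counts rather than rates.

```python
def top_k_hit(ranking, buggy_files, k):
    """True if at least one buggy file appears among the first k files."""
    return any(f in buggy_files for f in ranking[:k])


def top_k_rate(rankings, ground_truth, k):
    """Fraction of bug reports with a buggy file ranked in the top k.

    rankings:     {bug_id: [file, ...]} with the most suspicious file first
    ground_truth: set of (bug_id, file) pairs that are actually buggy
    """
    if not rankings:
        return 0.0
    hits = 0
    for bug_id, ranking in rankings.items():
        buggy = {f for (b, f) in ground_truth if b == bug_id}
        hits += top_k_hit(ranking, buggy, k)
    return hits / len(rankings)


if __name__ == "__main__":
    # Hypothetical rankings and ground truth for two bug reports.
    rankings = {
        "BUG-1": ["A.java", "B.java", "C.java"],
        "BUG-2": ["C.java", "A.java", "B.java"],
    }
    ground_truth = {("BUG-1", "B.java"), ("BUG-2", "B.java")}
    for k in (1, 2, 3):
        print(f"Top{k}:", top_k_rate(rankings, ground_truth, k))
```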