Dataset link: https://drive.google.com/file/d/17XXG75zmR3_bDeWKDNeXYNVL8ptAzUo3/view?usp=sharing

FastText model link: https://fasttext.cc/docs/en/pretrained-vectors.html
- Download the English wiki.en.bin model
- Construct the model\wiki.en directory under the root directory
- Put the wiki.en.bin model into it
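The expected model layout can be verified before running the pipeline. A minimal sketch (the helper names are ours; the path layout follows the instructions above):

```python
import os

def fasttext_model_path(root="."):
    """Return the expected model location: <root>/model/wiki.en/wiki.en.bin."""
    return os.path.join(root, "model", "wiki.en", "wiki.en.bin")

def check_model(root="."):
    """Raise a helpful error if the pre-trained model is missing."""
    path = fasttext_model_path(root)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"wiki.en.bin not found at {path}; download it from "
            "https://fasttext.cc/docs/en/pretrained-vectors.html"
        )
    return path
```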
Software preparation
- Neo4j: the database used to store the Bug Tossing Graph
- Gephi: run the community detection algorithm to obtain the modularity class of each product::component (parameters used: "Randomize" on, "Use edge weights" on, "Resolution" 1.0)
Directory preparation
- Create the data directory under the root directory
- Create the data directory under the scripts directory
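The two directories can be created with a short script (a sketch; the directory names come from the steps above):

```python
import os

def prepare_dirs(root="."):
    """Create the data directories the pipeline expects."""
    paths = [
        os.path.join(root, "data"),             # <root>/data
        os.path.join(root, "scripts", "data"),  # <root>/scripts/data
    ]
    for p in paths:
        os.makedirs(p, exist_ok=True)  # idempotent: safe to rerun
    return paths
```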
Dataset preparation
- Run get_product_component.py to get product_component.json (adjust the filepath according to where you put the product_component_files)
- Run filter_bugs.py to get filtered_bugs.json
- Run split_train_test_dataset.py to get train_bugs.json and test_bugs.json
- Run generate_tossing_graph_goal_oriented_path.py to build the Bug Tossing Graph (requires a connection to Neo4j; built from train_bugs only)
- Run get_vec.py to get vectors for the text information (after this step, set ONEHOT_DIM in config.py to the onehot.dim value obtained from onehot = TfidfOnehotVectorizer())
- Run get_graph_feature_for_pc.py for graph features of product components
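The ONEHOT_DIM note above means the text-vector dimension equals the fitted vocabulary size. A toy stand-in illustrating how a dim value like onehot.dim is derived (TfidfOnehotVectorizer is the repo's class; this simplified vectorizer is ours and only mimics the dimension logic):

```python
class ToyOnehotVectorizer:
    """Toy illustration: the vector dimension equals the vocabulary size."""

    def __init__(self):
        self.vocab = {}
        self.dim = 0

    def fit(self, texts):
        # Assign each unique token an index; dim is the vocabulary size.
        for text in texts:
            for token in text.lower().split():
                if token not in self.vocab:
                    self.vocab[token] = len(self.vocab)
        self.dim = len(self.vocab)  # this is the value to copy into ONEHOT_DIM
        return self

    def transform(self, text):
        # One-hot encode known tokens; unknown tokens are ignored.
        vec = [0.0] * self.dim
        for token in text.lower().split():
            if token in self.vocab:
                vec[self.vocab[token]] = 1.0
        return vec
```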
Feature vector
- Set FEATURE_VECTOR_NUMS_PER_FILE in config.py to (the number of product::component pairs) * 10,000, or to any value satisfying FEATURE_VECTOR_NUMS_PER_FILE % (the number of product::component pairs) == 0
- Run get_feature_vector.py to get the relevance labels and text-based features
- Run get_graph_feature_vector.py to get bug features and graph-based features
- Run add_feature_vector_graph.py to merge the features from the previous two steps
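The FEATURE_VECTOR_NUMS_PER_FILE constraint can be sanity-checked before running the feature scripts. A sketch (the function name is ours; the constraint is the one stated above):

```python
def check_feature_vector_nums(nums_per_file, num_product_components):
    """Validate the batching constraint from the setup notes:
    FEATURE_VECTOR_NUMS_PER_FILE must be a multiple of the number of
    product::component pairs, e.g. num_product_components * 10_000.
    Returns the implied number of bugs per file.
    """
    if num_product_components <= 0:
        raise ValueError("number of product::component pairs must be positive")
    if nums_per_file % num_product_components != 0:
        raise ValueError(
            f"{nums_per_file} is not a multiple of {num_product_components}"
        )
    return nums_per_file // num_product_components
```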
Model
- Run train_lambdaMart.py to train the learning-to-rank model
Result
- Run test_lambdaMart.py to test the model (change PRODUCT_COMPONENT_PAIR_NUM in config.py to the number of product::component pairs)
- Run change_result_format.py to convert result.csv (produced by test_lambdaMart.py) into a more readable format (metrics.json)
- Run calculate_accuracy_ndcg.py to calculate Accuracy and NDCG
- Run get_mrr.py to calculate MRR
- Run split_test_dataset_into_tossed_and_untossed.py to split test_bugs.json into tossed_test_bugs.json and untossed_test_bugs.json
- Run split_test_feature_vector_into_tossed_untossed.py to split test_feature_vector into tossed and untossed
- Reuse Result steps 1-4 to calculate the results (Accuracy, NDCG, MRR) on tossed and untossed bugs (adjust "test_bugs_type" in the step 1-4 Python scripts to choose which kind of test bugs to evaluate)
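The NDCG and MRR values computed above follow the standard ranking-metric definitions. A minimal sketch using the textbook formulas (not taken from the repo's scripts, which may differ in cutoffs or label handling):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for one ranked list of graded relevance labels."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(ranked_lists):
    """Mean reciprocal rank; ranked_lists[i] is True at relevant positions."""
    total = 0.0
    for hits in ranked_lists:
        for rank, hit in enumerate(hits, start=1):
            if hit:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```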
Note that LR-BKG requires a large amount of memory and disk storage!