Thesis

In software development and maintenance, manually linking software feature documents to code components is a critical yet challenging task for developers. This linkage is essential for purposes such as implementing new features, documentation, tracking, and test case design. It also plays a pivotal role in third-party inspections that ensure compliance with the regulations governing different software products. However, manual linking is error-prone and time-consuming. To address these challenges, previous studies have proposed automated techniques. Despite their potential benefits, these techniques often encounter issues related to accuracy, cost, and explainability. In response to these limitations, our work addresses three crucial dimensions: accuracy, cost-effectiveness, and overall performance. We propose an ensemble approach designed to improve accuracy and reduce cost simultaneously. In extensive experiments on two distinct software projects, we observed a noteworthy performance improvement when linking software features to three fundamental architectural abstractions of code components: modules, classes, and methods. Compared with baseline lightweight techniques such as the Vector Space Model (VSM), Latent Semantic Indexing (LSI), and A Contextual Thematic Approach for Linking Features to Multi-level Software Architectural Component (FSECAM), our ENSEMBLE approach demonstrated superior results. The evaluation, covering datasets from 18 cases across the two projects, showed higher precision, recall, and F1 scores, particularly for modules and classes within proprietary projects.
Out of the 18 cases evaluated, our proposed ENSEMBLE approach exhibited superior results in 14 cases, underscoring its effectiveness in linking software feature documents to code components. This research contributes insights into automated techniques for linking software documentation to code, addresses key challenges, and paves the way for more accurate, cost-effective, and efficient practices in software development and maintenance.